East 6 User Manual
Page Count: 2767
East® 6.4 © Cytel Inc. Copyright 2016

Preface

Acknowledgements

Welcome to East, a software platform for the statistical design, simulation and monitoring of clinical trials.

The current release of East (version 6.4) was developed by a team comprising (in alphabetical order): Gordhan Bagri, Dhaval Bapat, Priyanka Bhosle, Jim Bolognese, Sudipta Basu, Jaydeep Bhattacharyya, Swechhya Bista, Apurva Bodas, Pushkar Borkar, V. P. Chandran, Soorma Das, Pratiksha Deoghare, Aniruddha Deshmukh, Namita Deshmukh, Yogesh Dhanwate, Suraj Ghadge, Pranab Ghosh, Karen Han, Aarati Hasabnis, Pravin Holkar, Munshi Imran Hossain, Abhijit Jadhav, Yogesh Jadhav, Prachi Jagtap, Paridhi Jain, Yannis Jemiai, Ashwini Joshi, Nilesh Kakade, Janhavi Kale, Aditya Kamble, Anthiyur Kannappan, Parikshit Katikar, Uday Khadilkar, Kapildev Koli, Yogita Kotkar, Hrishikesh Kulkarni, Mandar Kulkarni, Mangesh Kulkarni, Shailesh Kulthe, Charles Liu, Lingyun Liu, Shashank Maratkar, Cyrus Mehta, Pradoshkumar Mohanta, Manashree More, Tejal Motkar, Ankur Mukherjee, Nabeela Muzammil, Neelam Nakadi, Vijay Nerkar, Sandhya Paranjape, Gaurangi Patil, Vidyadhar Phadke, Anup Pillai, Shital Pokharkar, Vidyagouri Prayag, Achala Sabane, Sharad Sapre, Rohan Sathe, Pralay Senchaudhuri, Rhiannon Sheaparé, Pradnya Shinde, Priyadarshan Shinde, Sumit Singh, Sheetal Solanki, Chitra Tirodkar, Janhavi Vaidya, Shruti Verma, Pantelis Vlachos, Suchita Wageshwari, Kiran Wadje, Ritika Yadav.

Other contributors to this release include Asmita Ghatnekar, Sam Hsiao, Brent Rine, Ajay Sathe, Chinny Swamy, Nitin Patel, Yogesh Gajjar, Shilpa Desai.
Other contributors who worked on previous releases of East: Gayatri Bartake, Ujwala Bamishte, Apurva Bhingare, Bristi Bose, Chandrashekhar Budhwant, Krisnaiah Byagari, Vibhavari Deo, Rupali Desai, Namrata Deshpande, Yogesh Deshpande, Monika Ghatage, Ketan Godse, Vishal Gujar, Shashikiran Halvagal, Niranjan Kshirsagar, Kaushal Kulkarni, Nilesh Lanke, Manisha Lohokare, Jaydip Mukhopadhyay, Abdulla Mulla, Seema Nair, Atul Paranjape, Rashmi Pardeshi, Sanket Patekar, Nabarun Saha, Makarand Salvi, Abhijit Shelar, Amrut Vaze, Suryakant Walunj, Sanhita Yeolekar.

We thank all our beta testers for their input and obvious enthusiasm for the East software. They are acknowledged by name in Appendix Z.

We owe a debt of gratitude to Marvin Zelen and to Swami Sarvagatananda, special people whose wisdom, encouragement and generosity have inspired Cytel for over two decades.

Finally, we dedicate this software package to our families and to the memory of our dearly departed Stephen Lagakos and Aneesh Patel.

Our Philosophy

We would like to share with you what drives and inspires us during the research and development stages of the East software.

Empower, do not Frustrate

We believe in making simple, easy-to-use software that empowers people. We believe that statisticians have a strategic role to play within their organization and that by using professionally developed trial design software they will utilize their time better than if they write their own computer programs in SAS or R to create and explore complex trial designs. With the help of such software they can rapidly generate many alternative design options that accurately address the questions at hand and the goals of the project team, freeing time for strategic discussions about the choice of endpoints, population, and treatment regimens.

We believe that software should not frustrate the user’s attempt to answer a question.
The user experience ought to engage the statistician and inspire exploration, innovation, and the quest for the best design. To that end, we believe in the following set of principles:

Fewer, but Important and Useful Features
It is better to implement fewer, but important and useful features, in an elegant and simple-to-use manner, than to provide a host of options that confuse more than they clarify. As Steve Jobs put it: ‘Innovation is not about saying “Yes” to everything. It’s about saying “No” to all but the most crucial features.’

Just because we Can, doesn’t mean we Should
Just because we can provide functionality in the software, doesn’t mean we should.

Simplify, Simplify, Simplify
Find and offer simple solutions - even for the most complex trial design problems.

Don’t Hurry, but continually Improve
Release new solutions when they are ready to use and continually improve the commercial releases with new features, bug fixes, and better documentation.

Provide the best Documentation and Support
Our manuals are written like textbooks, to educate, clarify, and elevate the statistical knowledge of the user. Our support is provided by highly competent statisticians and software engineers, focusing on resolving the customer’s issue, and being mindful of the speed and quality requirements. We believe that delivering delightful customer support is essential to our company’s lifeblood.

Finally, we listen to our customers constantly and proactively through countless informal and formal interactions, software trainings, and user group meetings. This allows us to follow all the principles laid out above in the most effective manner.

Assess

It is essential to be able to assess the benefits and flaws of various design options and to work one’s way through a sensitivity analysis to evaluate the robustness of design choices.
East can very flexibly generate multiple fixed sample size, group sequential, and other adaptive designs at a click of a button. The wealth of design data generated in this manner requires new tools to preview, sort, and filter through in order to make informed decisions.

Share

Devising the most innovative and clever designs is of no use if the statistician is unable to communicate in a clear and convincing manner what the advantages and characteristics of the design are for the clinical trial at hand. We believe statistical design software tools should also be communication tools to share the merits of various trial design options with the project team and encourage dialog in the process. The many graphs, tables, simulation output, and other flexible reporting capabilities of East have been carefully thought out to provide clear and concise communication of trial design options in real time with the project team.

Trust

East has been fully validated and intensely tested. In addition, the East software package has been in use and relied upon for almost 20 years. East has helped design and support countless actual studies at all the major pharmaceutical and biotech companies, academic research centers, and government institutions. We use and rely on our software every day in our consulting activities to collaborate with our customers, helping them optimize and defend their clinical trial designs. This also helps us quickly identify things that are frustrating or unclear, and improve them fast - for our own sake and that of our customers.

What’s New in East 6.4

Version 6.4 of East introduces some important new features:

1. Multi-arm multi-stage designs
East now offers the ability to design multi-arm multi-stage studies with options for early stopping, dose selection, and sample size re-estimation.
The group sequential procedures (Gao et al., 2014) have been implemented for normal endpoints, whereas the p-value combination approaches (Posch et al. 2005) have been implemented for both normal and binomial endpoints. See Chapters 17, 18 and 29 for more details.

2. Multiple endpoints designs for binomial endpoints
Gatekeeping procedures to control family-wise type-1 error when testing multiple families of binomially distributed endpoints are now available in East for fixed sample (1-look) designs. East will also use the intersection-union test when testing a single family of endpoints. See Chapters 16 and 28 for more details.

3. Multi-arm designs for survival endpoints
Designs for pairwise comparisons of treatment arms to control have been added for survival endpoints. See Chapter 51 for more details.

4. Enrollment and event prediction
East now includes options to predict enrollment and events based on accumulating blinded data and summary statistics. Prediction based on unblinded data was already implemented in the previous version, so the current version provides both options - Unblinded as well as Blinded. See Chapter 68 for more details.

5. Dual agent dose-escalation designs
This version of East adds methods to the Escalate module for dual-agent dose-escalation designs, including the Bayesian logistic regression model (BLRM; Neuenschwander et al., 2014), and the Product of Independent beta Probabilities dose Escalation (PIPE; Mander et al., 2015). Numerous feature enhancements have also been made to the existing single-agent dose escalation designs. See Chapter 32 for more details.

6. Bayesian probability of success (assurance) and predictive power for survival designs
East 6.4 will now calculate assurance (O’Hagan et al., 2005), or Bayesian probability of success, and predictive power for survival endpoints. See Chapter 48 for more details.

7. Interim monitoring using the Müller and Schäfer method
East 6.4 can now monitor clinical trials using the adaptive approach of Müller and Schäfer. Currently, this feature is available for Survival Endpoint tests only. See Chapter 56 for more details.

8. General usability enhancements
Numerous enhancements have been made to the software to improve the user experience and workflow.

What’s New in East 6.3

Version 6.3 of East introduces some important new features:

1. Updates to Promising Zone designs: Ratio of Proportions designs; Müller and Schäfer type-1 error control method; Estimation
East 6.3 introduces Promising Zone designs for the ratio of proportions. East 6.3 also implements the method of Müller and Schäfer (2001) to control type-1 error for adaptive unblinded sample size re-estimation designs. This is available for simulation and interim monitoring. Estimation using Repeated Confidence Intervals (RCI) and Backward Image Confidence Intervals (BWCI) (Gao, Liu & Mehta, 2013) is also available in Müller and Schäfer simulations. See Chapter 52 for more details.

2. Multiple endpoint designs
Parallel gatekeeping procedures to control family-wise type-1 error when testing multiple families of normally distributed endpoints are now available in East for fixed sample (1-look) designs. East will also use the intersection-union test when testing a single family of endpoints. See Chapter 16 for more details.

3. Exact designs for binomial endpoints
East now includes the ability to use the exact distribution when computing power and sample size for binomial endpoints. This applies for all binomial tests in the case of fixed designs.
In addition, group sequential exact designs are available for the single proportion case, and Simon’s two-stage optimal and minimax designs (Simon, 1989) have been implemented; these allow for early futility stopping while optimizing the expected sample size and the maximum sample size, respectively. See Chapter 33 for more details.

4. Dose escalation designs
East 6.3 now includes a module for the design, simulation, and monitoring of modern dose-escalation clinical trials. Model-based dose-escalation methods in this module include the Continual Reassessment Method (mCRM; Goodman et al., 1995), the Bayesian logistic regression model (BLRM; Neuenschwander et al., 2008), and the modified Toxicity Probability Interval (mTPI; Ji et al., 2010). See Chapter 32 for more details.

5. Predictive interval plots, conditional simulations, and enrolment/events prediction
East 6.3 now includes a module that offers the ability to simulate and forecast the future course of the trial based on current data. This includes conditional simulations to assess expected treatment effects and associated repeated confidence intervals at future looks (also called Predicted Interval Plots or PIP; Li et al. 2009), as well as the probability of finishing with a successful trial (conditional power). You can also plan and simulate clinical trials with greater precision using different accrual patterns and response information for different regions/sites. East allows you to make probabilistic statements about accruals, events, and study duration using Bayesian models and accumulating data. See Chapters 65, 66 and 67 for more details.

6. Sample size and information calculators
Sample size and information calculators have been added back into East to allow easy calculation of the two quantities. See Chapter 59 for more details.

7. Exporting/Importing between East and East Procs
East 6.3 designs can now be exported to work with the newly released East Procs. The output from East Procs can be imported back into East 6.3 for use in the East Interim Monitoring dashboard and to conduct conditional inference and simulations. See Chapter 69 for more details.

8. Changes to East input
Many changes have been implemented in East to enhance the user experience in providing input for their designs. These changes include the ability to specify multiple values of input parameters for survival designs (most notably the Hazard Ratio), the ability to directly convert many fixed sample designs into group sequential designs with the use of the Sample Size based design option, and the ability to convert an ANOVA design into a Multiple Comparison to Control design.

9. Changes to East output
Display of East output has been changed in many ways, including color coding of input and output, ability to collapse and expand individual tables, greater decimal display control, and more exporting options for results (e.g. ability to export graphs directly into Microsoft PowerPoint).

What’s New in East 6.2

Version 6.2 of East introduces some important new features:

1. Promising Zone Designs using CHW and CDL type-1 error control methods
East 6.2 introduces Promising Zone Designs from East 5.4 for differences of means, proportions, and the log-rank test. The methods of Cui, Hung, and Wang (1999) and Chen, DeMets, and Lan (2003) are implemented for adaptive unblinded sample size re-estimation designs and available for simulation and interim monitoring.

2. Multiple endpoint designs
Serial gatekeeping procedures to control family-wise type-1 error when testing multiple families of normally-distributed endpoints are now available in East for fixed sample (1-look) designs.

3. Power and sample size calculations for count data
East now offers power analysis and sample size calculations for count data in fixed sample (1-look) designs. Specifically, East provides design capabilities for:
(a) Test of a single Poisson rate
(b) Test for a ratio of Poisson rates
(c) Test for a ratio of Negative Binomial rates

4. Precision-based sample size calculations
Sample size calculations are now available based on specification of a confidence interval for most tests provided in East.

What’s New in East 6.1

Version 6.1 of East introduces some important new features:

1. Bayesian probability of success (assurance) and predictive power
For one-sample and two-sample continuous and binomial endpoints, East 6.1 will now compute Assurance (O’Hagan et al., 2005) or Bayesian probability of success, a Bayesian version of power, which integrates power over a prior distribution of the treatment effect, giving an unconditional probability that the trial will yield a significant result. When monitoring such a design using the Interim Monitoring dashboard, East 6.1 will also compute Bayesian predictive power using the pre-specified prior distribution on the treatment effect. This computation will be displayed in addition to the fiducial version of predictive power, which uses the estimated treatment effect and standard error to define a Gaussian prior distribution.

2. Stratification in simulation of survival endpoints
When simulating a trial design with a time-to-event endpoint, East 6.1 accommodates data generation in a stratified manner, accounting for up to 3 stratification variables and up to 25 individual strata. The fraction of subject data generated in each stratum, and the survival response generation mechanism for each stratum, can be flexibly adjusted. In addition, stratified versions of the logrank statistic and other test statistics available for analysis of the simulated data are provided.

3. Integration of R code into simulations
East 6.1 simulations now include the option to use custom R code to define specific elements of the simulation runs. R code can be used to modify the way the subjects are accrued, how they are randomized, how their response data are generated, and how the test statistic is computed.

4. Reading East 5.4 workbooks
East 5.4 workbooks can be read into East 6.1 after conversion using the utility provided in the program menu. Go to the start menu and select: Programs > East Architect > File Conversion > East5 to East6

5. Floating point display of sample size
East 6.1 now has a setting to choose whether to round sample sizes (at interim and final looks) up to the nearest integer, or whether to display them as a floating point number, as in East 5. (See

6. Enhancement to the Events vs. Time plot
This useful graphic for survival designs has been updated to allow the user to edit study parameters and create a new plot directly from a previous one, providing the benefit of quickly assessing the overall impact of input values on a design prior to simulation. (See

7. Interim monitoring (IM) dashboard
The capability to save snapshots of the interim monitoring (IM) dashboard is now supported in East 6.1. At each interim look of a trial, updated information can be saved and previous looks can be easily revisited. Alternatively, prior to employing actual data, this functionality could be used to compare multiple possible scenarios, providing the user a sense of how a future trial could unfold.

8. Enhancement to the Logrank test
For trials with survival endpoints, East 6.1 allows the user to simultaneously create multiple designs by specifying a range of values for key parameters in the Logrank test. (See Subsection

9. Enhancement to binomial designs
For studies with discrete outcomes, East 6.1 allows the user to simultaneously create multiple designs by specifying a range of values for key parameters.
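
To make the definition of assurance in item 1 of the East 6.1 list concrete, here is a minimal Python sketch of "power integrated over a prior". It is not East's implementation. It assumes a one-sided z-test with a known standard error and a normal prior on the treatment effect, and every function and parameter name is hypothetical. Under these assumptions the integral also has a well-known closed form, which the sketch uses as a cross-check on the numerical integration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for critical values and power


def power(delta, se, alpha=0.025):
    """Power of a one-sided level-alpha z-test when the true effect is delta.

    se is the (assumed known) standard error of the effect estimate.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(delta / se - z_crit)


def assurance_numeric(prior_mean, prior_sd, se, alpha=0.025, steps=20001):
    """Assurance: power averaged over a normal prior on the treatment effect.

    Uses the trapezoidal rule on prior_mean +/- 8 prior standard deviations.
    """
    prior = NormalDist(prior_mean, prior_sd)
    lo = prior_mean - 8 * prior_sd
    hi = prior_mean + 8 * prior_sd
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        d = lo + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0  # trapezoid end weights
        total += weight * power(d, se, alpha) * prior.pdf(d)
    return total * h


def assurance_closed_form(prior_mean, prior_sd, se, alpha=0.025):
    """Closed form for the same setting.

    Marginally the estimate is N(prior_mean, se^2 + prior_sd^2), so the
    probability that it exceeds z_crit * se is a single normal tail area.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf((prior_mean - z_crit * se) / sqrt(se**2 + prior_sd**2))


if __name__ == "__main__":
    # Illustrative numbers only: two arms of 100 subjects, unit variance.
    se = 1.0 * sqrt(2 / 100)
    print("power at prior mean:", power(0.4, se))
    print("assurance (numeric):", assurance_numeric(0.4, 0.2, se))
    print("assurance (closed form):", assurance_closed_form(0.4, 0.2, se))
```

When the power at the prior mean is above 50%, the assurance comes out lower than that power, because the prior places weight on smaller effects where the test is less likely to succeed. As the prior standard deviation shrinks toward zero, assurance converges to ordinary power at the prior mean.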
What’s New in East 6.0 on the Architect Platform

East Architect is version 6.0 of the East package and builds upon earlier versions of the software. The transition of East to the next generation platform that is Architect has abandoned all prior dependencies on Microsoft Excel. As a result, the user interface is very different, leading to a new user experience and workflow. Although you might find that there is a learning curve to getting comfortable with the software, we trust that you will find that the new platform provides for a superior user experience and improved workflow.

The Architect platform also adds data management and analysis capabilities similar to those found in Cytel Studio, StatXact, and LogXact, as well as a powerful reporting tool we call Canvas, which provides flexible and customizable reports based on design and simulation information.

Version 6.0 of East introduces some important new features in addition to the new platform environment. Here is a selection:

1. New designs
A large number of fixed sample designs have been added for various endpoints and trial types. These were present in the SiZ software and have now been fully integrated into East.

2. Multi-arm designs
Designs for pairwise comparisons of treatment arms to control have been added for differences of means and differences of proportions. These designs are mostly simulation-based and provide operating characteristics for fixed sample studies using multiplicity adjusting procedures such as Dunnett’s, Bonferroni, Sidak, Hochberg, Fallback, and others.

3. Creation of multiple designs or simulations at once:
East Architect provides the ability to create multiple designs or to run multiple simulation scenarios at once, by specifying lists or sequences of values for specific parameters rather than single scalars. This capability allows the user to explore a greater space of possibilities or to easily perform sensitivity analysis.
Accompanying tools to preview, sort, and filter are provided to easily parse the large output generated by East.

4. Response lag, accrual, and dropouts for continuous and discrete endpoints:
Designs created for continuous and discrete endpoints now have the option for the user to specify a response lag (between randomization and observation of the endpoint), as well as an accrual rate and dropout rate for the study population. As a result, some terminology has been introduced to distinguish between the number of subjects who need to be enrolled in the study (Sample Size) and the number of subjects whose endpoint must be observed in order to properly power the study (Completers).

5. Flexibility in setting up boundaries
Both the efficacy and futility rules of a design need not be present at each and every look anymore. The user can specify whether a look includes either the efficacy stopping rule or the futility rule or both. Therefore, a design can be set up where at the first look only futility stopping is possible, whereas at later looks both efficacy and futility or maybe only efficacy stopping is allowed. In addition, the futility rule can now be specified on two new scales, which are the standardized treatment scale and the conditional power scale.

6. Predictive power
Predictive power is now provided as an alternative to conditional power in the interim monitoring sheet of the software. Further details about how this is implemented can be found in Appendix C.

7. Comparing designs
One can compare multiple designs either graphically or in tabular format simply by selecting them and choosing a plot or table output button.

8. Improvements in algorithms
Many improvements have been made to the way computations are performed, both to improve accuracy and speed, but also to provide more intuitive results.
For example, older versions of East used an approximation to conditional power based on ignoring all future looks but the final one. This approximation has been dropped in favor of computing the exact value of conditional power. Many other changes have been made that might result in different values being computed and displayed in East Architect as compared to earlier versions of the software. For greater details about the changes made, please refer to the “Read Me” notes that accompany the software release.

What’s New in East 5

After East 5 (version 5.0) was released, a few upgrades have been issued. The details are:

1. In the current release of version 5.4, the module EastSurvAdapt has been added.
2. In the previous version 5.3, the module EastAdapt was substantially revised.
3. In the earlier version 5.2, the module EastExact was released.
4. In the still earlier version 5.1, several improvements were introduced in the EastSurv module.

The details of these modules can be found in the respective chapters of the user manual.

East 5 upgraded the East system in several important ways in direct response to customer feedback. Six important extensions had been developed in East 5:

1. Designs using t-tests: In previous versions of East, the single look design was treated as a special case of a group sequential design. Thus the same large sample theory was used to power and size these traditional types of designs. Recognizing this solution not to be entirely satisfactory for small sample trials, in East 5, we have implemented single-look t-test designs for continuous data. (Sections 8.1.4, 8.2.4, 9.1.3, and 11.1.3)

2. New boundaries: East 5 provides two new procedures for specifying group sequential boundaries.
Generalized Haybittle-Peto boundaries allow the user to specify unequal p-values at each interim look for a group sequential plan. East will recalculate the final p-value in order to preserve the type I error.
(Section 38.1)
The cells for entering the cumulative alpha values of an interpolated spending function can be automatically populated with the cumulative alpha values of any of the published spending functions available to East, and subsequently edited to suit user requirements. For example, a 4-look Lan and DeMets O’Brien-Fleming spending function can be modified so that the critical value at the first look is less conservative than usual. (Section 38.3.1)

3. Interim monitoring and simulation for single-look designs: Interim monitoring and simulation sheets have been provided for all single look designs in East 5.

4. Improvement to Charts: Many improvements to existing charts in East have been implemented in this version.
Scaling in the Duration vs. Accrual chart has been corrected to provide a better tool for the user.
The use of semi-log scaling has enabled us to represent many charts on the natural scale of the treatment effect. This mostly concerns ratio metrics such as the relative risk, the hazard ratio, and the odds ratio. Boundaries on the relative risk scale, for example, are now available in East 5.
Boundaries can also be visualized on the score scale.
Charts can be summarized in tabular form. The user has the option to generate tables of power vs. sample size, power vs. treatment effect, events vs. time, and so on. These tables can easily be copied and pasted into external applications like Microsoft Word and Excel. (Section 4.5)

5. Improved usability: Much attention in East 5 was spent to improve the user’s experience within the environment.
A graph sheet allows the user to compare up to 16 charts side by side. Charts for any number of plans within a workbook can be exported to the graph sheet. (Section 5.3)
The scratch sheet is a full-fledged Microsoft Excel sheet that can be brought up within the East application. (Section 4.4)
The split view option enables the user to see two sheets of the same workbook simultaneously.
This can be useful if one window pane contains a scratch sheet where side calculations may be done based on numbers in the other window pane. Another use can be to have two or more plans show up on one pane and their graph sheet containing boundaries or other charts show up on another pane for easy comparison. (Section 4.8)
Messages in the help menu, pop-up help, and context sensitive help have been revised and rendered more informative to the user.
The default appearance of charts can be specified by the user through the preferences settings menu item. (Section 4.7)

6. Installation validation: East 5 includes an installation validation procedure that will easily check that the software has been properly installed on the user’s system. (Section 2.3)

Finally, there has been an important reorganization of the East manual, which now comprises seven volumes organized as follows: (1) The East System (2) Continuous Endpoints (3) Binomial and Categorical Endpoints (4) Time-to-Event Endpoints (5) Adaptive Designs (6) Special Topics (7) Appendices. Page numbers are continuous through volumes 1-7. Each volume contains a full table of contents and index to the whole manual set.

Preface to East 4

East 4 was a very large undertaking involving over 20 developers, documenters, testers and helpers over a two-year period. Our goal was to produce one single powerful design and monitoring tool with a simple, intuitive, point and click, menu driven user interface, that could cover the full range of designs commonly encountered in a clinical trial setting, for either fixed sample or group sequential designs. The resulting product, East 4, extends the East system for flexible design and interim monitoring in four major ways as listed below.

1. Broad Coverage: Previous versions of East dealt primarily with the design of two-arm group sequential trials to detect a difference of means for normal and binomial endpoints and a hazard ratio for survival endpoints. East 4 extends these capabilities to other settings.
Easily design and monitor up to 34 different clinical trial settings including one-, two- and K-sample tests; linear, logistic and Cox regression; longitudinal designs; non-inferiority and bioequivalence designs; cross-over and matched-pair designs; nonparametric tests for continuous and ordered categorical outcomes.
Comparisons between treatment and control groups can be in terms of differences, ratios or odds ratios.
Non-inferiority trials can be designed to achieve the desired power at superiority alternatives.

2. New Stopping Boundaries and Confidence Intervals:
Non-binding futility boundaries. Previously futility boundaries could not be overruled without inflating the type-1 error. New non-binding futility boundaries preserve power and type-1 error and yet can be overruled if desired.
Asymmetric two-sided efficacy boundaries. You can allocate the type-1 error asymmetrically between the upper and lower stopping boundaries, and can spend it at different rates with different error spending functions. This will provide added flexibility for aggressive early stopping if the treatment is harmful and conservative early stopping if the treatment is beneficial.
Futility boundaries can be represented in terms of conditional power. This brings greater objectivity to conditional power criteria for early stopping.
Two-sided repeated confidence intervals are now available for one-sided tests with efficacy and futility boundaries. Previously only one-sided confidence bounds were available.
Interactive repeated confidence intervals are provided at the design stage to aid in sample size determination and selection of stopping boundaries.

3. New Analytical and Simulation Tools for Survival Studies: EastSurv is an optional new module, fully integrated into the East system, that extends East’s design capabilities to survival studies with non-uniform accrual, piecewise exponential distributions, drop outs, and fixed length of follow-up for each subject. Designs can be simulated under general settings including non-proportional hazard alternatives.

4. Design and Simulation of Adaptive Trials: EastAdapt is an optional new module, fully integrated into the East system, that permits data-dependent changes to sample size, spending functions, number and spacing of interim looks, study objectives, and endpoints using a variety of published flexible approaches.

In addition to these substantial statistical capabilities, East 4 has added numerous improvements to the user interface including clearer labeling of tables and graphs, context sensitive help, charts of power versus sample size and power versus number of events, convenient tools for calculating the test statistics to be entered into the interim monitoring worksheet for binomial endpoints, and the ability to type arithmetic expressions into dialog boxes and into design, interim monitoring and simulation worksheets.

Preface to East 3

East 3 is a major upgrade of the East-2000 software package for design and interim monitoring of group sequential clinical trials. It has evolved over a three-year period with regular input from our East-2000 customers. The main improvements that East 3 offers relative to East-2000 are greater flexibility in study design, better tracking of interim results, and more powerful simulation capabilities. Many of our East-2000 customers expressed the desire to create group sequential designs that are ultra-conservative in terms of stopping early for efficacy, but which can be quickly terminated for futility.
The extremely wide selection of spending functions and stopping boundaries in East 3, combined with its interactive Excel-based spreadsheet user interface for comparing multiple designs quickly and effortlessly, makes such designs possible. The interim monitoring module of East 3 has been completely revised, with a “dashboard” user interface that can track the test statistic, error spent, conditional power, post-hoc power and repeated confidence intervals on a single worksheet, over successive interim monitoring time points, for superior trial management and decision making by a data monitoring committee. Finally, we have enhanced the simulation capabilities of East 3 so that it is now possible to evaluate the operating characteristics not only of traditional group sequential designs, but also of adaptive designs that permit mid-course alterations in the sample size based on interim estimates of variance or treatment effect.
A list of the substantial new features in East 3 relative to East-2000 is given below. (The items on this list beginning with ‘(*)’ are optional extras.)
New Design Features
1. Design of non-inferiority trials.
2. Design of trials with unequally spaced looks.
3. Use of Lan and DeMets (1983) error spending functions to derive stopping boundaries.
4. (*) Flexible stopping boundaries derived from the gamma spending function family (Hwang, Shih and DeCani, 1990) and the rho spending function family (Kim and DeMets, 1987).
5. Haybittle-Peto stopping boundaries (Haybittle, 1971).
6. (*) Boundaries derived from user-specified spending functions with interpolation.
7. Boundaries for early stopping for futility only.
8. Graphical and numerical representation of stopping boundaries on other scales besides the standard normal scale; e.g., boundaries expressed on the p-value scale, effect size scale, and conditional power scale.
9. Computing power for a fixed sample size.
10.
Chart displaying the number of events as a function of time (for survival studies).
New Interim Monitoring Features
1. Detailed worksheet for keeping track of interim monitoring data and providing input to the data monitoring committee.
2. Simultaneous view of up to four thumbnail charts on the interim monitoring worksheet. Currently one may select any four charts from the following: the stopping boundary chart, the error spending chart, the conditional power chart, the post-hoc power chart, and the repeated confidence intervals chart. You can also expand each thumbnail into a full-sized chart by a mouse click.
3. Computation of repeated confidence intervals (Jennison and Turnbull, 2000) at each interim look.
New Simulation Features
1. (*) Simulation of actual data generated from the underlying normal or binomial model instead of simulating the large sample distribution of the test statistic.
2. (*) Simulation on either the maximum sample size scale, or the maximum information scale.
3. (*) Simulation of the adaptive design due to Cui, Hung and Wang (1999).
New User Interface Features
1. Full integration into the Microsoft Excel spreadsheet for easy generation and display of multiple designs, interim monitoring or simulation worksheets, and production of reports.
2. Save design details and interim monitoring results in Excel worksheets for easy electronic transmission to regulatory reviewers or to end-users.
3. Create custom calculators in Excel and save them with the East study workbook.
Preface to East-2000
For completeness we repeat below the preface that we wrote for the East-2000 software when it was released in April, 2000.
Background to the East-2000 Development
The precursor to East-2000 was East-DOS, an MS-DOS program with design and interim monitoring capabilities for normal, binomial and survival end points.
When East-DOS was released in 1991 its user interface and statistical features were adequate to the needs of its customer base. MS-DOS was still the industry standard operating system for desktop computers. Group sequential designs were not as popular then as they are now. The role of data and safety monitoring boards (DSMBs) in interim monitoring was just beginning to emerge. FDA and industry guidelines on the conduct of group sequential studies were in the early draft stage. Today the situation is very different. Since the publication of the ICH-E9 guidance on clinical trials by the FDA and regulatory bodies in Europe and Japan, industry sponsors of phase-III clinical trials are more favorably inclined to the group sequential approach. For long-term mortality studies especially, interim monitoring by an independent DSMB is almost mandatory. As the popularity of group sequential studies has increased so has the demand for good software to design and monitor such studies. For several years now we have been flooded with requests from our old East-DOS customers to move away from the obsolete MS-DOS platform to Microsoft Windows and to expand the statistical capabilities of the software. We have responded by developing East-2000, a completely re-designed Windows package with unparalleled design, simulation and interim monitoring capabilities.
What’s New in East-2000
The East-2000 software adds considerable functionality to its MS-DOS predecessor through a superior user interface and through the addition of new statistical methods.
New User Interface
East-2000 is developed on the Microsoft Windows platform. It supports a highly interactive user interface with ready access to stopping boundary charts, error spending function charts, power charts and the ability to present the results as reports in Microsoft Office.
1.
Interactivity. Designing a group sequential study is much more complex than designing a fixed sample study. The patient resources needed in a group sequential setting depend not only on the desired power and significance level, but also on how you will monitor the data. How many interim looks are you planning to take? What stopping boundary will you use at each interim look? Does the stopping boundary conform to how you’d like to spend the type-1 error at each look? Do you intend to stop early only for benefit, only for futility, or for both futility and benefit? In a survival study, how long are you prepared to follow the patients? These design and monitoring decisions have profound implications for the maximum sample size you must commit up-front to the study, the expected sample size under the null and alternative hypotheses, and the penalty you will have to pay in terms of the nominal p-value needed for declaring significance at the final look. To take full advantage of the group sequential methodology and consider the implications of potential decisions you must have highly interactive software available, both at the study design stage and at the interim monitoring stage. East-2000 is expressly developed with this interactivity in mind. Its intuitive form-fill-in graphical user interface can be an invaluable tool for visualizing how these design and monitoring decisions will affect the operating characteristics of the study.
2. Charts. By clicking the appropriate icon on the East toolbar you can view stopping boundary charts, study duration charts, error spending function charts, conditional and post-hoc power charts, and exit probability tables. The ease with which these charts can be turned on and off ensures that they will be well utilized at both the design and interim monitoring phases of the study.
3.
Reports. All worksheets, tables and charts produced by East-2000 can be copied and pasted into Microsoft Word, Excel and PowerPoint pages, thus facilitating the creation of annotated reports describing the study design and interim monitoring schedule.
New Statistical Methods
East-2000 has greatly expanded the design and interim monitoring capabilities previously available in East-DOS. In addition, East-2000 provides a simulation module for investigating how the power of a sequential design is affected by different assumptions about the magnitude of the treatment difference. Some highlights from these new capabilities are listed below.
1. Design. Whereas East-DOS only provided design capabilities for normal, binomial and survival end points, East-2000 makes it possible to design more general studies as well. This is achieved through the use of an inflation factor. The inflation factor determines the amount by which the sample size of a fixed sample study should be inflated so as to preserve its type-1 error in the presence of repeated hypothesis tests. It is thus possible to use any external software package to determine the fixed sample size of the study, input this fixed sample size into the design module of East-2000 and have the sample size inflated appropriately. These general capabilities are discussed in Chapter 8.
2. Interim Monitoring. A major new feature in the interim monitoring module of East-2000 is the computation of adjusted p-values, confidence intervals and unbiased parameter estimates at the end of the sequential study. Another important feature is the ability to monitor the study on the Fisher information scale and thereby perform sample-size re-estimation if initial assumptions about the data generating process were incorrect. Chapter 9 provides an example of sample-size re-estimation for a binomial study in which the initial estimate of the response rate of the control drug was incorrect.
3.
Simulation. East-2000 can simulate an on-going clinical trial and keep track of the frequency with which a stopping boundary is crossed at each interim monitoring time-point. These simulations can be performed under the null hypothesis, the alternative hypothesis or any intermediate hypothesis, thus permitting us to evaluate how the various early stopping probabilities are affected by misspecifications in the magnitude of the treatment effect.
Continuous Development of East
East-2000 will undergo continuous development with major new releases expected on an annual basis and smaller improvements regularly posted on the Cytel web site. We will augment the software and implement new techniques based on the recommendations of the East Advisory Committee, and as the demand for them is expressed by our customers. The following items are already on the list:
Easy links to fixed-sample design packages so as to extend the general methods in Chapter 8;
Analytical and simulation tools to convert Fisher information into sample size and thereby facilitate the information-based design and interim monitoring methods of Chapter 9, especially for sample-size re-estimation.
We will build a forum for discussing East-related issues on the Cytel web site, www.cytel.com. Interesting case studies, frequently asked questions, product news and other related matters will be posted regularly on this site.
Roster of East Consultants
Cytel offers consulting services to customers requiring assistance with study design, interim monitoring or representation on independent data and safety monitoring boards. Call us at 617-661-2011, or email sales@cytel.com, for further information on this service.
Table of Contents
Preface
Volume 1 The East System
1 Introduction to Volume 1
2 Installing East 6
3 Getting Started
4 Data Editor
Volume 2 Continuous Endpoints
5 Introduction to Volume 2
6 Tutorial: Normal Endpoint
7 Normal Superiority One-Sample
8 Normal Noninferiority Paired-Sample
9 Normal Equivalence Paired-Sample
10 Normal Superiority Two-Sample
11 Nonparametric Superiority Two Sample
12 Normal Non-inferiority Two-Sample
13 Normal Equivalence Two-Sample
14 Normal: Many Means
15 Multiple Comparison Procedures for Continuous Data
16 Multiple Endpoints-Gatekeeping Procedures
17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs
18 Two-Stage Multi-arm Designs using p-value combination
19 Normal Superiority Regression
Volume 3 Binomial and Categorical Endpoints
20 Introduction to Volume 3
21 Tutorial: Binomial Endpoint
22 Binomial Superiority One-Sample
23 Binomial Superiority Two-Sample
24 Binomial Non-Inferiority Two-Sample
25 Binomial Equivalence Two-Sample
26 Binomial Superiority n-Sample
27 Multiple Comparison Procedures for Discrete Data
28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data
29 Two-Stage Multi-arm Designs using p-value combination
30 Binomial Superiority Regression
31 Agreement
32 Dose Escalation
Volume 4 Exact Binomial Designs
33 Introduction to Volume 8
34 Binomial Superiority One-Sample – Exact
35 Binomial Superiority Two-Sample – Exact
36 Binomial Non-Inferiority Two-Sample – Exact
37 Binomial Equivalence Two-Sample – Exact
38 Binomial Simon’s Two-Stage Design
Volume 5 Poisson and Negative Binomial Endpoints
39 Introduction to Volume 4
40 Count Data One-Sample
41 Count Data Two-Samples
Volume 6 Time to Event Endpoints
42 Introduction to Volume 6
43 Tutorial: Survival Endpoint
44 Superiority Trials with Variable Follow-Up
45 Superiority Trials with Fixed Follow-Up
46 Non-Inferiority Trials Given Accrual Duration and Accrual Rates
47 Non-Inferiority Trials with Fixed Follow-Up
48 Superiority Trials Given Accrual Duration and Study Duration
49 Non Inferiority Trials Given Accrual Duration and Study Duration
50 A Note on Specifying Dropout parameters in Survival Studies
51 Multiple Comparison Procedures for Survival Data
Volume 7 Adaptive Designs
52 Introduction To Adaptive Features
53 The Motivation for Adaptive Sample Size Changes
54 The Cui, Hung and Wang Method
55 The Chen, DeMets and Lan Method
56 Muller and Schafer Method
57 Conditional Power for Decision Making
Volume 8 Special Topics
58 Introduction to Volume 8
59 Design and Monitoring of Maximum Information Studies
60 Design and Interim Monitoring with General Endpoints
61 Early Stopping for Futility
62 Flexible Stopping Boundaries in East
63 Confidence Interval Based Design
64 Simulation in East
65 Predictive Interval Plots
66 Enrollment/Events Prediction - At Design Stage (By Simulation)
67 Conditional Simulation
68 Enrollment/Events Prediction - Analysis
69 Interfacing with East PROCs
Volume 9 Analysis
70 Introduction to Volume 9
71 Tutorial: Analysis
72 Analysis-Descriptive Statistics
73 Analysis-Analytics
74 Analysis-Plots
75 Analysis-Normal Superiority One-Sample
76 Analysis-Normal Noninferiority Paired-Sample
77 Analysis-Normal Equivalence Paired-Sample
78 Analysis-Normal Superiority Two-Sample
79 Analysis-Normal Noninferiority Two-Sample
80 Analysis-Normal Equivalence Two-Sample
81 Analysis-Nonparametric Two-Sample
82 Analysis-ANOVA
83 Analysis-Regression Procedures
84 Analysis-Multiple Comparison Procedures for Continuous Data
85 Analysis-Multiple Endpoints for Continuous Data
86 Analysis-Binomial Superiority One-Sample
87 Analysis-Binomial Superiority Two-Sample
88 Analysis-Binomial Noninferiority Two-Sample
89 Analysis-Binomial Equivalence Two-Samples
90 Analysis-Discrete: Many Proportions
91 Analysis-Binary Regression Analysis
92 Analysis- Multiple Comparison Procedures for Binary Data
93 Analysis-Comparison of Multiple Comparison Procedures for Continuous Data- Analysis
94 Analysis-Multiple Endpoints for Binary Data
95 Analysis-Agreement
96 Analysis-Survival Data
97 Analysis-Multiple Comparison Procedures for Survival Data
Volume 10 Appendices
A Introduction to Volume 10
B Group Sequential Design in East 6
C Interim Monitoring in East 6
D Computing the Expected Number of Events
E Generating Survival Simulations in EastSurv
F Spending Functions Derived from Power Boundaries
G The Recursive Integration Algorithm
H Theory - Multiple Comparison Procedures
I Theory - Multiple Endpoint Procedures
J Theory-Multi-arm Multi-stage Group Sequential Design
K Theory - MultiArm Two Stage Designs Combining p-values
L Technical Details - Predicted Interval Plots
M Enrollment/Events Prediction - Theory
N Dose Escalation - Theory
O R Functions
P East 5.x to East 6.4 Import Utility
Q Technical Reference and Formulas: Single Look Designs
R Technical Reference and Formulas: Analysis
S Theory - Design - Binomial One-Sample Exact Test
T Theory - Design - Binomial Paired-Sample Exact Test
U Theory - Design - Simon’s Two-Stage Design
V Theory-Design - Binomial Two-Sample Exact Tests
W Classification Table
X Glossary
Y On validating the East Software
Z List of East Beta Testers
References
Index

Volume 1 The East System
1 Introduction to Volume 1
2 Installing East 6
3 Getting Started
4 Data Editor

1 Introduction to Volume 1
This volume contains chapters which introduce you to the East software system.
Chapter 2 explains the hardware and operating system requirements and the installation procedures. It also explains the installation validation procedure.
Chapter 3 is a tutorial for introducing you to the East software quickly. You will learn the basic steps involved in getting in and out of the software, selecting various test options under any of the endpoints, designing a study, creating and comparing multiple designs, simulating and monitoring a study, invoking the graphics, saving your work in files, retrieving previously saved studies, obtaining on-line help and printing reports. It also describes the menu structure and the menus available in East, which is a menu-driven system. Almost all features are accessed by making selections from the menus.
Chapter 4 discusses the Data Editor menu of East 6, which allows you to create and manipulate the contents of your Case Data and Crossover Data. This menu is in use while working with the Analysis menu as well as with some other features such as PIP or Conditional Simulations. These features are illustrated with the help of a simple worked example of a binary endpoint trial.
2 Installing East 6
2.1 System Requirements to run East 6
The minimum hardware, operating system and software requirements for East 6 (the standalone version of the software, or the East client in the case of the concurrent version) are listed below.
For the standalone version, and for East clients of the concurrent version, the following operating systems are supported:
– Windows 7 (32-bit / 64-bit)
– Windows 8 (32-bit / 64-bit)
– Windows 8.1 (32-bit / 64-bit)
– Windows 10 (32-bit / 64-bit)
– All of the above for computers with English, European and Japanese versions of Windows.
For the concurrent user version, the following server operating systems are supported:
– Windows 7 (32-bit / 64-bit)
– Windows 8 (32-bit / 64-bit)
– Windows 8.1 (32-bit / 64-bit)
– Windows 10 (32-bit / 64-bit)
– All of the above for computers with English, European and Japanese versions of Windows
– Windows Server 2008 (32-bit / 64-bit)
– Windows Server 2012
– Citrix
  ∗ XenApp 6.0 on Windows 2008
  ∗ XenApp 6.5 on Windows 2008
  ∗ XenApp 7.6 on Windows 2008
  ∗ XenApp 7.6 on Windows 2012
Further, East has the following hardware/software requirements:
– CPU - 1 GHz or faster x86 (32-bit) or x64 (64-bit) processor
– Memory - Minimum 1 GB of RAM
– Hard Drive - Minimum 5 GB of free hard disk space
– Display - 1024 x 768 or higher resolution
– Microsoft .Net Framework 4.0 Full (this will be installed as a part of prerequisites if your computer does not have it)
– Microsoft Visual C++ 2010 SP1 (this will be installed as a part of prerequisites if your computer does not have it)
– Installer 4.5
– Internet Explorer 9.0 or above
– A stable internet connection is required during installation so that prerequisites like the
– East is compatible with, and supported for, R versions 2.9.0 through 3.2.3. East may or may not work well with later versions of R. If R is not installed, the ability to include custom R functions to modify specific simulation steps will not be available. The R integration feature is an add-on to East and is required only to integrate custom R functions with East. Note that this feature does not affect any of the core functionalities of East.
2.2 Other Requirements
Users with Windows 7 or above: East uses the font Verdana. Generally Verdana is among the default fonts installed by Windows. However, this font may not be available on some computers, especially if a language other than English has been selected. In such cases, the default fonts need to be restored. To restore fonts, go to Control Panel → Fonts → Font settings and click the button “Restore default font settings”. This will restore all default fonts including Verdana. Note that this must be done before the first use of East.
Users with Windows 8.1: On some computers with Windows 8.1, problems may be observed while uninstalling East, especially if the user has upgraded from the previous version using a patch. This is because of a security update KB2962872 (MS14-037) released by Microsoft for Internet Explorer versions 6, 7, 8, 9, 10 and 11. Microsoft has fixed this issue and released another security update KB2976627 (MS14-051) for Internet Explorer which replaces the old problematic update. It is therefore recommended that users who are affected by this issue install security update KB2976627 (MS14-051) on their computers.
2.3 Installation
IMPORTANT: Please follow the steps below if you are installing a standalone/single user version of East.
If you are installing a concurrent version, please refer to the document “Cytel License Manager Setup.pdf” for detailed installation instructions.
1. Uninstalling Previous Versions. If any version (including a beta or demo) of East 6 is currently installed on your PC, please uninstall it completely or else the installation of the current version will not proceed correctly. To uninstall the earlier version of East 6, go to the Start Menu and select:
All Programs → Cytel Architect → East 6.x → Uninstall
or
All Programs → East Architect → Uninstall East Architect
depending upon the version installed on your computer.
2. Installing Current Version. You will need to be an administrator of your computer in order to perform the following steps. If you do not have administrator privileges on your computer, please contact your system administrator / IT. In order to install East, please follow these steps:
(a) If you received an email containing a link for downloading the setup, please follow the link and download the setup. This will be a zipped folder. Unzip this folder completely.
(b) In the setup folder, locate the program setup.exe and double-click on it. Follow the instructions on the subsequent windows.
2.4 Installation Qualification and Operational Qualification
To perform the installation and operational qualification of East 6, go to the Start Menu and select All Programs → Cytel Architect → East 6.4 → Installation Qualification (IQ). You will be presented with the following dialog box. It will take a few minutes to complete. At the end of the process, the status of the installation qualification will appear. Press Enter (or any other key) to open the validation log. Similarly, one can run the Operational Qualification (OQ).
If the validation is successful, the log file will contain a detailed list of all files installed by East on your computer and other details related to IQ and OQ. If the validation fails, the validation log file will contain detailed error messages. Please contact your system administrator with the log file.
IQ (Installation Qualification) script: This script verifies whether the software is completely and correctly installed on the system. It does this by checking whether all the software components, XML and DLL files are in place.
OQ (Operational Qualification) script: This script runs some representative test cases covering all the major modules/features of East and compares the runtime results to the benchmarks (benchmarks are validated results stored internally in the OQ program). It ensures the quality and consistency of the results in the new version.
Manual Examples: In addition to IQ/OQ, if more testing is to be done, refer to the user manual and reproduce the results for some representative examples/modules. The flow of examples is easy to follow. Some examples in the manual require additional files (datasets) which are available to you in the Samples folder.
Validation Chapter: There is a chapter in this manual dedicated to describing how every feature was validated within Cytel. Refer to Appendix Y, “On validating the East Software”. This covers validation strategies for all the features available in East 6.
3 Getting Started
East has evolved over the past several years with MS Excel® as the user interface. East on MS Excel® did not integrate directly with any other Cytel products. Under the Architect platform, East is expected to coexist and integrate seamlessly with other Cytel products such as SiZ and Compass. Architect is a common platform designed to support various Cytel products.
It provides a user-friendly, Windows-standard graphical environment, consisting of tabs, icons, and dialog boxes, with which you can design, simulate and analyze. Throughout the user manual, this product is referred to as East 6.
One major advantage of East 6 is the facility for creating multiple designs. This is achieved by giving multiple inputs of the parameters either as comma-separated values or as a range such as (a:b:c), with a as the initial value, b as the last value and c as the step size. If you give multiple values for more than one parameter, East creates all possible combinations of the input parameters. This is an immense advancement over earlier versions of East, where you had to create one design at a time. Furthermore, one could not compare different types of designs (e.g., superiority vs. noninferiority designs). Similarly, graphical comparison of designs with different numbers of looks was difficult with earlier versions of East. All such comparisons are readily available in East 6.
Another new feature is the option to add assumptions for accruals and dropouts at the design stage. Previously, this was available only for survival endpoint trials, but it has been extended to continuous and discrete endpoints in East 6. Information about accrual rates, response lag, and dropouts can be given whether designing or simulating a trial. This makes more realistic, end-to-end design and simulation of a trial possible. Section 3.6 discusses all the above features under the Design menu with the help of a case study, CAPTURE.
Simulations help to develop better insight into the operating characteristics of a design. In East 6, the simulation module has now been enhanced to allow fixed or random allocation to treatment and control, and different sample sizes. Such options were not possible with earlier versions of East. Section 3.7 briefly describes the Simulations in East 6.
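The multiple-input convention described earlier in this chapter (comma-separated values, or a range written as a:b:c) can be illustrated with a short Python sketch. This is not East's own code: the function names `expand` and `design_grid` and the parameter names `power` and `looks` are hypothetical, chosen only to show how several multi-valued inputs lead to every combination of values being generated.

```python
from itertools import product
from math import floor


def expand(spec: str) -> list[float]:
    """Expand 'a:b:c' (first value, last value, step) or 'x, y, z' into a list.

    Illustrative only; East's actual input parsing is not documented here.
    """
    spec = spec.strip().strip("()")
    if ":" in spec:
        start, stop, step = (float(part) for part in spec.split(":"))
        if step <= 0 or stop < start:
            raise ValueError("range must satisfy step > 0 and last >= first")
        # Small tolerance so that floating-point error does not drop the last value.
        count = floor((stop - start) / step + 1e-9) + 1
        return [round(start + i * step, 10) for i in range(count)]
    return [float(part) for part in spec.split(",")]


def design_grid(inputs: dict[str, str]) -> list[dict[str, float]]:
    """Return one dictionary per combination of the expanded input values."""
    names = list(inputs)
    value_lists = [expand(inputs[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    # Three power values times two look counts gives six candidate designs.
    for design in design_grid({"power": "0.8:0.9:0.05", "looks": "2, 3"}):
        print(design)
```

In this sketch, three values for one parameter and two for another yield six rows, which mirrors how each combination would appear as a separate design to compare.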
Section 3.8 discusses the capability to flexibly monitor a group sequential trial using the Interim Monitoring feature of East 6.
We have also provided powerful data editors to create, view, and modify data. A wide variety of statistical tests are now a part of East 6, which enables you to conduct statistical analysis of interim data for continuous, discrete and time to event endpoints. Sections 3.4 and 3.5 briefly describe the Data Editor and Analysis menus in East 6.
The purpose of this chapter is to familiarize you with the East 6 user interface.
3.1 Workflow in East
In this section, the architecture of East 6 is explained. The logical workflow in which the different parts of the user interface coordinate with each other is discussed. The basic structure of the user interface is depicted in the following diagram.
Besides the top Ribbon, there are four main windows in East 6, namely (starting from the left) the Library pane, the Input / Output window, the Output Preview window and the Help pane. Note that both the Library and the Help pane can be auto-hidden temporarily or throughout the session, allowing the other windows to occupy a larger area of the screen.
Initially, the Library shows only the Root node. As you work with East, several nodes corresponding to designs, simulation scenarios, data sets and related analyses can be managed using this panel. Various nodes for outputs and plots are created in the Library, facilitating work on multiple scenarios at a time. The width of the Library window can be adjusted for better readability.
The central part of the user interface, the Input / Output window, is the main work area where you can:
– Enter input parameters for design computation, create and compare multiple designs, and view plots
– Simulate a design under different scenarios
– Perform interim analysis on a group sequential design look by look and view the results, receiving decisions such as stopping or continuing during the execution of a trial
– Open a data set on which you want to perform analysis, enter new data, view outputs, prepare a report, etc.
This is the area where the user interacts with the product most frequently.
The Output Preview window compiles several outputs together in a grid-like structure where each row is either a design or a simulation run. This area is in use only when working with Design or Simulations. When the Compute or Simulate button is clicked, all requested design or simulation results are computed and are listed row-wise in the Output Preview window:
By clicking the rows of interest while holding down the Ctrl key, you can display a single design or multiple designs in the Output Summary, either vertically or as a side-by-side comparison. Note that the active window and the Output Preview can be minimized, maximized, or resized. If you want to focus on the Output Summary, click the icon in the top-right corner of the main window. The Output will be maximized as shown below:
Any of the designs/simulations in the Output Preview window can be saved in the Library, as depicted in the following workflow diagram.
Double-click any of these nodes and the detailed output of the design will be displayed. This will include all relevant input and output information. Right-clicking any design node in the Library will allow you to perform various operations on the design such as interim monitoring and simulation.
The Help pane displays the context-sensitive help for the control currently in focus. This help is available for all the controls in the Input / Output window. This pane also displays design-specific help, which discusses the purpose of the selected test, the published literature referred to while developing it, and the chapter/section numbers of this user manual where more details can be looked up quickly. This pane can be hidden or locked by clicking the pin in its corner.
All the windows and features mentioned above are described in detail with the help of an illustration in the subsequent sections of this chapter.
3.2 A Quick Overview of User Interface
Almost all the functionalities of East 6 are invoked by selecting appropriate menu items and icons from the Ribbon. The interface consists of four windows as described in the previous section and four major menu items. These menu items are:
Home. This menu contains typical file-related Windows sub-menus. The Help sub-menu provides access to this manual.
Data Editor. This menu will be available once a data set is open, providing several sub-menus used to create, manage and transform data.
Design. This menu provides a sub-menu for each of the study designs which can be created using East 6. The study designs are grouped according to the nature of the response. Tasks like Simulations and Interim Monitoring are available for almost all the study designs under this menu.
Analysis. This menu provides a sub-menu for each of the analysis procedures that can be carried out in East 6. The tests are grouped according to the nature of the response. There are also options for basic statistics and plots.
3.3 Home Menu
3.3.1 File; 3.3.2 Importing workbooks from East 5.4; 3.3.3 Settings; 3.3.4 View; 3.3.5 Window; 3.3.6 Help
The Home menu contains icons that are logically grouped under File, Settings, View, Window and Help.
These icons can be used for specific tasks.

3.3.1 File

Click this icon to create new case data or crossover data. A new workbook or log can also be created.
Click this icon to open a saved data set, workbook, or log file.
Click this icon to import external files created by other programs.
Click this icon to export files in various formats.
Click this icon to save the current files or workbooks.
Click this icon to save a file or workbook with a different name.

3.3.2 Importing workbooks from East5.4

East allows workbooks previously created in East 5.4 (and above) to be converted and imported into East 6 for further development. In order to open a workbook with the .es5 extension given by previous versions of East, it must first be converted to a file with the .cywx extension that will be recognized by East 6. This is easily accomplished through the Convert Old Workbook utility. Click the icon under Home menu to see the location of this utility.

From the Start Menu, select:

All Programs → Cytel Architect → East 6.x → Convert Old Workbook

We can see the following window, which accepts an East 5.4 workbook as input and outputs an East 6 workbook. Click the Browse buttons to choose the East 5.4 file to be converted and the file to be saved with the .cywx extension of East 6. To start the conversion, click Convert Workbook. Once complete, the file can be opened as a workbook in East 6 as shown below:

In order to convert files from East 5.3 or older versions, open the file in East 5.4, save it with a new name, say with a suffix East5.4, and then convert this 5.4 file to 6.x as explained above. To get East 5.4 or any help regarding file conversion, contact Cytel at support@cytel.com.
3.3.3 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus. All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab allows you to adjust the paths for storing workbooks, files, and temporary files. These paths will remain throughout the current and future sessions even after East is closed. This is also the place where we need to specify the installation directory of the R software in order to use the R Integration feature in East6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation

This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes.
If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP

If the user has selected a 2-sided test as default in the global settings, then any MCP will use half of the alpha from settings as default, since MCP is always a 1-sided test.

Sample Size Rounding

Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data. If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

3.3.4 View

The View submenu allows enabling or disabling the Help and Library panes by (un)checking the respective check boxes.

3.3.5 Window

The Window submenu contains an Arrange and Switch option.
This provides the ability to view different standard arrangements of available windows (Design Input Output, Log, Details, charts and plots) and to switch the focus from one window to another.

3.3.6 Help

The Help group provides the following ways to access the East6 documentation:

User Manual: Invoke the current East 6 user manual.
Tutorial: Invoke the available East 6 tutorials.
About East 6: Displays the current version and license information for the installed software.
Update License: Use this utility to update the license file which you will receive from Cytel.

3.4 Data Editor Menu

All submenus under the Data Editor menu are enabled once a new or existing data set is open. The Open command under the Home menu shows the list of items that can be opened:

Suppose East 6 is installed in the directory C:/Program Files (x86)/Cytel/Cytel Architect/East 6.4 on your machine. You can find sample datasets in the Samples folder under this directory. Suppose we open the file named Toxic from the Samples folder. The data is displayed in the main window under the Data Editor menu as shown:

Here the columns represent the variables and the rows are the different records. Placing the cursor on a cell containing data will enable all submenus under the Data Editor menu. The submenus are grouped into three sections: Variable, Data and Edit. Here we can modify and transform variables, perform operations on case data, and edit a case or variable in the data.

The icons in the Variable group are:

Creates a new variable at the current column position.
Renames the current variable.
Modifies the currently selected variable.
Transforms the currently selected variable.
Numerous algebraic and statistical functions are available which can be used to transform the variable. This feature can also be used to generate data randomly from distributions such as Normal, Uniform, Chi-Square, etc.

The following functions are available in the Data group:

Sorts case data in ascending or descending order.
Filters cases from the case data as per specified criteria.
Converts case data to crossover data.
Converts crossover data to case data.
Displays case data contents in the log window.

For the Edit group the following options are available:

Selects a case or variable.
Inserts a case or variable.
Deletes a case or variable.
Navigates to a specified case.

3.5 Analysis Menu

The Analysis menu allows access to analytical tests which can be performed in East 6. The tests available in the Analysis menu are grouped according to the nature of the response variable. Click an icon to select the test available in a drop down menu.

Basic Statistics - This part contains tests to compute basic statistics and frequency distributions from a dataset.
Continuous - This part groups analysis tests for continuous response.
Discrete - This part groups all analysis tests for discrete response.
Events - This group contains tests for time-to-event outcomes.
Predict - This group contains different procedures to predict the future course of the trial given the current subject level data or summary data. Refer to chapter 68 for more details.

3.5.1 Basic Plots

Bar and pie charts for categorical data.
Plots such as area, bubble, scatter and normality plots for continuous data.
Plots related to frequency distributions such as histograms, stem and leaf plots, and cumulative plots.

3.5.2 Crossover Plots

This menu provides plots applicable to 2x2 crossover data.

Subject plots.
Summary plots.
Diagnostic plots.
All the tests under the Analysis menu are discussed in detail in Volume 8 of this manual.

3.6 Design Menu

This section uses the CAPTURE trial to illustrate the various East features mentioned so far in this chapter. This was a randomized clinical trial of placebo versus Abciximab for patients with refractory unstable angina. Results from this trial were presented at a workshop on clinical trial data monitoring committees (Randomised placebo-controlled trial of abciximab before and during coronary intervention in refractory unstable angina: the CAPTURE study, The Lancet, Vol 349, May 17, 1997). Let us design, simulate and monitor the CAPTURE trial using East6.

The goal of this study is to test the null hypothesis, H0, that the Abciximab and placebo arms both have an event rate of 15%, versus the alternative hypothesis, H1, that Abciximab reduces the event rate by 5 percentage points, from 15% to 10%. It is desired to have a 2-Sided test with three looks at the data, a type-1 error α of 0.05, and a power (1 − β) of 0.8.

We shall start with a fixed sample design and then extend it to a group sequential design. In this process, we demonstrate the useful features of Architect one by one.

To begin, click the Design menu, then Two Samples in the Discrete group, and then click Difference of Proportions.

Below the top ribbon, there are three windows: the Input/Output, the Library, and the Help. All these windows are explained in section 3.1 on Workflow of East. Both the Library and the Help can be hidden temporarily or throughout the session.
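As a rough cross-check of the design goal stated above (event rates of 15% versus 10%, 2-sided α of 0.05, power of 0.8), the following minimal sketch applies the textbook normal-approximation formula for the difference of two proportions with equal allocation and unpooled variance. It is an illustration only and is not claimed to be the algorithm East uses; the function name and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(pi_c, pi_t, alpha=0.05, power=0.8, two_sided=True):
    """Total sample size (both arms, 1:1 allocation) for a difference of
    two proportions, using the normal approximation with unpooled variance.
    Textbook formula used for illustration; East's algorithm may differ."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    # Sum of the per-subject variances of the two sample proportions.
    variance = pi_c * (1 - pi_c) + pi_t * (1 - pi_t)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance / (pi_c - pi_t) ** 2
    # Round the total up, mirroring the default rounding described in 3.3.3.
    return ceil(2 * n_per_arm)


# CAPTURE inputs from the text: pi_c = 0.15, pi_t = 0.10.
capture_fsd = total_sample_size(pi_c=0.15, pi_t=0.10)
print(capture_fsd)
```

Under these assumptions the sketch gives a total of 1366 subjects, which agrees with the statements in section 3.6.5 that CAPT-GSD has a maximum sample size of 1384 and requires 18 subjects more than the fixed sample design. The value reported by East in the Output Preview remains the authoritative one.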
The input window for the Difference of Proportions test appears as shown below:

The design specific help can be accessed by clicking the icon after invoking a design. This help is available for all the designs in East6.

3.6.1 Design Input-Output Window

This window is used to enter various design specific input parameters in the input fields and drop-down options available. Let us enter the following inputs for the CAPTURE Trial and create a fixed sample design: Test Type as 2-Sided, Type I Error as 0.05, Power as 0.8, πc as 0.15 and πt as 0.1. On clicking the Compute button, a new row for this design is added in the Output Preview window. Select this row and click the icon. Rename this design as CAPT-FSD to indicate that it is a fixed sample design for the CAPTURE trial.

3.6.2 Creating Multiple Designs

Before finalizing any particular study design, statisticians might want to assess the operating characteristics of the trial under different conditions and over a range of parameter values. For example, when we are working on time-to-event trials, we want to see the effect of different values of the hazard ratio on the overall power and duration of the study. East makes it easy to rapidly generate and assess multiple options, to perform sensitivity analysis, and to select the optimal plan. We can enter multiple values for one or more input parameters and East creates designs for all possible combinations. These designs can then be compared in a tabular as well as graphical manner.
Following are the three ways in which we can enter multiple values:

Comma-separated values: (0.8, 0.9, 0.95)
Colon-separated range of values: (0.8 to 0.9 in steps of 0.05 can be entered as 0.8:0.9:0.05)
Combined values: (0.7, 0.8, 0.85: 0.95: 0.01)

Multiple values can be entered only in the cells with pink background color. Now suppose we want to create designs for two values of Type I Error, three values of Power and four values of πt: 0.1, 0.2 : 0.3 : 0.05. Without changing other parameters, let us enter these ranges for the three parameters as shown below:

On clicking the Compute button, East will create 2 × 3 × 4 = 24 designs for the CAPTURE Trial. To view all the designs in the Output Preview window, maximize it from the top right-hand corner.

3.6.3 Filter Designs

Suppose we are interested in designs with some specific input/output values. We can set up a criterion using the Filter functionality by clicking the icon available on the top right corner of the Output Preview window.

For example, suppose we want to see designs with Sample Size less than 1000 and Type I Error equal to 0.05. The qualifying designs appear in the Output Preview window as shown below:

The filter criteria can be edited or cleared by clicking the Filter icon again. On clearing the above criterion, all the 24 designs are displayed again. Before we proceed, let us first delete these recently created 24 designs, leaving behind CAPT-FSD, and then minimize the Output Preview window from the top right-hand corner. One or more rows in the Output Preview can be deleted by selecting them and clicking the icon. Use the Ctrl key and mouse click to select specific rows. Use the Shift key and mouse click to select all the rows in a range. Use the combination Ctrl + A to select all the rows. The resulting Output Preview is shown below:
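The multiple-value syntax of section 3.6.2 and the way combinations multiply can be sketched outside East as follows. This is an illustrative parser, not East's own; the function name `expand` is hypothetical, and the Type I Error and Power entries are assumed values chosen only to give two and three values respectively, since the text specifies only the πt entry.

```python
from itertools import product


def expand(spec):
    """Expand a multi-value entry such as '0.1, 0.2 : 0.3 : 0.05'.
    Items are comma-separated; 'start:stop:step' is an inclusive range."""
    values = []
    for item in spec.split(","):
        parts = [float(p) for p in item.split(":")]
        if len(parts) == 1:
            values.append(parts[0])
        elif len(parts) == 3:
            start, stop, step = parts
            # Small tolerance guards against floating point shortfall.
            count = int((stop - start) / step + 1e-9) + 1
            values.extend(round(start + i * step, 10) for i in range(count))
        else:
            raise ValueError(f"cannot parse {item!r}")
    return values


type_i_error = expand("0.025, 0.05")     # assumed entry: two values
power = expand("0.8, 0.9, 0.95")         # assumed entry: three values
pi_t = expand("0.1, 0.2 : 0.3 : 0.05")   # entry given in the text: four values

# One design per combination of the three parameters.
designs = list(product(type_i_error, power, pi_t))
print(pi_t, len(designs))
```

The πt entry expands to 0.1, 0.2, 0.25 and 0.3, and the Cartesian product contains 2 × 3 × 4 = 24 combinations, matching the number of designs East creates.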
It is advisable to save this design, or any work which you would like to refer to in future, in an East Workbook. The next subsection briefly discusses the use of workbooks.

3.6.4 What is a Workbook?

A Workbook is a storage construct managed by East for holding different types of generated outputs. The user designs a trial, simulates it, monitors it at several interim looks, conducts certain analyses, draws plots, etc. All of these outputs can be kept together in a workbook which can be saved and retrieved for further development when required. Note that a single workbook can also contain outputs from more than one design.

Select the design CAPT-FSD in the Output Preview window and click the icon. When a design is saved to the library for the first time, East automatically creates a workbook named Wbk1 which can be renamed by right-clicking the node. Let us name it CAPTURE.

This is still temporary storage, which means that if we exit East without saving it permanently, the workbook will not be available in future. Note that Workbooks are not saved automatically on your computer; they are to be saved by either right-clicking the node in the Library and selecting Save or clicking the icon. In addition, the user will be prompted to save the contents of the Library while closing East 6.

Many times, we wish to add some specific comments to a design or any other output window. These comments are useful for future reference. One can do that by attaching a Note to any node by selecting it and clicking on the icon. A small window will pop up where comments can be stored. Once saved, a yellow icon against the design node will indicate the presence of a note. If you want to view or remove the note, right click the design node, select Note, and clear the contents.
The tabs available on the status bar at the bottom left of the screen can be used to navigate between the active windows of East.

For example, if you wish to return to the design inputs, click the Input button, which will take you to the latest Input window you worked with. As we proceed further, more such tabs will appear, enabling us to navigate from one screen of East to another.

3.6.5 Group Sequential Design for the CAPTURE Trial

Select the design CAPT-FSD and click the icon in the Library to modify the design. On clicking this icon, the following message will pop up. Click "Yes" to continue.

Let us extend this fixed sample design to a group sequential design by changing the Number of Looks from 1 to 3. This means that we are planning to take two interim looks and one final look at the data while monitoring the study.

An additional tab named Boundary is added, which allows us to enter inputs related to the boundary family, look spacing and error spending functions. Let the boundary family be Spending Functions and the alpha spending function be Lan-DeMets with the parameter OF. Click on Compute to create the three-look design and rename it as CAPT-GSD.

As you go on creating multiple designs in East, the Output Preview area can become too busy to manage. Thus, you can also select the designs you are interested in, save them in the workbook and then rename them appropriately.

The Output Preview window now looks as shown below:

Notice that CAPT-GSD requires 18 subjects more than CAPT-FSD to achieve 80% power. This view gives us the horizontal comparison of two designs. Save the design CAPT-GSD in the workbook. One can also compare these designs in a vertical manner.
Select the two designs by clicking on one of them, pressing Ctrl and then clicking on the other one. Next, click the icon.

This is the Output Summary window of East, which compares the two designs vertically. We can easily copy this display from East to MS Excel and modify or save it further in any other format. To do that, right click anywhere in the Output Summary window, select the Copy All option and paste the copied data in an Excel workbook. The table is pasted as two formatted columns.

Let us go back to the input window of CAPT-GSD (select the design and click the icon) and activate the Boundary tab. By default, the boundary values in the table at the bottom of this tab are displayed on the Z Scale. We can also view these boundaries on other scales such as the Score Scale, δ Scale and p-value Scale. Let us view the efficacy boundaries for CAPT-GSD on a p-value scale.

The final p-value required to attain statistical significance at level 0.05 is 0.0463. This is sometimes regarded as the penalty for taking two interim looks at the data. Also observe that, although the maximum sample size for this design is 1384, the expected sample size under the alternative that δ = -0.05 is much less, 1183. However, there is very little saving under the null hypothesis that δ = 0. The expected sample size in this case is 1378. Therefore, it might be beneficial to consider replacing the lower efficacy boundary by a futility boundary. Also, sometimes we might wish to stop a trial early because the effect size observed at an interim analysis is too small to warrant continuation. This can be achieved by using a β-spending function and introducing a futility boundary at the design stage.
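To give a feel for what an error spending function does, the sketch below evaluates two commonly published forms: the Lan-DeMets O'Brien-Fleming-type function for α, and the Hwang-Shih-DeCani gamma family, which is assumed here to be the family behind the γ(-2) futility option used in section 3.6.6. The function names, the equal look spacing (information fractions 1/3, 2/3, 1) and the per-side α of 0.025 are assumptions made for illustration; East's internal computations, and the boundaries it derives from the spent error, are not reproduced here.

```python
from math import exp, sqrt
from statistics import NormalDist

_norm = NormalDist()


def ld_of_spend(t, alpha):
    """Lan-DeMets O'Brien-Fleming-type spending function: cumulative
    one-sided error spent at information fraction t (0 < t <= 1)."""
    if t <= 0:
        return 0.0
    z = _norm.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _norm.cdf(z / sqrt(t)))


def gamma_spend(t, total, gamma):
    """Hwang-Shih-DeCani gamma family: cumulative error spent at t."""
    if gamma == 0:
        return total * t
    return total * (1 - exp(-gamma * t)) / (1 - exp(-gamma))


looks = [1 / 3, 2 / 3, 1.0]  # assumed equally spaced looks
alpha_spent = [ld_of_spend(t, alpha=0.025) for t in looks]
beta_spent = [gamma_spend(t, total=0.2, gamma=-2) for t in looks]

for t, a, b in zip(looks, alpha_spent, beta_spent):
    print(f"t={t:.3f}  alpha spent={a:.5f}  beta spent={b:.5f}")
```

Both functions spend very little error at the early looks and reach the full α or β only at the final look, which is why the final p-value threshold stays close to the nominal level.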
3.6.6 Adding a Futility Boundary

Select the design CAPT-GSD and click the icon to edit it. Change the Test Type from 2-Sided to 1-Sided and also the Type I Error from 0.05 to 0.025. Go to the Boundary tab and add the futility boundaries by using the γ (-2) spending function.

Before we create this design, we can see the error spending chart and the boundaries chart for the CAPTURE trial with efficacy as well as futility boundaries. This gives us a way to explore different boundary families and error spending functions and to decide upon the desired combination before even creating a design. Click the icon to view the Error Spending Chart.

Click the icon to view the Boundaries Chart.

The shaded region in light pink corresponds to the critical region for futility and the one in light blue corresponds to the critical region for efficacy. We can also view the boundaries on the conditional power scale in the presence of a futility boundary. Select the entry named cp deltahat Scale from the dropdown Boundary Scale. The chart is updated and the boundaries are displayed on the CP scale.

Zooming the Charts

To zoom into any area of the chart, click and drag the mouse over that area. After clicking the Zoom button, click on the plot at the top left corner of the area you want to magnify, keep the mouse button pressed and drag the mouse over the desired area. This draws a rectangle around that area. Now release the mouse button and East magnifies the selected area. You can keep doing this to zoom in further.
The magnified chart appears as below:

Note that after zooming, the Zoom button changes to Reset. When you click it, the plot is reset to the original shape.

Let us compute the third design for the CAPTURE trial and rename it as CAPT-GSD-EffFut. Save it in the workbook. Click the icon to compare all the three designs side-by-side as explained above.

Along with the side-by-side comparison, let us compare the two group sequential designs graphically. Press Ctrl and click on CAPT-FSD. Notice that the remaining two designs are still highlighted, which means they are selected and CAPT-FSD is unselected. Now click the icon and select Stopping Boundaries to view the graphical comparison of boundaries of the two designs.

As we can see, the design CAPT-GSD uses an upper efficacy boundary whereas CAPT-GSD-EffFut uses an upper futility boundary. We can turn the boundaries ON and OFF by checking the boxes available in the legends.

Before we proceed, let us save this third design in the workbook. We can also create several workbooks in the Library and then compare multiple designs across the workbooks. This is an advantage of working with workbooks in East6.

3.6.7 Accrual / Dropout option for Continuous and Discrete Endpoints

In the earlier versions of East, the option to incorporate accrual and dropout information was available only for tests under time-to-event/survival endpoints. East 6 now provides this option for almost all the tests under Continuous and Discrete endpoints as well. Let us see its use in the CAPTURE trial.

Select the design CAPT-GSD-EffFut from the Library and edit it to add the accrual-dropout information. From the Design Parameters tab, add the option Accrual/Dropout Info by clicking on the Include Options button.
Let the accrual rate be 12 subjects/week. Suppose we expect the response to be observed 4 weeks after recruitment. Let us create a design by first assuming that there will not be any dropouts during the course of the trial. We will then introduce some dropouts and compare the two designs. After entering the above inputs, click on the icon to see how the subjects will accrue and complete the study.

Close the chart, create the design by clicking the Compute button, save it in the workbook CAPTURE and rename it as CAPT-GSD-NoDrp to indicate that there are no dropouts in this design. Notice that in this design, the maximum sample size and the maximum number of completers are the same, as there is no dropout.

Let us now introduce dropouts. Suppose there is a 5% chance of a subject dropping out of the trial.

Notice that the two lines are not parallel anymore because of the presence of dropouts. Click the Compute button to create this design. Save the design in the workbook CAPTURE and rename it as CAPT-GSD-Drp. Compare this design with CAPT-GSD-NoDrp by selecting the two designs and clicking on the icon.

Notice the inflation in sample size for CAPT-GSD-Drp. This design will require an additional 80 subjects to obtain data on 1455 subjects (1455 completers).

Let us now compare all the five designs saved in the workbook. Select them all together and click the icon. The resulting screen will look as shown below:

We can see additional quantities in the design CAPT-GSD-Drp. These correspond to the information on the total number of completers and the study duration, which are computed by taking into account the non-zero response lag and the possibility of dropouts. Also notice the trend in Maximum Sample Size across all these designs. We can see that it increases as more constraints are added to the study. But if we look at the values of Expected Sample Size under the null and the alternative, there is a significant potential saving. You can also save this output summary window comparing the designs in the library by clicking the icon.

3.6.8 Output Details

In the earlier part of this chapter, we have seen the design output at two different places: Output Preview (horizontal view) and Output Summary (vertical view). The final step in the East6 design workflow is to see the detailed output in the form of an HTML file.

Select the design CAPT-GSD-Drp from the Library and click the icon. Alternatively, one can also double-click on any of the nodes in the Library to see the details.

The output details are broadly divided into two panels. The left panel consists of all the input parameters and the right panel consists of all the design output quantities in tabular format. These tables will be explained in detail in subsequent chapters of this manual.

Click the Save icon to save all the work done so far. This is the end of the introduction to the Design Menu. The next section discusses another very useful feature called Simulations.

3.7 Simulations in East6

Simulation is a very useful way to perform sensitivity analysis of the design assumptions. For instance, what happens to the power of the study when the δ value is not the same as specified at the design stage? We will now simulate the design CAPT-GSD-Drp. Select this design from the library and click the icon. Alternatively, you can right-click this design in the Library and select Simulate.
The default view of the input window for simulations is as shown below:

Notice that the value of δ on the Response Generation tab is -0.05. This corresponds to the difference in proportions under the alternative hypothesis. You may either keep this default value for the simulation or change it if you wish to simulate the study with a different value of δ. Let us run some simulations by changing the value of δ. We will run simulations over a range of values for πt, say, 0.1, 0.125 and 0.14. Enter the values as shown below:

Before running simulations, let us have a quick look at the Simulation Control tab, where we can change the number of simulations, save the simulation data in East format or in csv format, and set some other useful options. You can control the simulations with the following actions:

Enter the number of simulations you wish to run in the "Number of Simulations" field. The default is 10000 simulations.
Increase or decrease the "Refresh Frequency" field to speed up or slow down the simulations. The default is to refresh the screen after every 1000 simulations.
Set the Random Number Seed to Clock or Fixed. The default is Clock.
Select the checkbox "Suppress All Intermediate Output" to suppress the intermediate output.
To see the intermediate results after a specific number of simulations, select the checkbox "Pause after Refresh" and enter the refresh frequency accordingly.
The checkbox "Stop At End" is selected by default to display the summary results at the end of all the simulations, and a corresponding item is created in the Output Preview window. One can uncheck this box and save the simulation node directly in the Output Preview window.
One can also save the summary statistics for each simulation run and the subject level simulated data in the form of Case Data or a Comma Separated File. Select the checkboxes accordingly and provide the file names and paths when using the CSV option. If you are saving the data as Case Data, the corresponding data file will be associated with the simulation node. It can be accessed by saving the simulation node from the Output Preview to the workbook in the Library.

For now, let us keep the Simulation Control tab as shown below:

Click the Simulate button at the bottom right to run the simulations. Three scenarios corresponding to the three values of πt are simulated one after the other and, in the end, the following output window appears. This is the Simulation Intermediate Output window, which shows the results from the last simulated scenario. The two plots on this window are useful to see how the study performed over 10000 simulations.

Click the Close button on this intermediate window, which takes us to the Output Preview window. Save these three simulation rows in the workbook CAPTURE. Since we simulated the design CAPT-GSD-Drp, the three simulation nodes are saved as child nodes of this design. This is the hierarchy which is followed throughout the East6 software.

A full picture of the CAPTURE trial design with accrual/dropout information and its simulations can be viewed easily. Select the three simulation nodes and the parent design node in the Library and click the icon.

Note the drop in simulated power as the difference between the two arms decreases. This is because the sample size of 1532 was insufficient to detect the δ values of -0.025 and -0.01. It shows the effect of mis-specifying the alternative hypothesis.
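The loss of power under a mis-specified alternative can be reproduced qualitatively with a small Monte Carlo sketch. This is a deliberately simplified stand-in for the East simulation: it uses a single final analysis with a 1-sided Wald test at level 0.025, ignores the interim looks, futility boundary, accrual and dropouts, and assumes 727 subjects per arm (roughly half of the 1455 completers mentioned in section 3.6.7). The function name, the seed and the number of simulations are hypothetical choices, so the estimates will not match East's output.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(pi_c, pi_t, n_per_arm, n_sims=2000, alpha=0.025, seed=12345):
    """Estimate the power of a fixed-sample, 1-sided Wald test for a
    reduction in event rate (pi_t < pi_c) by Monte Carlo simulation."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        # Simulate the number of events on each arm.
        events_c = sum(rng.random() < pi_c for _ in range(n_per_arm))
        events_t = sum(rng.random() < pi_t for _ in range(n_per_arm))
        p_c, p_t = events_c / n_per_arm, events_t / n_per_arm
        se = sqrt(p_c * (1 - p_c) / n_per_arm + p_t * (1 - p_t) / n_per_arm)
        if se > 0 and (p_c - p_t) / se > z_crit:
            rejections += 1
    return rejections / n_sims


# Scenarios from the text: pi_c = 0.15 and pi_t = 0.1, 0.125, 0.14.
results = {pi_t: simulate_power(0.15, pi_t, n_per_arm=727)
           for pi_t in (0.1, 0.125, 0.14)}
for pi_t, power in results.items():
    print(f"pi_t={pi_t}: estimated power {power:.3f}")
```

Setting pi_t equal to pi_c in the same function gives an estimate of the type I error of this simplified test.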
The design did achieve the power of 80% for the first case, with δ equal to -0.05, which was the δ assumed at the design stage. This is called simulating the design under the alternative. We can also simulate a design under the null by entering πt equal to 0.15, the same as πc, and verify that the type I error is preserved.

The column width in the comparison mode is fixed, and the heading appears in the format workbook name:design name:Sim. If this string is longer than the fixed width, you may not be able to see the complete heading. In that case, hover the mouse over the column heading cell to see the complete heading.

Thus, simulations are one of the most powerful tools in East6 for verifying the operating characteristics of a design. The next section introduces another key feature of East6, Interim Monitoring. Let us monitor the CAPTURE trial using this feature.

3.8 Interim Monitoring

Interim monitoring is critical for the management of group sequential trials, and there are many reasons why flexibility in both design and monitoring is necessary. Administrative schedules may call for the recalculation of statistical information and unplanned analyses at arbitrary time points, while both the type I error and the power of the study must still be preserved. East provides the capability to flexibly monitor a group sequential trial using the Interim Monitoring (IM) dashboard. The IM dashboard provides a coherent visual display of many output values based on interim information. In addition to important statistical information, it includes tables and graphs for stopping boundaries, conditional power, error spending and confidence intervals for each interim look. All of this information is useful in tracking the progress of a trial for decision-making purposes, and it also allows a study design to be improved adaptively.
Consider the monitoring of CAPT-GSD-Drp of the CAPTURE trial. Select this design from the Library and click the icon. The adaptive version of the IM dashboard can be invoked by clicking the icon. For this example, however, we will use the regular IM dashboard. A node named Interim Monitoring gets associated with the design in the Library and a blank IM dashboard is opened as shown below:

Suppose we have to take the first look at the data based on 485 completers. The interim data on these subjects is to be entered in the Test Statistic Calculator, which can be opened by clicking the button. Open this calculator and click OK with the default parameters.

If we have run any analysis procedure on the interim data, then the test statistic calculator can read in the information from the Analysis node. Select the appropriate workbook and node and click Recalc to see the interim inputs. Alternatively, for binomial endpoint trials, we can enter the interim data in terms of the number of responses on each arm, and East computes the difference in proportions and its standard error. As a further alternative, we can directly enter the estimated difference and its standard error, which can be the output of some external computation. The inputs on the test statistic calculator depend upon the type of trial you are monitoring.

The resulting screen is as shown below:

The output quantities for the first look are computed in that row, and all four charts are updated based on the look 1 data. Some more advanced features, such as the Conditional Power calculator, Predicted Intervals Plot and Conditional Simulations, are available from the IM dashboard. These are explained in later sections of this manual.

Let us take the second look at 970 subjects.
Open the test statistic calculator and, leaving all other parameters at their defaults, change the number of responses on the Treatment arm to 30. Click the OK button. The screen will look as shown below:

East tells us that the null hypothesis is rejected at the second look and provides an option to stop the trial and conclude efficacy of the drug over the control arm. It then computes the final inference. At this stage, it also provides another option to continue entering data for future looks. However, the final inference is computed only once.

In the last part of this chapter, we shall see how to capture a snapshot of any ongoing interim monitoring of a trial. The IM dashboard can also be used as a tool at design time, where we can construct and analyze multiple possible trial scenarios before actual data is collected. The feature to save a snapshot of information at interim looks allows a user to quickly compare multiple scenarios under a variety of assumptions. This option increases the flexibility of both the design and the interim monitoring process. At each interim look, a snapshot of the updated information in the dashboard can be saved for the current design in the workbook. Click the icon located at the top of the IM Dashboard window to save the current contents of the dashboard:

A new snapshot node is added under the Interim Monitoring node in the Library. The Interim Monitoring window is an input window and cannot be printed, whereas its snapshot is in HTML format, which can be printed and shared.

To illustrate the benefit of the snapshot feature, consider that actual trial data is often not available at design time. It may be preferable to determine a reasonable estimate of nuisance parameters, such as the variance, rather than to make strong assumptions about their values.
The ability to quickly compare potential results under a variety of different estimates of the variance by looking at multiple interim snapshots of a study can be a powerful tool. Other examples include sample size re-estimation, where initial design assumptions may be incorrect, or the use of hypothetical interim data to compare relevant treatment differences.

With this, we come to the end of the chapter on getting started with East6. The subsequent chapters in this manual discuss in detail, with the help of case studies, all the features available in the software. The theory behind all the design and analysis procedures is explained in Appendix A of this manual.

4 Data Editor

The Data Editor allows you to manipulate the contents of your data. East caters to Case Data and Crossover Data. Depending on the type of data, a corresponding set of menu items becomes available in the Data Editor menu.

4.1 Case Data

The Data Editor window for case data is a spreadsheet-like facility for creating or editing case data files. A case data file is organized as a sequence of records, called cases, one below the other. Each record is subdivided into a fixed number of fields, called variables. The name assigned to a field is referred to as the variable name. Each such name identifies a specific variable across all the cases. Each cell holds a value of a variable for a case. The top line of the Data Editor holds the variable names. Case data is the most common format in which to enter and store data. If you plan to share data with any other package, you need to use the case data editor.

4.1.1 Data Editor Capabilities for Case Data

The Data Editor is used to create a new Case Data file or to edit one that was previously saved.
You can:

Create new variables
Change names and attributes of existing variables
Alter the column width
Alter the row height
Type in new case data records
Edit existing case data records
Insert new variables into the data set
Remove variables from the data set
Select or reject subsets of the data
Transform variables
List data in the log window
Calculate summary measures from the variables

4.1.2 Creating Variables

To create a new Case Data set, invoke the menu Home. Click on the icon and select Case Data. When you create a new case data set, all the columns are labeled var, indicating that new variables may be created in any of the columns. To create a new variable, simply start entering data in a blank column. The column is given a default name Var1, Var2, etc. Alternatively, select any unused column, right click and select Create Variable from the menu that appears. The data editor will create all the variables with default names up to the column you are working on. To create a new variable in the first unused column and to select its attributes, choose the menu Data Editor and click on the icon. You will be presented with the dialog box shown below, in which you can select the variable name, variable type, alignment, format, value labels and missing values.

4.1.3 Variable Type Setting

You can change the default variable name and its type in this dialog box and click on the OK button. East will automatically add this new variable to the case data file. New variables are added immediately adjacent to the last existing variable in the case data set. The Variable Type Setting dialog box contains five tabs: Detail, Alignment, Format, Value Label, and Missing Value(s).

Detail
The Detail tab allows you to change the default variable name, add a description of the variable and select the type (Numeric, String, Date, Binary, Categorical or Integer).
Note that, depending on the type of the variable, different tabs and options become available in the Variable Type Setting dialog box. For example, the tab Category Details and the option Base Level become available only if you select Variable Type as Categorical.

Value Label
The Value Label tab is displayed below. Here, you can add labels for particular data values, change a selected label, remove a selected label, or remove all value labels for the current variable.

Missing Value(s)
The Missing Value(s) tab is used for specifying which values are to be treated as missing. You have three choices: Not Defined, which means that no values will be treated as missing values; Discrete value(s), which allows you to add particular values to the list of missing values; or Range, which lets you specify an entire range of numbers as missing values.

4.1.4 Editing Data

Besides changing the actual cell entries of a case data set you can:

Add new Cases and Variables
Insert or delete Cases and Variables
Sort Cases

4.1.5 Filter Cases

We illustrate the ability of East to filter cases with the help of the following example:

Step 1: Open the Data set
Open the data set leukemia.cyd by clicking on the menu Home. Click on the icon and select Data. The data is stored in the Samples folder of the installation directory of East.

Step 2: Invoke the Filter Cases menu
Invoke the menu item Data Editor. Click on the Filter cases icon. East will present you with a dialog box that allows you to use subsets of data in the Case Data editor. The dialog box will allow you to select All cases, those satisfying an If condition, those falling in a Range, or those identified by a Filter Variable, as shown below.

Step 3: Filter Variable option
Select the Filter Variable option.
Select Status from the variable list and click on the black triangle, which will remove the variable Status from the variable list and add it to the empty box on the other side. Suppose we want to filter the cases for which the Status variable has value 1. Insert 1 in the empty box next to Event code.

Step 4: Output
Click on OK. As shown in the following screenshot, East will grey out all the cases that have Status variable value 1. Now any analysis carried out on the data set uses only the filtered cases. In this way, you can carry out subgroup analyses if the subgroups are identified by the values of a variable in the data set.

4.2 Crossover Data

The Data Editor allows you to enter data for a 2 × 2 crossover trial with one record for each patient. You can use this crossover data editor to input individual patients' responses in 2 × 2 crossover trials. The response could be continuous (such as systolic blood pressure) or binary (such as the development of a tumor after injecting a carcinogenic agent). Only the continuous response type is currently supported in East.

4.2.1 Data Editor Capabilities for Crossover Data

The Data Editor is used to create a new 2 × 2 Crossover Data file or to edit one that was previously saved. You can:

Create and edit data with continuous response of individual patients.
Edit period labels.
Assign treatments to different groups and periods.
Convert to case data.
Convert case data into crossover data.
List data to the log.

4.2.2 Creating a New Crossover Data Set

To create a new crossover data set, invoke the menu Home.
Click on the icon and, from the drop down menu, choose Crossover data. You will be presented with a dialog box as shown below:

In the above dialog box, you see a 2 × 2 grid called the Treatment Assignment Table. This grid is provided to assign the treatments to different groups and periods. In this version of the software, you can analyze data for 2 × 2 crossover trials. Hence the number of groups and the number of periods are always two. The rows specify the two groups, labeled "G1" and "G2". The columns represent the two periods of the crossover data, labeled "P1" and "P2". If you would like to change these labels, click inside the table cells. Type the treatment names associated with the corresponding group and period. Having entered the treatments, the crossover data editor settings dialog box will look as follows:

Rules for editing these fields
The row names G1 and G2 can be changed using a string consisting of a maximum of 8 characters from the set A-Z, 0-9, '.', '_' (underscore), starting with either a letter or a digit; blank spaces are not accepted as part of a name. The column names P1 and P2 can be changed the same way. Also note that the Group names as well as the Period names must be distinct. The letters are not case sensitive.

Once you have assigned all the treatments, click on the OK button. This will open up the Patients' crossover data editor. This editor resembles the case data editor. Like the case data editor, this is a spreadsheet into which you can enter data directly. There are four pre-defined fields in this editor. The PatientId column must contain the patient's identification number. The GroupId column will contain the identification of the group to which the patient belongs.
The entry in this column should be one of the labels that you entered as row names in the 2 × 2 grid earlier. The inputs in the next two columns are numeric and contain the responses of the patient in the two periods respectively. The title of each of these two columns is created by concatenating the word "Resp" to the period identifications that you entered previously. For example, here in the settings dialog we have entered P1 and P2 as period identifiers, and these two response columns are labeled P1 Resp and P2 Resp. However, if the period values start with digits, such as 1 and 2, then the period ids are prefixed by the letter P, and the headings of these two columns would be P1 Resp and P2 Resp.

The variable names PatientId and GroupId are fixed and cannot be edited in the data editor. If you use Transform Variable on GroupId and the result is either "G1" or "G2", then the value is displayed; otherwise, the value is shown as missing.

You can also add covariates such as age and sex. All variable settings of the case data editor are applicable to these covariates.

The Settings button allows you to edit the GroupId, PeriodId or treatment labels that you entered earlier. If you make any changes, these changes will automatically be made in the data editor.

4.3 Data Transformation

You can transform an existing variable with the data transformation facility available in the Data Editor of East.

To transform any variable:

1. Select the menu Data Editor and click on the icon. You will be presented with the expression builder dialog box. Here you can transform the values of the current variable using a combination of statistical, arithmetic, and logical operations. The current variable name is the target variable on the left hand side of an equation of the form:
VAR =
where VAR is the variable name of the current variable.
In order to create a new variable, type the variable name in the target variable field.
2. Complete the right hand side of the equation with any combination of allowable functions. To select a function, double-click on it. If the function that you select needs any extra parameters (typically variable names), this will be indicated by a ? for each required parameter. Replace the ? character with the desired parameter.
3. Select the OK button to fill in values for the current variable computed according to the expression that you have constructed.

The statistical, arithmetical, and logical functions that are available in the Transform Variable dialog box are given below:

4.4 Mathematical and Statistical Functions

The following is a list of mathematical and statistical functions available in East for variable transformation.

ABS(X) Returns the absolute value of X. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
ACOS(X) Returns the arccosine of X. Argument Range: −1 ≤ X ≤ 1.
ASIN(X) Returns the arcsine of X. Argument Range: −1 ≤ X ≤ 1.
ATAN(X) Returns the arctangent of X. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
AVG(X1, X2, ...) Returns the mean of (X1, X2, ...). Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
CEIL(X) Returns the ceiling, or smallest integer greater than or equal to X. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
CHIDIST(X,df) Returns the probability in the tail area to the left of X from the chi-squared distribution with df > 0 degrees of freedom. Argument Range: 0 ≤ X ≤ 1 × 10^25.
CHIINV(X,df) Returns the Xth percentile value of the chi-squared distribution with df > 0 degrees of freedom, i.e., returns z such that Pr(Z ≤ z) = X. Argument Range: 0.0001 ≤ X ≤ 0.9999.
COS(X) Returns the cosine of X, where X is expressed in radians. Argument Range: −2.14 × 10^9 ≤ X ≤ 2.14 × 10^9.
COSH(X) Returns the hyperbolic cosine of X. Argument Range: −87 ≤ X ≤ 87.
CUMULATIVE(X) Given a column of X values, this function returns a new column in which the entry in row j is the sum of the entries in the first j rows of the original column.
EXP(X) Returns the exponential function evaluated at X. Argument Range: −87 ≤ X ≤ 87.
FLOOR(X) Returns the floor, or largest integer less than or equal to X. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
INT(X) Returns the integer part of X. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
ISNA(X) Returns a value of 1 if X is a missing value, 0 otherwise. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25. This function is useful, for example, to set missing observations to average values:
X1 = IF(ISNA(X)=1, COLMEAN(X), X)
Another extremely useful task performed by the ISNA() function is to eliminate records from the data set in which there are missing values:
REJECTIF(ISNA(X)=1) [Enter]
SELECTIF(ISNA(V1)+ISNA(V2)+ISNA(V3)=0) [Enter]
LOG(X) Returns the logarithm of X to base 10. Argument Range: 1 × 10^−25 ≤ X ≤ 1 × 10^25.
LN(X) Returns the logarithm of X to base e. Argument Range: 1 × 10^−25 ≤ X ≤ 1 × 10^25.
MAX(X1, X2, ...) Returns the maximum value of (X1, X2, ...).
MIN(X1, X2, ...) Returns the minimum value of (X1, X2, ...).
MOD(X,Y) Returns the remainder of X divided by Y. The sign of this remainder is the same as that of X. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
NORMDIST(X) Returns the probability in the tail area to the left of X from the standardized normal distribution. Argument Range: −10 ≤ X ≤ 10.
NORMINV(X) Returns the Xth percentile value of the standard normal distribution, i.e., returns z such that Pr(Z ≤ z) = X. Argument Range: 0.001 ≤ X ≤ 0.999.
ROUND(X,d) Returns a floating point number obtained by rounding X to d decimal digits. If d=0, X is rounded to the nearest integer. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25.
SIN(X) Returns the sine of X, where X is expressed in radians.
Argument Range: −2.14 × 10^9 ≤ X ≤ 2.14 × 10^9.
SINH(X) Returns the hyperbolic sine of X. Argument Range: −87 ≤ X ≤ 87.
SQRT(X) Returns the square root of X. Argument Range: 0 ≤ X ≤ 1 × 10^25.
TAN(X) Returns the tangent of X, where X is expressed in radians. Argument Range: −1 × 10^25 ≤ X ≤ 1 × 10^25; X ≠ (2n + 1)π/2, n an integer.
TANH(X) Returns the hyperbolic tangent of X. Argument Range: −87 ≤ X ≤ 87.

4.4.1 The IF Function

This function tests an arithmetic or logical condition and returns one value if it is true and another value if it is false. The syntax is
IF(CONDITION, X, Y)
The function returns the value X if CONDITION is "true" and Y if CONDITION is "false". For example, consider the following equation:
HIVPOS = IF(CD4>1,1,-1)
The above equation defines a variable HIVPOS that assumes the value 1 if the variable CD4 exceeds 1 and assumes the value -1 otherwise. Usually CONDITION is made up of two arithmetic expressions separated by a "comparison operator", e.g., CD4>CD8, CD4+CD8=15*BLOOD, etc. The following comparison operators are allowed:
=, >, <, >=, <=, <>
More generally, CONDITION can be constructed by combining two or more individual conditions with AND, OR, or NOT operators. For example, consider the following expression:
HIVPOS = IF((CD4>1) !AND! (CD8>1), 1,-1)
The above expression means that HIVPOS will take on the value 1 if both CD4>1 and CD8>1, and -1 otherwise. On the other hand, consider the following expression:
HIVANY = IF((CD4>1) !OR! (CD8>1),1,-1)
The above expression means that HIVANY will take on the value 1 if either CD4>1 or CD8>1, and -1 otherwise.

4.4.2 The SELECTIF Function

This function provides a powerful way of selecting only those records that satisfy a specific arithmetic or logical condition. All other records are deleted from the current data set.
The syntax is:
SELECTIF(CONDITION)
This function selects only those records for which CONDITION is "true" and excludes all other records from the current data set. For example, consider the following equation:
HIVPOS = SELECTIF(CD4>1)
The above condition retains records for which CD4 exceeds 1. The same rules governing CONDITION for the IF function are applicable here as well. Note that the column location of the cursor when Transform Variable was selected plays no role in the execution of this function.

4.4.3 The RECODE Function

This function recodes different ranges of a variable. It is extremely useful for creating a new variable consisting of discrete categories at pre-specified cut-points of the original variable. The syntax for RECODE has two forms: one for recoding a categorical variable and one for recoding a continuous variable. In both cases, the variable being recoded must assume numerical values.

Recoding a Categorical Variable
The syntax is:
RECODE(X, S1 = c1, S2 = c2, ..., Sn = cn, [else])
where X is the categorical variable (or arithmetic expression) being recoded, Sj represents a set of numbers in X, all being recoded to cj, and the optional argument [else] is a default number to which all the numbers belonging to X, but excluded from the sets S1, S2, ..., Sn, are recoded. If [else] is not specified as an argument of RECODE, then all the numbers excluded from the sets S1, S2, ..., Sn are unchanged. Notice that the argument Sj = cj in the RECODE function consists of a set of numbers Sj being recoded to a single number cj. The usual mathematical convention of specifying a set of numbers within braces is adopted. Thus if set Sj consisted of m distinct numbers s1j, s2j, ..., smj, it would be represented in the RECODE argument list as {s1j, s2j, ..., smj}.
For example,
Y = RECODE(X, {1,2,3}=1, {7,9}=2, {10}=3)
will recode the categorical variable X into another categorical variable Y that assumes the value 1 for X ∈ {1, 2, 3}, 2 for X ∈ {7, 9}, and 3 for X = 10. Other values of X, if any, remain unchanged. If you want those other values of X to be recoded to, e.g., -1, simply augment the argument list by including -1 at the end of the recode statement:
Y = RECODE(X, {1,2,3}=1, {7,9}=2, {10}=3, -1)

Recoding a Continuous Variable
The syntax is:
RECODE(X, I1 = c1, I2 = c2, ..., In = cn, [else])
where X is the continuous variable (or arithmetic expression) being recoded, Ij represents an interval of numbers all being recoded to cj, and the optional argument [else] is a default number to which all the numbers belonging to X, but excluded from the intervals I1, I2, ..., In, are recoded. If [else] is not specified as an argument of RECODE, then all the numbers excluded from the intervals I1, I2, ..., In are unchanged. Notice that the arguments of RECODE are intervals being recoded to individual numbers. The usual mathematical convention for specifying an interval Ij as open, semi-open, or closed is adopted. Thus:

An interval Ij of the form (u, v) is open and includes all numbers between u and v, but not the end points.
An interval Ij of the form [u, v] is closed and includes all numbers between u and v inclusive of the end points.
An interval of the form (u, v] is open on the left but closed on the right. It excludes u, includes v, and includes all the numbers in between.
An interval of the form [u, v) is closed on the left but open on the right. It includes u, excludes v, and includes all the numbers in between.
For example,
Y = RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2)
will recode the continuous variable X so that all numbers 2.5 < X ≤ 5.7 are replaced by 1, all numbers 5.7 < X ≤ 10.4 are replaced by 2, and all other values of X are unchanged. If you want all other values of X to also be recoded to, say, -1, append -1 as the last argument of the equation:
Y = RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2, -1)

4.4.4 Column Functions

Column functions operate on an entire column of numbers and return a scalar quantity. The returned value is often used in arithmetic expressions. The following column functions are available. All of them are prefixed by the letters COL. The argument X of all these column functions must be a variable in the worksheet; arithmetic expressions are not permitted. This may require you to create an intermediate column of computed expressions before using a column function. Also note that missing values are ignored in computing these column functions.

COLMEAN(X) Returns the sample mean of X.
COLVAR(X) Returns the sample variance of X.
COLSTD(X) Returns the sample standard deviation of X.
COLSUM(X) Returns the sum of all the numbers in X.
COLMAX(X) Returns the maximum value of X.
COLMIN(X) Returns the minimum value of X.
COLRANGE(X) Returns the value of COLMAX(X)-COLMIN(X).
COLCOUNT(X) Returns the number of elements in X.

You can use the values returned by these column functions in arithmetic expressions and as arguments of other functions. To do this, it is not necessary to know the actual value returned by the column function. However, if you want to know the value returned by any column function, you must define a new variable in the worksheet and fill its entire column with the value of the column function.

4.4.5 Random Numbers

You can fill an entire column of a worksheet with random numbers and constants.
Suppose the cursor is in a cell of a variable named RANDNUM. The expression
RANDNUM = #RAND
will result in the variable RANDNUM being filled with a column of uniform random numbers in the range (0, 1). Three random number functions, or generators, are available to you with the editors:

#RAND Generates uniform random numbers in the range (0, 1).
#NORMRAND Generates random numbers from the standard Normal Distribution.
#CHIRAND(X) Generates random numbers from the chi-squared distribution with X degrees of freedom.

You may of course use these three random number generators to generate random numbers from other distributions. For example, the equation
Y = 3+2*#NORMRAND
will generate random numbers from the normal distribution with mean 3 and standard deviation 2, in variable Y. Again, the equation
Z = #CHIRAND(5)
will generate random numbers from the chi-squared distribution with 5 degrees of freedom.

4.4.6 Special functions

The following special functions are available for use in arithmetic expressions:

#PI This is the value of π.
#NA This is the missing value code. It can be used to detect if a value is missing, or to force a value to be treated as missing.
#SQNO This is the value of the current sequence number (SQNO) in the current data set.
#SQEND This is the largest value of the sequence number (SQNO) in the current data set.
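As an illustration of the interval form of RECODE described in Section 4.4.3, the open and closed end-point rules can be mimicked outside East with a short Python sketch. This is not East code and does not reflect how East implements RECODE; the function name `recode_interval` and the tuple layout `(low, high, low_closed, high_closed, code)` are hypothetical choices made for this example.

```python
def recode_interval(x, rules, default=None):
    """Recode x using interval rules; values outside every interval are left
    unchanged unless a default (the optional [else] argument) is supplied.

    Each rule is (low, high, low_closed, high_closed, code), so that
    (2.5, 5.7, False, True, 1) stands for the interval (2.5, 5.7] -> 1.
    """
    for low, high, low_closed, high_closed, code in rules:
        above = x >= low if low_closed else x > low
        below = x <= high if high_closed else x < high
        if above and below:
            return code
    return x if default is None else default


# Rules equivalent to RECODE(X, (2.5,5.7]=1, (5.7,10.4]=2)
rules = [
    (2.5, 5.7, False, True, 1),
    (5.7, 10.4, False, True, 2),
]
values = [2.5, 3.0, 5.7, 5.8, 10.4, 11.0]

unchanged_outside = [recode_interval(v, rules) for v in values]
recoded_outside = [recode_interval(v, rules, default=-1) for v in values]
print(unchanged_outside)
print(recoded_outside)
```

Note that 2.5 is left unchanged (or sent to the default) because the first interval is open on the left, while 5.7 falls in the first interval because it is closed on the right.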
Volume 2

Continuous Endpoints

5 Introduction to Volume 2
6 Tutorial: Normal Endpoint
7 Normal Superiority One-Sample
8 Normal Noninferiority Paired-Sample
9 Normal Equivalence Paired-Sample
10 Normal Superiority Two-Sample
11 Nonparametric Superiority Two Sample
12 Normal Non-inferiority Two-Sample
13 Normal Equivalence Two-Sample
14 Normal: Many Means
15 Multiple Comparison Procedures for Continuous Data
16 Multiple Endpoints-Gatekeeping Procedures
17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs
18 Two-Stage Multi-arm Designs using p-value combination
19 Normal Superiority Regression

5 Introduction to Volume 2

This volume describes the procedures for continuous (normal) endpoints applicable to one-sample, two-sample, many-sample and regression situations. All three types of designs (superiority, non-inferiority and equivalence) are discussed in detail.

Chapter 6 introduces you to East on the Architect platform, using an example clinical trial to test a difference of means.

Chapters 7, 8 and 9 detail the design and interim monitoring in the one-sample situation, where it may be required to compare a new treatment to a well-established control using a single sample. These chapters cover superiority, non-inferiority and equivalence trials respectively.

Chapter 10 details the design and interim monitoring in the superiority two-sample situation, where the superiority of a new treatment over the control treatment is tested by comparing the group-dependent means of the outcome variables.

Chapter 11 details the design for the Wilcoxon-Mann-Whitney nonparametric test, a commonly used test for the comparison of two distributions when the observations cannot be assumed to come from normal distributions.
It is used when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. For the Wilcoxon-Mann-Whitney test, East supports single-look superiority designs only.

Chapter 12 provides an account of the design and interim monitoring in the non-inferiority two-sample situation, where the goal is to establish that an experimental treatment is no worse than the standard treatment, rather than attempting to establish that it is superior. Non-inferiority trials are designed by specifying a non-inferiority margin. The amount by which the mean response on the experimental arm is worse than the mean response on the control arm must fall within this margin in order for the claim of non-inferiority to be sustained.

Chapter 13 gives the details of the design and interim monitoring in the equivalence two-sample situation, where the goal is to establish neither superiority nor non-inferiority, but equivalence. When the goal is to show that two treatments are similar, it is necessary to develop procedures with the goal of establishing equivalence in mind. In Section 13.1, the problem of establishing equivalence with respect to the difference of the means of two normal distributions using a parallel-group design is presented. The corresponding problem of establishing equivalence with respect to the log ratio of means is presented in Section 13.2. For the crossover design, the problem of establishing equivalence with respect to the difference of the means is presented in Section 13.3 and with respect to the log ratio of means in Section 13.4.

Chapter 16 covers clinical trials that are designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints, which are divided into hierarchically ordered families.
It discusses two methods: Section 16.2 covers Serial Gatekeeping, whereas Section 16.3 covers Parallel Gatekeeping.

Chapter 14 details the various tests available in East for comparing more than two continuous means. Sections 14.1, 14.2 and 14.3 discuss One Way ANOVA, One Way Repeated Measures ANOVA and Two Way ANOVA respectively.

Chapter 15 details the Multiple Comparison Procedures (MCP) for continuous data. It is often the case that multiple objectives are to be addressed in a single trial. These objectives are formulated into a family of hypotheses. Multiple comparison (MC) procedures guard against inflation of the type I error while testing these multiple hypotheses. East supports several parametric and p-value based MC procedures. This chapter explains how to design a study using a chosen MC procedure that strongly maintains the familywise error rate (FWER).

Chapter 19 elaborates on the design and interim monitoring in the superiority regression situation, where linear regression models are used to examine the relationship between a response variable and one or more explanatory variables. This chapter discusses the design and interim monitoring of three types of linear regression models. Section 19.1 examines the problem of testing a single slope in a simple linear regression model involving one continuous covariate. Section 19.2 examines the problem of testing the equality of two slopes in a linear regression model with only one observation per subject. Finally, Section 19.3 examines the problem of testing the equality of two slopes in a linear regression repeated measures model, applied to a longitudinal setting.

5.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.
All these numerical quantities are grouped into categories according to their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab lets you adjust the paths for storing workbooks, files, and temporary files. These paths persist through the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since MCP is always a 1-sided test.
Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations as case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

6 Tutorial: Normal Endpoint

This tutorial introduces you to East on the Architect platform, using an example clinical trial to test a difference of means.

6.1 Fixed Sample Design

When you open East, by default, the Design tab in the ribbon will be active. The items on this tab are grouped under the following categories of endpoints: Continuous, Discrete, Count, Survival, and General. Click Continuous: Two Samples, and then Parallel Design: Difference of Means. The following input window will appear.
By default, the radio button for Sample Size (n) is selected, indicating that it is the variable to be computed. The default values shown for Type I Error and Power are 0.025 and 0.9. Keep the same for this design. Since the default inputs provide all of the necessary input information, you are ready to compute the sample size by clicking the Compute button. The calculated result will appear in the Output Preview pane, as shown below.

This single row of output contains relevant details of the inputs and the computed result of total sample size (and total completers) of 467. Select this row, and click the icon in the Output Preview toolbar to display a summary of the design details in the upper pane (known as Output Summary).

The discussion so far gives you a quick feel of the software for computing sample size for a single-look design. We will describe further features in an example for a group sequential design in the next section.

6.2 Group Sequential Design for a Normal Superiority Trial

6.2.1 Study Background

Drug X is a newly developed lipase inhibitor for obesity management that acts by inhibiting the absorption of dietary fats. The performance of this drug needs to be compared with an already marketed drug Y for the same condition. In a randomized, double-blind trial comparing the efficacy and safety of 1 year of treatment with X to Y (each at 120 mg three times a day), obese adults are to be randomized to receive either X or Y combined with dietary intervention for a period of one year. The endpoint is weight loss (in pounds). You are to design a trial having 90% power to detect a mean difference of 9 lbs between X and Y, assuming 15 lbs and 6 lbs weight loss in each treatment arm, respectively, and a common standard deviation of 32 lbs.
The design is required to be a 2-sided test at the 5% significance level.

From the Design menu choose Continuous: Two Samples, and then Parallel Design: Difference of Means. Select 2-Sided for Test Type, and enter 0.05 for Type I Error. Specify the Mean Control to be 6, the Mean Treatment to be 15, and the common Std. Deviation to be 32. Next, change the Number of Looks to be 5. You will see a new tab, Boundary, added to the input dialog box. Click the Boundary tab, and you will see the following screen. On this tab, you can choose whether to specify stopping boundaries for efficacy, or futility, or both. For this trial, choose efficacy boundaries only, and leave all other default values. We will implement the Lan-DeMets (O’Brien-Fleming) spending function, with equally spaced looks.

On the Boundary tab near the Efficacy drop-down box, click either of the chart icons to generate the following charts.

Click Compute. East will show the results in the Output Preview. The maximum combined sample size required under this design is 544. The expected sample sizes under H0 and H1 are 540 and 403, respectively. Click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library. Double-click on Des1 to generate the following output.

Once you have finished examining the output, close this window, and re-start East before continuing.

6.2.2 Creating multiple designs easily

In East, it is easy to create multiple designs by inputting multiple parameter values. In the trial described above, suppose we want to generate designs for all combinations of the following parameter values: Power = 0.8, 0.9, and Difference in Means = 8.5, 9, 9.5, 10.
The number of such combinations is 2 × 4 = 8. East can create all 8 designs by a single specification in the input dialog box. Enter the following values as shown below. Remember that the common Std. Deviation is 32. From the Input Method, select the Difference of Means option.

The values of Power have been entered as a list of comma-separated values, while Difference in Means has been entered as a colon-separated range of values: 8.5 to 10 in steps of 0.5.

Now click Compute. East computes all 8 designs, and displays them in the Output Preview as shown below. Click the icon to maximize the Output Preview.

Select the first 3 rows using the Ctrl key, and click the icon to display a summary of the design details in the upper pane, known as the Output Summary.

Select Des1 in the Output Preview, and click the icon in the toolbar to save this design in the Library. We will use this design for simulation and interim monitoring, as described below. Now that you have saved Des1, delete all designs from the Output Preview before continuing, by selecting all designs with the Shift key, and clicking the icon in the toolbar.

6.2.3 Simulation

Right-click Des1 in the Library, and select Simulate. Alternatively, you can select Des1 and click the icon.

We will carry out a simulation of Des1 to check whether it preserves the specified power. Click Simulate. East will execute by default 10000 simulations with the specified inputs. Close the intermediate window after examining the results. A row labeled as Sim1 will be added in the Output Preview.

Click the icon to save this simulation to the Library. A simulation sub-node will be added under the Des1 node.
Double-clicking the Sim1 node will display the detailed simulation output in the work area.

In 80.23% of the simulated trials, the null hypothesis was rejected. This value is very close to the specified power of 80%. Note that your results may differ from the results displayed here, because the simulations will be run with a different seed.

The next section will explore interim monitoring with this design.

6.2.4 Interim Monitoring

Right-click Des1 in the Library and select Interim Monitoring. Click the icon to open the Test Statistic Calculator. Suppose that after 91 subjects, at the first look, you have observed a mean difference of 8.5, with a standard error of 6.709.

Click OK to update the IM Dashboard.

The Stopping Boundaries and Error Spending Function charts on the left:

The Conditional Power and Confidence Intervals charts on the right:

Suppose that after 182 subjects, at the second look, you have observed a mean difference of 16, with a standard error of 4.744. Click Recalc, and then OK to update the IM Dashboard. In this case, a boundary has been crossed, and the following window appears.

Click Stop to complete the trial. The IM Dashboard will be updated accordingly, and a table for Final Inference will be displayed as shown below.

7 Normal Superiority One-Sample

To compare a new process or treatment to a well-established control, a single-sample study may suffice for preliminary information prior to a full-scale investigation.
This single sample may consist either of a random sample of observations from a single treatment, when the mean is to be compared to a specified constant, or of a random sample of paired differences or ratios between two treatments. The former is presented in Section (7.1) and the latter is discussed in Section (7.2) and Section (7.3).

7.1 Single Mean

7.1.1 Trial Design
7.1.2 Simulation
7.1.3 Interim Monitoring
7.1.4 Trial Design Using a t-Test (Single Look)

The problem of comparing the mean of the distribution of observations from a single random sample to a specified constant is considered. For example, when developing a new drug for treatment of a disease, there should be evidence of efficacy. For this single-sample problem, it is desired to compare the unknown mean µ to a fixed value µ0. The null hypothesis H0: µ = µ0 is tested against the two-sided alternative hypothesis H1: µ ≠ µ0 or a one-sided alternative hypothesis H1: µ < µ0 or H1: µ > µ0. The power of the test is computed at a specified value of µ = µ1 and standard deviation σ.

Let $\hat{\mu}_j$ denote the estimate of µ based on $n_j$ observations, up to and including the j-th look, j = 1, ..., K, with a maximum of K looks. The test statistic at the j-th look is based on the value specified by the null hypothesis, namely

$$Z_j = n_j^{1/2}(\hat{\mu}_j - \mu_0)/\hat{\sigma}_j, \tag{7.1}$$

where $\hat{\sigma}_j^2$ is the sample variance based on $n_j$ observations.

7.1.1 Trial Design

Consider the situation where treatment for a certain infectious disorder is expected to result in a decrease in the length of hospital stay. Suppose that hospital records were reviewed and it was determined that, based on this historical data, the average hospital stay is approximately 7 days. It is hoped that the new treatment can decrease this to less than 6 days. It is assumed that the standard deviation is σ = 2.5 days. The null hypothesis H0: µ = 7 (= µ0) is tested against the alternative hypothesis H1: µ < 7.
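The test statistic in equation (7.1) can be computed by hand as a cross-check. The following is a minimal Python sketch, not East's code: the function names are hypothetical, the list `stays` is made-up illustration data, and the sample standard deviation is assumed to use the usual (n − 1) divisor. The summary-based call uses the interim values quoted in Section 7.1.3 (sample mean 5.1, standard error 0.592, µ0 = 7).

```python
import math
import statistics


def z_from_data(values, mu0):
    """Equation (7.1): sqrt(n) * (sample mean - mu0) / sample SD."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with (n - 1) divisor (assumption)
    return math.sqrt(n) * (mean - mu0) / sd


def z_from_summary(mean, mu0, std_error):
    """Same statistic when only the mean and its standard error are known.

    The standard error equals sample SD / sqrt(n), so this is
    algebraically identical to z_from_data.
    """
    return (mean - mu0) / std_error


# Hypothetical lengths of hospital stay in days (illustration only)
stays = [5.0, 6.5, 4.0, 7.5, 6.0, 5.5, 8.0, 4.5]
print("Z from raw data:", round(z_from_data(stays, 7.0), 3))

# Interim values quoted in Section 7.1.3
print("Z from summary:", round(z_from_summary(5.1, 7.0, 0.592), 3))  # -3.209
```

A negative value of Z supports the one-sided alternative H1: µ < 7 used in this example.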
First, click Continuous: One Sample on the Design tab and then click Single Arm Design: Single Mean. This will launch a new input window.

Single-Look Design

We want to determine the sample size required to have power of 90% when µ = 6 (= µ1), using a test with a one-sided type-1 error rate of 0.05. Choose Test Type as 1-Sided. Specify Mean Response under Null (µ0) as 7, Mean Response under Alt. (µ1) as 6 and Std. Deviation (σ) as 2.5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is shown as a row in the Output Preview. The computed sample size is 54 subjects.

This design has default name Des 1. Select this design by clicking anywhere along the row and click the icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

With Des 1 selected, click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library.

Five-Look Design

To allow the opportunity to stop early and proceed with a full-scale plan, five equally spaced analyses are planned, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by right-clicking Des 1 in the Library, and selecting Edit Design. In the Input, change the Number of Looks from 1 to 5, to generate a study with four interim looks and a final analysis. A new tab for Boundary Info should appear. Click this tab to reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary.
By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter as OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). For a detailed description of the different spending functions and stopping boundaries available in East refer to Chapter 62. The cumulative alpha spent and the boundary values are displayed below.

Click Compute. The maximum and expected sample sizes are highlighted in yellow in the Output Preview. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the icon on the Output Preview toolbar. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the icon in the Output Preview toolbar. This will display both designs in the Output Summary pane.

Des 2 results in a maximum of 56 subjects in order to attain 90% power, with an expected sample size of 40 under the alternative hypothesis. In order to see the stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the relatively high cumulative probability of stopping by the third look if the alternative is true, with a sample size of 34 patients, which is well below the requirements for a fixed sample study (54 patients). Close the Output window before continuing.
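To get a feel for how the cumulative alpha is distributed over the five equally spaced looks of Des 2, the O’Brien-Fleming-type Lan-DeMets spending function can be evaluated directly. The sketch below is illustrative only and assumes the functional form α(t) = 2 − 2Φ(z_{α/2}/√t) that the manual quotes for one-sided tests, with α = 0.05; the function name `ld_of_alpha_spent` is hypothetical, and the values East displays may differ in rounding or in details not covered here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def ld_of_alpha_spent(t, alpha):
    """Cumulative alpha spent at information fraction t (0 < t <= 1).

    Assumed form: alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)),
    where z_{alpha/2} is the upper alpha/2 standard normal quantile.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction must be in (0, 1]")
    z_half = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z_half / t ** 0.5)


alpha = 0.05
looks = [k / 5 for k in range(1, 6)]  # five equally spaced looks
cumulative = [ld_of_alpha_spent(t, alpha) for t in looks]

# Incremental alpha spent at each look is the difference of cumulative values
incremental = [cumulative[0]] + [
    cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))
]

for t, cum, inc in zip(looks, cumulative, incremental):
    print(f"t = {t:.1f}  cumulative = {cum:.6f}  incremental = {inc:.6f}")
```

The output shows the characteristic pattern of this family: almost no alpha is spent at the first look, and the full 0.05 is reached only at an information fraction of 1.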
Examining stopping boundaries and spending functions

You can plot the boundary values of Des 2 by clicking the icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear:

You can choose different boundary scales from the drop-down box on the right-hand side. The available boundary scales are Z scale, Score Scale, µ/σ Scale and p-value scale. To plot the error spending function for Des 2, select Des 2 in the Library, click the icon in the toolbar, and then click Error Spending. The following chart will appear:

The above spending function is according to Lan and DeMets (1983) with O’Brien-Fleming flavor and for one-sided tests has the following functional form:

$$\alpha(t) = 2 - 2\Phi\left(\frac{z_{\alpha/2}}{\sqrt{t}}\right)$$

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, reaching 0.05 at an information fraction of 1. Feel free to try other plots by clicking the icon in the Library toolbar. Close all charts before continuing.

7.1.2 Simulation

Suppose we want to see the advantages of performing the interim analyses, as they relate to the chance of stopping prior to the final analysis. This examination can be conducted using simulation. Select Des 2 in the Library, and click the icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear.

For example, suppose you wish to determine how quickly this trial could be terminated if the treatment difference were much greater than expected, say µ = 4.5. Click on the Response Generation Info tab, and specify: Mean Response (µ) = 4.5 and Std. Deviation (σ) = 2.5.
Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the icon to save it to the Library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed in the upper pane.

Observe that 100% of the simulated trials rejected the null hypothesis, and about 26% of these simulations were able to reject the null at the first look after enrolling only 11 subjects. Your numbers might differ slightly due to a different starting seed.

7.1.3 Interim Monitoring

Suppose that the trial has commenced and Des 2 was implemented. Right-click Des 2 in the Library, and select Interim Monitoring.

Although we specified that there will be five equally spaced looks, the Lan-DeMets methodology implemented in East allows you to alter the number and spacing of these looks. Accordingly, suppose that an interim look was taken after enrolling 20 subjects and the sample mean, based on these 20 subjects, was 5.1 with a standard error of 0.592. Since µ0 = 7, based on equation (7.1) the value of the test statistic at the first look would be Z1 = (5.1 − 7)/0.592, or -3.209.

Click Enter Interim Data on the toolbar. In the Test Statistic Calculator, enter the following values, and click Recalc and then OK.

Since the stopping boundary is crossed, the following dialog box appears.

Click Stop to return to the interim monitoring dashboard. For final inference, East will display the following summary information on the dashboard.

7.1.4 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power Des 1 in Section (7.1.1) relied on using a Wald-type statistic for the hypothesis test, given by equation (7.1).
Because the test statistic was assumed to follow a normal distribution, we ignored the fact that the variance σ² is estimated from the sample. For large sample sizes this approximation is acceptable. However, in small samples with unknown standard deviation the test statistic

$$Z = n^{1/2}(\hat{\mu} - \mu_0)/\hat{\sigma}, \tag{7.2}$$

follows Student’s t distribution with (n − 1) degrees of freedom. Here, $\hat{\sigma}^2$ denotes the sample variance based on n observations.

Consider the example in Section 7.1.1 where we would like to test the null hypothesis that the average hospital stay is 7 days, H0: µ = 7 (= µ0), against the alternative hypothesis that it is less than 7 days, H1: µ < 7. We will now design the same trial in a different manner, using the t distribution for the test statistic.

Right-click Des 1 in the Library, and select Edit Design. In the input window, change the Test Stat. from Z to t. The entries for the other fields need not be changed.

Click Compute. East will add an additional row to the Output Preview labeled as Des 3. The required sample size is 55. Select the rows corresponding to Des 1 and Des 3 and click the icon. This will display Des 1 and Des 3 in the Output Summary.

Des 3, which uses the t distribution, requires that we commit a total of 55 patients to the study, just one more than Des 1, which uses the normal distribution. The extra patient is needed to compensate for the extra variability due to estimation of var[δ̂].

7.2 Mean of Paired Differences

7.2.1 Trial Design
7.2.2 Simulation
7.2.3 Interim Monitoring
7.2.4 Trial Design Using a t-Test (Single Look)

The paired t-test is used to compare the means of two normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution.
Let µc and µt denote the two means to be compared and let σ² denote the variance of the differences.

The null hypothesis H0: µc = µt is tested against the two-sided alternative hypothesis H1: µc ≠ µt or a one-sided alternative hypothesis H1: µc < µt or H1: µc > µt. Let δ = µt − µc. The null hypothesis can be expressed as H0: δ = 0 and the alternative can be expressed as H1: δ ≠ 0, H1: δ > 0, or H1: δ < 0. The power of the test is computed at specified values of µc, µt, and σ.

Let $\hat{\mu}_{cj}$ and $\hat{\mu}_{tj}$ denote the estimates of µc and µt based on $n_j$ observations, up to and including the j-th look, j = 1, ..., K, where a maximum of K looks are to be made. The estimate of the difference at the j-th look is

$$\hat{\delta}_j = \hat{\mu}_{tj} - \hat{\mu}_{cj}$$

and the test statistic at the j-th look is

$$Z_j = n_j^{1/2}\hat{\delta}_j/\hat{\sigma}_j, \tag{7.3}$$

where $\hat{\sigma}_j^2$ is the sample variance of $n_j$ paired differences.

7.2.1 Trial Design

Consider the situation where subjects are treated once with placebo after pain is experimentally induced, and later treated with a new analgesic after pain is induced a second time. Pain is reported by the subjects using a 10 cm visual analog scale (0=“no pain”, ..., 10=“extreme pain”). After treatment with placebo, the average is expected to be 6 cm. After treatment with the analgesic, the average is expected to be 4 cm. It is assumed that the common standard deviation is σ = 5 cm. The null hypothesis H0: δ = 0 is tested against the alternative hypothesis H1: δ < 0.

Start East afresh. First, click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Differences. This will launch a new input window.

Single-Look Design

We want to determine the sample size required to have power of 90% when µc = 6 and µt = 4, using a test with a one-sided type-1 error rate of 0.05.
Select Test Type as 1-Sided, Individual Means for Input Method, and specify the Mean Control (µc) as 6 and Mean Treatment (µt) as 4. Enter Std. Dev. of Paired Difference (σ0) as 5. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design and the output is shown as a row in the Output Preview. The computed sample size is 54 subjects.

This design has default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

With Des 1 selected, click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library.

Three-Look Design

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look as we accrue data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by right-clicking Des 1 in the Library, and selecting Edit Design. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis.

Click Compute. The maximum and expected sample sizes are highlighted in yellow in the Output Preview. Save this design in the current workbook by selecting the corresponding row in the Output Preview and clicking the icon on the Output Preview toolbar. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click the icon. Both designs will be displayed in the Output Summary pane.

Des 2 results in a maximum of 55 subjects in order to attain 90% power, with an expected sample size of 43 under the alternative hypothesis. With Des 2 selected, click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library.
In order to see the stopping probabilities, double-click Des 2 in the Library.

The clear advantage of this sequential design resides in the high cumulative probability of stopping by the second look if the alternative is true, with a sample size of 37 patients, which is well below the requirements for a fixed sample study (54 patients). Close the Output window before continuing.

Select Des 2 and click the icon on the Library toolbar. You can select one of many plots, including one for Stopping Boundaries:

Close this chart before continuing.

7.2.2 Simulation

Select Des 2 in the Library, and click the icon in the toolbar. Click on the Response Generation Info tab, and make sure Mean Treatment (µt) = 4, Mean Control (µc) = 6 and Std. Deviation (σ) = 5. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the icon to save it to the Library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed.

Overall, close to 90% of simulations have rejected H0. The numbers on your screen might differ slightly due to a different seed.

7.2.3 Interim Monitoring

For an ongoing study we evaluate the test statistic at an interim stage to see whether we have enough evidence to reject H0. Right-click Des 2 in the Library, and select Interim Monitoring.

Although the design specified three equally spaced looks, the Lan-DeMets methodology implemented in East allows you to alter the number and spacing of these looks. Suppose that an interim look was taken after enrolling 18 subjects and the sample mean, based on these subjects, was -2.2 with a standard error of 1.4.
Then, based on equation (7.3), the value of the test statistic at the first look would be Z1 = (−2.2)/1.4, or −1.571. Click Enter Interim Data on the toolbar. In the Test Statistic Calculator, enter these values, and click Recalc and then OK. The dashboard will be updated accordingly.

As the observed value −1.571 has not crossed the critical boundary value of −3.233, the trial continues. Now, 18 additional subjects are enrolled, and a second interim analysis with 36 subjects is conducted. Suppose that the observed difference is −2.3 with a standard error of 0.8. Select the Look 2 row and click Enter Interim Data. Enter these values, and click Recalc and then OK. Since the stopping boundary is crossed, a dialog box appears. Click on Stop. For final inference, East will display summary information on the dashboard.

7.2.4 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power the trial in Section 7.2.1 relied on using a Wald-type statistic for the hypothesis test, given by equation (7.3). However, by assuming that the test statistic follows a standard normal distribution, we neglected the fact that the variance σ² is estimated from the sample. For large sample sizes, asymptotic theory supports this approximation. In a single-look design, this test statistic is calculated as

$$Z = n^{1/2}\,\hat{\delta}/\hat{\sigma}, \qquad (7.4)$$

where $\hat{\sigma}^2$ is the sample variance based on n observed paired differences. In the following calculations we take into consideration that Z follows a Student's t-distribution with (n − 1) degrees of freedom.
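As a rough cross-check on the fixed-sample design of Section 7.2.1, the normal-approximation sample size implied by a Wald-type statistic can be sketched in a few lines of Python. This is only an illustration, not East's implementation: the function name `paired_z_sample_size` is hypothetical, and the one-sided α = 0.05 is an assumption (the significance level is not restated in this section) chosen because, together with 90% power, it reproduces the 54 subjects reported for Des 1.

```python
import math
from statistics import NormalDist


def paired_z_sample_size(mean_diff, sd_diff, alpha, power):
    """Number of pairs for a one-sided z-test of H0: delta = 0 (normal approximation).

    Hypothetical helper for illustration; not part of East.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # critical value of the one-sided test
    z_beta = z.inv_cdf(power)        # quantile matching the target power
    n = ((z_alpha + z_beta) * sd_diff / abs(mean_diff)) ** 2
    return math.ceil(n)              # round up to a whole number of pairs


# Mean Treatment = 4, Mean Control = 6, Std. Dev. of Paired Difference = 5.
# alpha = 0.05 (one-sided) is an assumption, see the note above.
n_fixed = paired_z_sample_size(mean_diff=4 - 6, sd_diff=5, alpha=0.05, power=0.90)
print(n_fixed)  # 54
```

The t-based design requires slightly more subjects than this normal approximation, because the critical value and power must then be taken from the t and noncentral t distributions, which depend on n and therefore have to be solved iteratively.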
Consider the example in Section 7.2.1, where we would like to test the null hypothesis that the analgesic does not reduce pain, H0 : δ = 0, against the alternative hypothesis that the new analgesic works to reduce pain, H1 : δ < 0. We will design this same trial using the t distribution for the test statistic. Right-click Des 1 in the Library, and select Edit Design. Change the Test Stat. from Z to t. The entries for the other fields need not be changed. Click Compute. East will add an additional row to the Output Preview labeled as Des 3. Select the rows corresponding to Des 1 and Des 3. This will display Des 1 and Des 3 in the Output Summary. Using the t distribution, we need one extra subject to compensate for the extra variability due to estimation of var[δ̂].

7.3 Ratio of Paired Means

The test for the ratio of paired means is used to compare the means of two log-normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution. Let µc and µt denote the two means to be compared, and let σc² and σt² denote the respective variances. The null hypothesis H0 : µc/µt = 1 is tested against the two-sided alternative hypothesis H1 : µc/µt ≠ 1 or a one-sided alternative hypothesis H1 : µc/µt < 1 or H1 : µc/µt > 1. Let ρ = µt/µc. Then the null hypothesis can be expressed as H0 : ρ = 1 and the alternative can be expressed as H1 : ρ ≠ 1, H1 : ρ > 1, or H1 : ρ < 1. The power of the test is computed at specified values of µc, µt, and σ. We assume that σt/µt = σc/µc, i.e., the coefficient of variation (CV) is the same under both control and treatment.

7.3.1 Trial Design

Start East afresh.
Click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Ratios. This will launch a new window. The upper pane of this window displays several fields with default values. Select Test Type as 1-Sided, and Individual Means for Input Method. Specify the Mean Control (µc) as 4 and Mean Treatment (µt) as 3.5. Enter Std. Dev. of Log ratio as 0.5.

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview. The computed sample size is 121 subjects (or pairs of observations). This design has default name Des 1. In the Output Preview, select Des 1 and click the toolbar icon to save this design to Wbk1 in the Library.

7.3.2 Trial Design Using a t-test

Right-click Des 1 in the Library and select Edit Design. In the input window, change the Test Stat. from Z to t. Click Compute. East will add an additional row to the Output Preview labeled as Des 2. Select the rows corresponding to Des 1 and Des 2 using the Ctrl key and click the toolbar icon that displays the design details. This will display Des 1 and Des 2 in the Output Summary. Des 2 uses the t distribution and requires that we commit a combined total of 122 patients to the study, one more than Des 1, which uses a normal distribution.

8 Normal Noninferiority Paired-Sample

Two common applications of the paired sample design include: (1) comparison of two treatments where patients are matched on demographic and baseline characteristics, and (2) two observations made from the same patient under different experimental conditions. The type of endpoint for a paired noninferiority design could be a difference of means or a ratio of means.
The former is presented in Section 8.1 and the latter is discussed in Section 8.2. For paired sample noninferiority trials, East can be used only when no interim look is planned.

8.1 Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with a standard deviation of the paired difference σD. Here, the null hypothesis H0 : µt − µc ≤ δ0 is tested against the one-sided alternative hypothesis H1 : µt − µc > δ0. Here δ0 denotes the noninferiority margin and δ0 < 0. Let δ = µt − µc. Then the null hypothesis can be expressed as H0 : δ ≤ δ0 and the alternative can be expressed as H1 : δ > δ0.

We assume that each paired observation on X from T and C is distributed according to a bivariate normal distribution with means (µt, µc), variances (σt², σc²) and correlation coefficient ρ. Suppose we have N such paired observations from T and C, and let µ̂c and µ̂t denote the estimates of µc and µt based on these N pairs. The estimate of the difference is then δ̂ = µ̂t − µ̂c. Denoting the standard error of δ̂ by se(δ̂), the test statistic can be defined as

$$Z = \frac{\hat{\delta} - \delta_0}{\mathrm{se}(\hat{\delta})} \qquad (8.1)$$

The test statistic Z is distributed as a t distribution with (N − 1) degrees of freedom. For large samples, the t-distribution can be approximated by the standard normal distribution. The power of the test is computed at specified values of µc, µt, and σD. East allows you to analyze using both the normal and the t distribution.

The advantage of the paired sample noninferiority design compared to the two independent sample noninferiority design lies in the smaller se(δ̂) in the former case. The paired sample design is more powerful than the two independent sample design: to achieve the same level of power, the paired sample design requires fewer subjects.
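A minimal sketch of the normal-approximation sample size for the noninferiority test based on statistic (8.1) follows. It is an illustration under stated assumptions, not East's algorithm: the function name `paired_noninferiority_n` is hypothetical, and the formula N = ((z₁₋α + z₁₋β) σD / (δ₁ − δ0))² is the standard large-sample result for a one-sided test. The inputs in the usage lines are those of the example in Section 8.1.1 (µc = 3.67, µt = 3.12, σD = 0.683, one-sided α = 0.025, power 90%).

```python
import math
from statistics import NormalDist


def paired_noninferiority_n(mean_control, mean_treatment, sd_diff, margin,
                            alpha=0.025, power=0.90):
    """Pairs needed to test H0: delta <= margin vs H1: delta > margin (z-test).

    Hypothetical helper for illustration; not part of East.
    """
    z = NormalDist()
    # Distance between the assumed true difference and the margin.
    effect = (mean_treatment - mean_control) - margin
    if effect <= 0:
        raise ValueError("assumed difference must lie above the noninferiority margin")
    n = ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) * sd_diff / effect) ** 2
    return math.ceil(n)


# Sample sizes over the range of margins considered in Section 8.1.1.
margins = (-0.6, -0.7, -0.8, -0.9, -1.0)
sizes = [paired_noninferiority_n(3.67, 3.12, 0.683, m) for m in margins]
print(sizes)  # [1961, 218, 79, 41, 25]
```

The steep growth of N as the margin approaches the assumed difference of −0.55 reflects the squared term (δ₁ − δ0) in the denominator.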
8.1.1 Trial Design

Iezzi et al. (2011) investigated the possibility of reducing radiation dose exposure while maintaining the image quality in a prospective, single center, intra-individual study. In this study, patients underwent two consecutive multidetector computed tomography angiography (MDCTA) scans 6 months apart, one with a standard acquisition protocol (C) and another using a low dose protocol (T). Image quality was rated as an ordinal number using a rating scale ranging from 1 to 5. Let µc and µt denote the average rating of image quality for the standard acquisition and low dose protocols, respectively, and let δ = µt − µc be the difference between the two means. Based on the 30 samples included in the study, µc and µt were estimated as 3.67 and 3.12, respectively. The noninferiority margin for image quality considered was −1. Accordingly, we will design the study to test

H0 : δ ≤ −1 against H1 : δ > −1

The standard deviation of the paired difference was estimated as 0.683. We want to design a study that has 90% power at µc = 3.67 and µt = 3.12 and that maintains an overall one-sided type I error of 0.025.

First, click Continuous: One Sample on the Design tab and then click Paired Design: Mean of Paired Differences. This will launch a new window. Select Noninferiority for Design Type, and Individual Means for Input Method. Specify the Mean Control (µc) as 3.67, Mean Treatment (µt) as 3.12, and the Std. Dev. of Paired Difference (σD) as 0.683. Finally, enter −1 for the Noninferiority Margin (δ0). Leave all other entries with their default values. Click Compute.
This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (25 subjects) is highlighted. This design has default name Des 1. You can select this design by clicking anywhere along the row in the Output Preview. Select this design and click the icon in the Output Preview toolbar that displays the design details. Some of the design details will be displayed in the upper pane, labeled as Output Summary. A total of 25 subjects must be enrolled in order to achieve the desired 90% power under the alternative hypothesis. In the Output Preview, select Des 1 and click the toolbar icon to save this design to Wbk1 in the Library.

The noninferiority margin of −1 considered above is the minimal margin. Since the observed difference is only a little less than −0.5, we would like to calculate the sample size for a range of noninferiority margins, say, −0.6, −0.7, −0.8, −0.9 and −1. This can be done easily in East. First select Des 1 in the Library, and click the edit icon on the Library toolbar. In the Input, change the Noninferiority Margin (δ0) to −0.6 : −1 : −0.1. Click Compute to generate sample sizes for the different noninferiority margins. This will add 5 new rows to the Output Preview, one for each of the noninferiority margins. The computed sample sizes are 1961, 218, 79, 41 and 25 for noninferiority margins −0.6, −0.7, −0.8, −0.9 and −1, respectively. To compare all 5 designs, select the last 5 rows in the Output Preview, and click the toolbar icon that displays the design details. The 5 designs will be displayed in the Output Summary pane.

Suppose we have decided to go with Des 3 to test the noninferiority hypothesis with a noninferiority margin of −0.7.
This requires a total sample size of 218 to achieve 90% power. Select Des 3 in the Output Preview and click the toolbar icon to save this design to Wbk1 in the Library. Before we proceed we would like to delete all designs from the Output Preview. Select all rows and then either click the delete icon in the toolbar, or right-click and select Delete. To delete designs from the workbook in the Library, select the corresponding designs individually (one at a time), right-click, and select Delete. You can try deleting Des 1 from the Library.

Plotting

With Des 3 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear. You can move the vertical bar along the X axis. To find the power at any sample size, move the vertical bar to that sample size; the numerical values of sample size and power will be displayed on the right of the plot. You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Close this chart before continuing.

In a similar fashion one can see the power vs delta plot by clicking the plot icon and then Power vs Treatment Effect. You can obtain the tables associated with these plots by clicking the tables icon and then clicking the appropriate table. Close the plots before continuing.

8.1.2 Trial Design Using a t-Test (Single Look)

The sample size obtained to correctly power Des 3 relied on using a Wald-type statistic for the hypothesis test. Due to the assumption of a normal distribution for the test statistic, we have ignored the fact that the variance σ² is estimated from the sample. For large sample sizes, this approximation is acceptable.
However, in small samples with unknown standard deviation, the test statistic

$$Z = (\hat{\delta} - \delta_0)/\mathrm{se}(\hat{\delta})$$

is distributed as Student's t distribution with (n − 1) degrees of freedom, where n is the number of paired observations.

Select Des 3 in the Library, and click the edit icon. This will take you to the input window. Now change the Test Statistic from Z to t. The entries for the other fields need not be changed. Click Compute. East will add an additional row to the Output Preview. The required sample size is 220. This design uses the t distribution and requires us to commit a combined total of 220 patients to the study, two more than Des 3, which uses the normal distribution. The extra couple of patients are needed to compensate for the extra variability due to estimation of var[δ̂].

8.1.3 Simulation

Select Des 3 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click on Des 3 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean Control = 3.67, Mean Treatment = 3.12, and Std. Deviation of Paired Difference (σD) = 0.683. Leave all other entries at their default values, and click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the toolbar icon to save it to the Library. Double-click Sim 1 in the Library, and the simulation output details will be displayed in the right pane under the Simulation tab. Notice that the percentage of rejections out of 10000 simulated trials is consistent with the design power of 90%. The exact result of the simulations may differ slightly, depending on the seed.
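The idea behind such a simulation can be sketched with a small Monte Carlo experiment in Python. This is a simplified illustration, not East's simulation engine: the function name `simulate_rejection_rate`, the seed, and the default of 2000 replicates (kept small so the sketch runs quickly) are all assumptions, and the paired differences are drawn directly from a normal distribution rather than from bivariate pairs. The design values are those of Des 3 (218 pairs, margin −0.7, one-sided α = 0.025).

```python
import math
import random
from statistics import NormalDist


def simulate_rejection_rate(mean_control, mean_treatment, sd_diff, margin,
                            n_pairs, alpha=0.025, n_sims=2000, seed=12345):
    """Fraction of simulated trials that reject H0: delta <= margin.

    Hypothetical helper for illustration; not part of East.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    true_diff = mean_treatment - mean_control
    rejections = 0
    for _ in range(n_sims):
        # Draw the paired differences directly (a simplifying assumption).
        diffs = [rng.gauss(true_diff, sd_diff) for _ in range(n_pairs)]
        mean = sum(diffs) / n_pairs
        var = sum((d - mean) ** 2 for d in diffs) / (n_pairs - 1)
        z_stat = (mean - margin) / math.sqrt(var / n_pairs)
        if z_stat >= z_crit:
            rejections += 1
    return rejections / n_sims


# Under the design alternative (delta = 3.12 - 3.67): estimates the power.
power_hat = simulate_rejection_rate(3.67, 3.12, 0.683, margin=-0.7, n_pairs=218)
# On the boundary of H0 (delta = 2.97 - 3.67 = -0.7): estimates the type I error.
alpha_hat = simulate_rejection_rate(3.67, 2.97, 0.683, margin=-0.7, n_pairs=218)
print(power_hat, alpha_hat)
```

With these settings the first estimate should fall near 0.90 and the second near 0.025, up to Monte Carlo error; as in East, the exact figures depend on the seed and the number of replicates.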
Now we wish to simulate from a point that belongs to H0 to check whether the chosen design maintains the type I error of 2.5%. Right-click Sim 1 in the Library and select Edit Simulation. Go to the Response Generation Info tab in the upper pane and specify: Mean Control = 3.67, Mean Treatment = 2.97, and Std. Deviation of Paired Difference (σD) = 0.683. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2.

Select Sim 2 in the Output Preview and click the toolbar icon to save it to the Library. Now double-click on Sim 2 in the Library. The simulation output details will be displayed. The proportion of simulations in which the upper efficacy stopping boundary was crossed is close to the specified type I error of 2.5%. The exact result of the simulations may differ slightly, depending on the seed.

8.2 Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and let σt² and σc² denote the respective variances. The null hypothesis H0 : µt/µc ≤ ρ0 is tested against the one-sided alternative hypothesis H1 : µt/µc > ρ0. Here, ρ0 denotes the noninferiority margin and ρ0 < 1. Let ρ = µt/µc. Then the null hypothesis can be expressed as H0 : ρ ≤ ρ0 and the alternative can be expressed as H1 : ρ > ρ0.

Suppose we have N such paired observations from T and C, and let (Xit, Xic) denote the ith pair of observations (i = 1, ..., N). Then log Xit − log Xic = log(Xit/Xic) denotes the logarithm of the ratio for the ith subject. We assume that the paired log-transformed observations on X from T and C, (log Xit, log Xic), are bivariate normally distributed with common parameters. In other words, (Xit, Xic) follows a bivariate log-normal distribution.
Denote log Xit by yit, log Xic by yic, and the corresponding difference by δyi = yit − yic. Let δ̂y denote the sample mean of these paired differences, with estimated standard error se(δ̂y). The test statistic can be defined as

$$Z = \frac{\hat{\delta}_y - \log\rho_0}{\mathrm{se}(\hat{\delta}_y)} \qquad (8.2)$$

The test statistic Z is distributed as a t distribution with (N − 1) degrees of freedom. For large samples, the t-distribution can be approximated by the standard normal distribution. East allows you to analyze using both the normal and the t distribution. The power of the test is computed at specified values of µc, µt, and σ.

8.2.1 Trial Design

We will use the same example cited in the previous section, but will transform the difference hypothesis into the ratio hypothesis. Let µc and µt denote the average rating of image quality for the standard acquisition and low dose protocols, estimated as 3.67 and 3.12, respectively. Let ρ = µt/µc be the ratio between the two means. Considering a noninferiority margin of −0.7 for the test of difference, we can rewrite the hypothesis mentioned in the previous section as

H0 : ρ ≤ 0.81 against H1 : ρ > 0.81

We are thus considering a noninferiority margin of 0.81 (= ρ0). For illustration we will assume that the standard deviation of the log ratio is 0.20. As before, we want to design a study that has 90% power at µc = 3.67 and µt = 3.12 and that maintains an overall one-sided type I error of 0.025.

Start East afresh. Click Continuous: One Sample on the Design tab and then click Paired Design: Mean of Paired Ratios. This will launch a new window. The upper pane of this window displays several fields with default values. Select Noninferiority for Design Type, and Individual Means for Input Method. Specify the Mean Control (µc) as 3.67, Mean Treatment (µt) as 3.12, and Noninferiority margin (ρ0) as 0.81. Enter 0.20 for Std. Dev. of Log Ratio, and 0.025 for Type I Error (α).
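The inputs just entered can be checked by hand on the log scale. The sketch below is an illustration only, not East's implementation: the function names are hypothetical, the calculation uses the normal approximation to statistic (8.2), and it assumes that the mean of the log ratios can be taken as log(µt/µc). Under these assumptions it returns the 180 subjects that East reports for this design, and the second function gives the power at a smaller sample size, such as the 130 subjects considered later in this section.

```python
import math
from statistics import NormalDist


def ratio_noninferiority_n(mean_control, mean_treatment, sd_log_ratio,
                           margin_ratio, alpha=0.025, power=0.90):
    """Pairs needed to test H0: rho <= margin_ratio on the log scale (z-test).

    Hypothetical helper for illustration; not part of East.
    """
    z = NormalDist()
    # Assumed mean of the log ratios minus the log of the margin.
    effect = math.log(mean_treatment / mean_control) - math.log(margin_ratio)
    n = ((z.inv_cdf(1 - alpha) + z.inv_cdf(power)) * sd_log_ratio / effect) ** 2
    return math.ceil(n)


def ratio_noninferiority_power(mean_control, mean_treatment, sd_log_ratio,
                               margin_ratio, n_pairs, alpha=0.025):
    """Normal-approximation power of the same test at a given number of pairs."""
    z = NormalDist()
    effect = math.log(mean_treatment / mean_control) - math.log(margin_ratio)
    return z.cdf(effect * math.sqrt(n_pairs) / sd_log_ratio - z.inv_cdf(1 - alpha))


n_ratio = ratio_noninferiority_n(3.67, 3.12, 0.20, 0.81)
power_130 = ratio_noninferiority_power(3.67, 3.12, 0.20, 0.81, n_pairs=130)
print(n_ratio)    # 180
print(power_130)
```

The power printed for 130 pairs should be close to the 78.7% that East reports for that sample size.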
Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (180 subjects) is highlighted in yellow. This design has default name Des 1. You can select this design by clicking anywhere along the row in the Output Preview. Select this design and click the icon in the Output Preview toolbar that displays the design details. Some of the design details will be displayed in the upper pane, labeled as Output Summary. A total of 180 subjects must be enrolled in order to achieve the desired 90% power under the alternative hypothesis. In the Output Preview, select Des 1 and click the toolbar icon to save this design to Wbk1 in the Library.

Suppose you think enrolling 180 subjects is too much for your organization and you can go up to only 130 subjects. You want to evaluate the power of your study at a sample size of 130 with the other design parameters unaltered. To compute power with 130 subjects, first select Des 1 in the Library, and click the edit icon on the Library toolbar. In the Input dialog box, first select the radio button for Power, and then enter 130 for Sample Size. Now click Compute. This will add another row, labeled as Des 2, in the Output Preview with the computed power highlighted in yellow. The design attains a power of 78.7%. Now select both rows in the Output Preview by pressing the Ctrl key, and click the icon in the Output Preview toolbar that displays the design details to see a summary of both designs in the Output Summary. In the Output Preview, select Des 2 and click the toolbar icon to save this design to Wbk1 in the Library.

Plotting

With Des 2 selected in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size.
The resulting power curve for this design will appear. You can move the vertical bar along the X axis.

Suppose you would like to explore the relationship between power and standard deviation. To visualize this relationship, select Des 2 in the Library, click the plot icon on the Library toolbar, and then click General (User Defined Plot). Select Std Dev of Log Ratio for X-Axis. This will display the power vs. standard deviation plot. Close the plot window before you continue.

8.2.2 Simulation

Select Des 2 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean Control = 3.67, Mean Treatment = 3.12, and Std Dev of Log Ratio = 0.2. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the toolbar icon to save it to the Library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed.

9 Normal Equivalence Paired-Sample

Two common applications of the paired sample design include: (1) comparison of two treatments where patients are matched on demographic and baseline characteristics, and (2) two observations made from the same patient under different experimental conditions. The type of endpoint for a paired equivalence design may be a difference of means or a ratio of means. The former is presented in Section 9.1 and the latter is discussed in Section 9.2.
9.1 Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with a standard deviation of the paired difference σD. Here, the null hypothesis H0 : µt − µc < δL or µt − µc > δU is tested against the two-sided alternative hypothesis H1 : δL ≤ µt − µc ≤ δU. Here, δL and δU denote the equivalence limits. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis.

Let δ = µt − µc denote the true difference in the means. The null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using the TOST procedure. Here, we perform the following two tests together:

Test1: H0L : δ ≤ δL against H1L : δ > δL at level α
Test2: H0U : δ ≥ δU against H1U : δ < δU at level α

H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected. Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100% confidence interval for δ is completely contained within the interval (δL, δU).

We assume that each paired observation on X from T and C is bivariate normally distributed with parameters µt, µc, σt², σc² and ρ. Suppose we have N such paired observations from T and C, and let µ̂c and µ̂t denote the estimates of µc and µt based on these N pairs. The estimate of the difference is δ̂ = µ̂t − µ̂c. Denoting the standard error of δ̂ by se(δ̂), the test statistics for Test1 and Test2 are defined as:

$$T_L = \frac{\hat{\delta} - \delta_L}{\mathrm{se}(\hat{\delta})} \quad \text{and} \quad T_U = \frac{\hat{\delta} - \delta_U}{\mathrm{se}(\hat{\delta})}$$

TL and TU are assumed to follow Student's t-distribution with (N − 1) degrees of freedom under H0L and H0U, respectively. H0L is rejected if $T_L \ge t_{1-\alpha,(N-1)}$, and H0U is rejected if $T_U \le t_{\alpha,(N-1)}$.

The null hypothesis H0 is rejected in favor of H1 if $T_L \ge t_{1-\alpha,(N-1)}$ and $T_U \le t_{\alpha,(N-1)}$, or in terms of confidence intervals: reject H0 in favor of H1 at level α if

$$\delta_L + t_{1-\alpha,(N-1)}\,\mathrm{se}(\hat{\delta}) < \hat{\delta} < \delta_U + t_{\alpha,(N-1)}\,\mathrm{se}(\hat{\delta}) \qquad (9.1)$$

We see that decision rule (9.1) is the same as rejecting H0 in favor of H1 if the (1 − 2α) 100% confidence interval for δ is entirely contained within the interval (δL, δU).

The power or sample size of such a trial design is determined for a specified value of δ, say δ1, for a single-look study only. The choice δ1 = 0, i.e. µt = µc, is common. For a specified value of δ1, the power is given by

$$\Pr(\text{Reject } H_0) = \tau_\nu(-t_{\alpha,\nu} \mid \Omega_2) - \tau_\nu(t_{\alpha,\nu} \mid \Omega_1) \qquad (9.2)$$

where ν = N − 1 and Ω1 and Ω2 are non-centrality parameters given by $\Omega_1 = (\delta_1 - \delta_L)/\mathrm{se}(\hat{\delta})$ and $\Omega_2 = (\delta_1 - \delta_U)/\mathrm{se}(\hat{\delta})$, respectively. In (9.2), $t_{\alpha,\nu}$ denotes the upper α × 100% percentile of a Student's t distribution with ν degrees of freedom, and $\tau_\nu(x \mid \Omega)$ denotes the distribution function of a non-central t distribution with ν degrees of freedom and non-centrality parameter Ω, evaluated at x.

Since the sample size N is not known ahead of time, we cannot characterize the bivariate t-distribution. Thus, solving for sample size must be performed iteratively by equating formula (9.2) to the power 1 − β.

The advantage of the paired sample equivalence design compared to the two sample equivalence design lies in the smaller se(δ̂) in the former case. The paired sample equivalence design is more powerful than the two sample equivalence design: to achieve the same level of power, the paired sample equivalence design requires fewer subjects.

9.1.1 Trial Design

To ensure that comparable results can be achieved between two laboratories or methods, it is important to conduct cross-validation or comparability studies to establish statistical equivalence between the two laboratories or methods. Often, to establish equivalence between two laboratories, a paired sample design is employed. Feng et al.
(2006) reported the data on 12 quality control (QC) samples. Each sample was analyzed first by Lab1 and then by Lab2. In this example we will consider Lab1 as the standard laboratory (C) and Lab2 as the one to be validated (T). Denote the mean concentrations from Lab1 and Lab2 by µc and µt, respectively. Considering an equivalence limit of (−10, 10), we can state our hypotheses as:

H0 : µt − µc < −10 or µt − µc > 10 against H1 : −10 ≤ µt − µc ≤ 10

Based on the reported data, µc and µt are estimated as 94.2 pg mL⁻¹ and 89.9 pg mL⁻¹, respectively. The standard deviation of the paired difference was estimated as 8.18. We want to design a study with 90% power at µc = 94.2 and µt = 89.9. We want to reject H0 with type I error not exceeding 0.025.

First, click Continuous: One Sample on the Design tab, and then click Paired Design: Mean of Paired Differences. This will launch a new window. Since we are interested in testing an equivalence hypothesis, select Equivalence for Trial Type, with a Type I Error of 0.025 and Power of 0.9. Select Individual Means for Input Method. Enter −10 for Lower Equivalence Limit (δL) and 10 for Upper Equivalence Limit (δU). Specify the Mean Control (µc) as 94.2, Mean Treatment (µt) as 89.9, and Std. Dev. of Paired Difference (σD) as 8.18.

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (20 samples) is highlighted in yellow. This design has default name Des 1, and you can select it by clicking anywhere along the row in the Output Preview and then clicking the icon in the Output Preview toolbar that displays the design details.
Some of the design details will be displayed in the upper pane, labeled as Output Summary. A total of 20 samples is required to achieve the desired 90% power under the alternative hypothesis. In the Output Preview, select Des 1 and click the toolbar icon to save this design to Wbk1 in the Library.

The equivalence limits of (−10, 10) might be too narrow, and therefore a wider equivalence interval (−12.5, 12.5) could be considered. Select Des 1 in the Library, and click the edit icon on the Library toolbar. In the Design Parameters tab, change the entries for Lower Equivalence Limit (δL) and Upper Equivalence Limit (δU) to −12.5 and 12.5, respectively, and click Compute. This will add a new row in the Output Preview labeled as Des 2. In the Output Preview, select Des 2 and click the toolbar icon to save this design to Wbk1 in the Library. To compare the two designs, select both rows in the Output Preview using the Ctrl key and click the icon in the Output Preview toolbar that displays the design details. This will display the two designs side by side in the Output Summary pane.

As we widen the equivalence limits from (−10, 10) to (−12.5, 12.5), the required sample size is reduced from 20 to 11.

Plotting

We would like to explore how power is related to the required sample size. Select Des 2 in the Library, click the plot icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear. You can move the vertical bar along the X axis. To find the power at any sample size, simply move the vertical bar to that sample size; the numerical values of sample size and power will be displayed on the right of the plot. You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Close this chart before continuing.
In a similar fashion one can see the power vs delta plot by clicking the plot icon and then Power vs Treatment Effect. To produce tables associated with these plots, first click the tables icon in the toolbar and then select the appropriate table.

9.1.2 Simulation

Now we wish to simulate from Des 2 to verify whether the study truly maintains the power and type I error. Select Des 2 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. Click on the Response Generation Info tab, and specify: Mean Control = 94.2, Mean Treatment = 89.9, and Std. Dev. of Paired Difference (σD) = 8.18. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click the toolbar icon to save it to the Library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed. Notice that the simulated power is close to the attained power of 92.6% for Des 2. The exact result of the simulations may differ slightly, depending on the seed.

Now we wish to simulate from a point that belongs to H0 to check whether the chosen design maintains the type I error of 5% or not. For this we consider µc = 94.2 and µt = 81.7. Since in this case δ = 81.7 − 94.2 = −12.5, the point (µt, µc) = (81.7, 94.2) belongs to H0. Right-click on Sim 1 in the Library and select Edit Simulation. Go to the Response Generation Info tab in the upper pane and specify: Mean Control = 94.2, Mean Treatment = 81.7, and Std. Dev. of Paired Difference (σD) = 8.18. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2. Select Sim 2 in the Output Preview and click the toolbar icon to save it to the Library.
Now double-click on Sim 2 in the Library. The simulation output details will be displayed in the right pane under the Simulation tab. Notice that the simulated power here is close to the pre-set type I error of 5%. The exact result of the simulations may differ slightly, depending on the seed.

9.2 Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and let σt² and σc² denote the respective variances. Here, the null hypothesis H0 : µt/µc ≤ ρL or µt/µc ≥ ρU is tested against the alternative hypothesis H1 : ρL < µt/µc < ρU. Let ρ = µt/µc denote the ratio of the two means. Then the null hypothesis can be expressed as H0 : ρ ≤ ρL or ρ ≥ ρU and the alternative can be expressed as H1 : ρL < ρ < ρU. In practice, ρL and ρU are often chosen such that ρL = 1/ρU. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed in this section for a paired-sample study.

Suppose we have N paired observations from T and C, and let (Xit, Xic) denote the ith pair of observations (i = 1, ..., N). Then log Xit − log Xic = log(Xit/Xic) is the logarithm of the ratio of the paired observations for the ith subject. We assume that the paired log-transformed observations (log Xit, log Xic) are bivariate normally distributed with common parameters. In other words, (Xit, Xic) follows a bivariate log-normal distribution. Since the log transformation translates the ratio hypothesis into a difference hypothesis, we can perform the test for a difference as discussed in Section 9.1. Note that we need the standard deviation of the log of ratios.
Sometimes, we are provided with information on the coefficient of variation (CV) of the ratios instead, and the standard deviation of the log ratios can be obtained using:

\[ \mathrm{sd} = \sqrt{\ln\left(1 + \mathrm{CV}^2\right)} . \]

This is a test for the comparison of geometric means of ratios, as we are taking the mean of the logarithms of the ratios.

9.2.1 Trial Design

Here we will use the same example reported by Feng et al (2006). Denote the mean concentrations from Lab1 and Lab2 by µc and µt, and let ρ = µt/µc be the ratio between the two means. Considering equivalence limits of (0.85, 1.15), we can state our hypotheses as H0 : µt/µc < 0.85 or µt/µc > 1.15 against H1 : 0.85 ≤ µt/µc ≤ 1.15. Based on the reported data, µc and µt are estimated as 94.2 pg mL⁻¹ and 89.9 pg mL⁻¹, respectively. Assume that the standard deviation of the log ratio is estimated as 0.086. As before, we want to design a study with 90% power at µc = 94.2 and µt = 89.9. We want to reject H0 with type I error not exceeding 0.025.

Start East afresh. First, click Continuous: One Sample on the Design tab and then click Paired Design: Mean of Paired Ratios as shown below. This will launch a new window. Select Equivalence for Trial Type, and enter 0.025 for Type I Error, and 0.9 for Power. Then select Individual Means for Input Method, and enter the Mean Control (µc) as 94.2, Mean Treatment (µt) as 89.9, and Std. Dev. of Log Ratio as 0.086. Enter 0.85 for Lower Equiv. Limit (ρL) and 1.15 for Upper Equiv. Limit (ρU). The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (8 samples) is highlighted in yellow. This design has the default name Des 1.
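As an aside to the design steps, the log-scale quantities behind this example can be checked by hand. The short sketch below applies the CV conversion given above and moves the means and the equivalence limits to the log scale. The helper name `sd_log_from_cv` and the CV value 0.0862 are hypothetical, chosen only to illustrate the formula; they do not come from Feng et al (2006).

```python
import math


def sd_log_from_cv(cv):
    """Standard deviation of the log ratios implied by a coefficient of variation."""
    return math.sqrt(math.log(1.0 + cv ** 2))


mu_c, mu_t = 94.2, 89.9      # means used in the example
rho_l, rho_u = 0.85, 1.15    # equivalence limits on the ratio scale

# On the log scale the ratio hypothesis becomes a difference hypothesis.
log_ratio = math.log(mu_t / mu_c)
log_limits = (math.log(rho_l), math.log(rho_u))

# Hypothetical CV, for illustration only: for small CV the sd is close to the CV.
cv_example = 0.0862
sd_log = sd_log_from_cv(cv_example)

print(log_ratio, log_limits, sd_log)
```

The log ratio of the assumed means lies inside the log-transformed limits, which is what places the design point in H1.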
Select this design and click the Output Summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary. In the Output Preview, select Des 1 and click the save icon in the toolbar to save this design to Wbk1 in the Library.

Plotting

Suppose you want to see how the standard deviation influences the sample size. In order to visualize this relationship, select Des 1 in the Library, click the plots icon on the Library toolbar, and then click General (User Defined Plot). Select Std Dev of Log Ratio for X-Axis on the right of the plot. This will display the sample size vs. standard deviation plot. Close this plot before continuing.

9.2.2 Simulation

Now we want to check by simulation whether the sample size of 8 provides at least 90% power. Select Des 1 in the Library, and click the simulation icon in the toolbar. Click on the Response Generation Info tab, and specify: Mean control = 94.2, Mean Treatment = 89.9, and Std Dev. of Log Ratio = 0.086. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1. Notice that the simulated power is very close to the design power.

10 Normal Superiority Two-Sample

To demonstrate the superiority of a new treatment over the control, it is often necessary to randomize subjects to the control and treatment arms, and contrast the group-dependent means of the outcome variables. In this chapter, we show how East supports the design and interim monitoring of such experiments.
10.1 Difference of Means

10.1.1 Trial Design (Weight Control Trial of Orlistat)
10.1.2 IM of the Orlistat trial
10.1.3 t-Test Design

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a normally distributed outcome variable, X, with means µt and µc, respectively, and with a common variance σ². We intend to monitor the data up to K times after accruing n1, n2, ..., nK ≡ nmax patients. The information fraction at the jth look is given by tj = nj/nmax. Let r denote the fraction randomized to treatment T. Define the treatment difference to be δ = µt − µc. The null hypothesis of interest is H0 : δ = 0. We wish to construct a K-look group sequential level α test of H0 having 1 − β power at the alternative hypothesis H1 : δ = δ1. Let X̄t(tj) and X̄c(tj) be the mean responses of the experimental and control groups, respectively, at time tj. Then

\[ \hat{\delta}(t_j) = \bar{X}_t(t_j) - \bar{X}_c(t_j) \tag{10.1} \]

and

\[ \mathrm{var}[\hat{\delta}(t_j)] = \frac{\sigma^2}{n_j\, r(1-r)} . \tag{10.2} \]

Therefore, by the Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997) theorem, the stochastic process

\[ W(t_j) = \sqrt{t_j}\; \frac{\bar{X}_t(t_j) - \bar{X}_c(t_j)}{\sqrt{\dfrac{\sigma^2}{n_j\, r(1-r)}}}, \qquad j = 1, 2, \ldots, K, \tag{10.3} \]

is N(ηtj, tj) with independent increments, where η = 0 under H0 and η = δ1 √Imax under H1. We refer to η as the drift parameter.

10.1.1 Trial Design (Weight Control Trial of Orlistat)

Eighteen U.S. research centers participated in this trial, where obese adults were randomized to receive either Orlistat or placebo, combined with a dietary intervention for a period of two years (Davidson et al, 1999). Orlistat is an inhibitor of fat absorption, and the trial was intended to study its effectiveness in promoting weight loss and reducing cardiovascular risk factors. The study began in October 1992.
More than one outcome measure was of interest, but we shall consider only body weight changes between baseline and the end of the first year of intervention. We shall consider a group sequential design even though the original study was not intended as such. The published report does not give details concerning the treatment effect of interest or the desired significance level and power of the test. It does say, however, that 75% of subjects had been randomized to the Orlistat arm, probably to maximize the number of subjects receiving the active treatment.

Single-Look Design

Suppose that the expected mean body weight change after one year of treatment was 9 kg in the Orlistat arm and 6 kg in the control arm. Assume also that the common standard deviation of the observations (weight change) was 8 kg. The standardized difference of interest would therefore be (9 − 6)/8 = 0.375. We shall consider a one-sided test with 5% significance level and 90% power, and an allocation ratio (treatment:control) of 3:1; that is, 75% of the patients are randomized to the Treatment (Orlistat) arm.

First, click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. In the upper pane of this window is the Input dialog box, which displays default input values. The effect size can be specified in one of three ways, selected from Input Method: (1) individual means and common standard deviation, (2) difference of means and common standard deviation, or (3) standardized difference of means. We will use the Individual Means method. Enter the appropriate design parameters so that the dialog box appears as shown. Remember to set the Allocation Ratio to 3. Then click Compute. The design is shown as a row in the Output Preview, located in the lower pane of this window. The computed sample size is 325 subjects.
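The single-look sample size can be checked against the variance formula (10.2). The sketch below assumes the usual normal-approximation argument, in which the total sample size n must satisfy δ / √(σ² / (n r(1 − r))) = z(1−α) + z(1−β); it is an illustration under that assumption rather than a description of East's internal algorithm, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_size(delta, sigma, alpha, power, r):
    """Total sample size for a one-sided, single-look z-test of H0: delta = 0.

    r is the fraction of subjects randomized to the treatment arm.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / (delta ** 2 * r * (1 - r))


# Orlistat example: means 9 and 6 kg, sd 8 kg, 3:1 allocation (r = 0.75).
n_exact = fixed_sample_size(delta=9 - 6, sigma=8, alpha=0.05, power=0.90, r=0.75)
n_total = math.ceil(n_exact)
print(n_exact, n_total)
```

Rounding up gives 325 subjects, in agreement with the value reported in the Output Preview.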
You can select this design by clicking anywhere along the row in the Output Preview. On the Output Preview toolbar, click the Output Summary icon to display a summary of the design details in the upper pane. Then, in the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the plots icon on the Library toolbar, and then click Power vs Treatment Effect (δ). The resulting power curve for this design is shown. You can save this chart to the Library by clicking Save in Workbook. You can also export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

Three-Look Design

Create a new design by selecting Des1 in the Library and clicking the edit icon on the Library toolbar, or by right-clicking and selecting Edit Design. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary Info should appear. Click this tab to reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type.
The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). The cumulative alpha spent, and the boundary values, are displayed in the table below.

Expected sample size and stopping probabilities

Click Compute to generate output for Des2. Select both Des1 and Des2 in the Output Preview and click the Output Summary icon. The maximum and expected sample sizes are highlighted in yellow.

The price to be paid for multiple looks is the commitment of a higher maximum sample size (331 patients) compared to that of a single-look design (325 patients). However, if the alternative hypothesis H1 holds, the study has a chance of stopping at one of the two interim analyses and saving patient accrual: on average, Des2 will stop with 257 patients if the alternative is true. The expected sample size under the null is 329, less than the maximum since there is a small probability of stopping before the last look and, wrongly, rejecting the null.

With Des2 selected in the Output Preview, click the save icon to save Des2 to the Library. In order to see the stopping probabilities, as well as other characteristics, double-click Des2 in the Library. The clear advantage of this sequential design resides in the high probability of stopping by the second look, if the alternative is true, with a sample size of 221 patients, which is well below the requirements for a fixed sample study (325 patients).
Even under the null, however, there is a small chance for the test statistic to cross the boundary for its early rejection (type-1 error probability) at the first or second look. Close the Details window before continuing.

Examining stopping boundaries and spending functions

Plot the boundary values of Des2 by clicking the plots icon on the Library toolbar, and then selecting Stopping Boundaries. The following chart will appear:

The three solid dots correspond to the actual boundary values to be used at the three planned analyses. Although the three looks are assumed to be equally spaced at design time, this assumption need not hold at analysis time. Values of the test statistic (z-test) greater than the upper boundary values would warrant early stopping in favor of H1, that Orlistat is better than placebo. The horizontal axis expresses the total number of patients at each of the three analysis time points. The study is designed so that the last analysis time point coincides with the maximum sample size required for the chosen design, namely 331 patients. By moving the vertical line cursor from left to right, one can observe the actual values of the stopping boundaries at each interim analysis time point. The boundaries are rather conservative: for example, you would need the standardized test statistic to exceed 2.139 in order to stop the trial at the second look.

It is sometimes convenient to display the stopping boundaries on the p-value scale. Under Boundary Scale, select the p-value Scale. The chart now displays the cumulative number of patients on the X-axis and the nominal p-value (1-sided) that we would need in order to stop the trial at that interim look. To change the scale of this chart, click Settings... and in the Chart Settings dialog box, change the Maximum to 0.05, and the Divisions: Major to 0.01, and click OK. The following chart will be displayed.

For example, at the second look, after 221 subjects have been observed, we require a p-value smaller than 0.016 in order to stop the study. Notice that the p-value at the 3rd and final look needs to be smaller than 0.045, rather than the usual 0.05 that one would require for a single-look study. This is the penalty we pay for the privilege of taking three looks at the data instead of one. You may like to display the boundaries in the delta scale. In this scale, the boundaries are expressed in units of the effect size, or the difference in means. We need to observe a difference in average weight loss of 2.658 kg or more in order to cross the boundary at the second look.

Close these charts, and click the plots icon and then Error Spending. The following chart will appear.

This spending function was proposed by Lan and DeMets (1983), and for one-sided tests has the following functional form:

\[ \alpha(t) = 2 - 2\,\Phi\!\left( \frac{Z_{\alpha/2}}{\sqrt{t}} \right) . \tag{10.4} \]

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, and the cumulative error spent reaches 0.05 at an information fraction of 1. A recursive method for generating stopping boundaries from spending functions is described in Appendix G. Close this chart before continuing.

Lan and DeMets (1983) also provided a function for spending the type-1 error more aggressively. This spending function is denoted by PK, signifying that it is the Lan-DeMets spending function for generating stopping boundaries that closely resemble the classical Pocock (1977) stopping boundaries.
It has the functional form:

\[ \alpha(t) = \alpha \ln[1 + (e - 1)t] \tag{10.5} \]

Select Des2 in the Library, and click the edit icon on the Library toolbar. On the Boundary Info tab, change the Parameter from OF to PK, and click Compute. With Des3 selected in the Output Preview, click the save icon. In the Library, select both Des2 and Des3, by holding the Ctrl key, and then click the Output Summary icon. The upper pane will display the details of the two designs side-by-side:

In the Output Summary toolbar, click the plots icon to compare the two designs according to Stopping Boundaries. Notice that the stopping boundaries for Des3 (PK) are relatively flat; almost the same critical point is used at all looks to declare significance. Close the chart before continuing. Click the plots icon and select Error Spending. Des3 (PK) spends the type-1 error probability at a much faster rate than Des2 (OF). Close the chart before continuing.

Wang and Tsiatis Power Boundaries

The stopping boundaries generated by the Lan-DeMets OF and PK functions closely resemble the classical O’Brien-Fleming and Pocock stopping boundaries, respectively. These classical boundaries are a special case of a family of power boundaries proposed by Wang and Tsiatis (1987).
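Returning briefly to the Lan-DeMets functions, equations (10.4) and (10.5) are easy to evaluate numerically, and so are the scale conversions quoted earlier for the second look of Des2. The sketch below is an illustration only, with hypothetical function names: it evaluates both spending functions at three equally spaced looks, then converts the second-look boundary of 2.139 to a nominal one-sided p-value and to the delta scale using the standard error implied by equation (10.2) with n = 221, σ = 8 and r = 0.75.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def ld_of(t, alpha):
    """Lan-DeMets O'Brien-Fleming-type spending function, equation (10.4)."""
    return 2 - 2 * Z.cdf(Z.inv_cdf(1 - alpha / 2) / math.sqrt(t))


def ld_pk(t, alpha):
    """Lan-DeMets Pocock-type spending function, equation (10.5)."""
    return alpha * math.log(1 + (math.e - 1) * t)


alpha = 0.05
fractions = [1 / 3, 2 / 3, 1.0]  # three equally spaced looks
of_spent = [ld_of(t, alpha) for t in fractions]
pk_spent = [ld_pk(t, alpha) for t in fractions]

# Second look of Des2: convert the z-scale boundary to other scales.
z_boundary = 2.139
n_look, sigma, r = 221, 8.0, 0.75
p_nominal = 1 - Z.cdf(z_boundary)                       # p-value scale
se_delta = math.sqrt(sigma ** 2 / (n_look * r * (1 - r)))
delta_boundary = z_boundary * se_delta                  # delta scale, in kg

print(of_spent, pk_spent, p_nominal, delta_boundary)
```

Both functions spend exactly α at t = 1, and PK spends far more than OF at the first look, which is the behaviour seen in the Error Spending charts. The converted boundary values agree with the 0.016 and 2.658 kg reported above.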
For a two-sided level-α test, using K equally spaced looks, the power boundaries for the standardized test statistic Zj at the j-th look are of the form

\[ Z_j \ge \frac{C(\Delta, \alpha, K)}{(j/K)^{0.5 - \Delta}} \tag{10.6} \]

The normalizing constant C(∆, α, K) is evaluated by recursive integration so as to ensure that the K-look group sequential test has type-1 error equal to α (see Appendix G for details), and ∆ is a parameter characterizing the shape of the stopping boundary. For example, if ∆ = 0.5, the boundaries are constant at each of the K looks. These are the classical Pocock stopping boundaries (Pocock, 1977). If ∆ = 0, the width of the boundaries is inversely proportional to the square root of the information fraction j/K at the j-th look. These are the classical O’Brien-Fleming stopping boundaries (O’Brien and Fleming, 1979). Other choices produce boundaries of different shapes. Notice from equation (10.6) that power boundaries have a specific functional form, and can be evaluated directly, or tabulated, once the normalizing constant C(∆, α, K) has been worked out for various combinations of α and K. In contrast, spending function boundaries are evaluated indirectly by inverting a pre-specified spending function as shown in Appendix F.

Right-click Des3 in the Library and select Edit Design. On the Boundary Info tab, change the Boundary Family from Spending Functions to Wang-Tsiatis. Leave the default value of ∆ as 0, and click Compute. With Des4 selected in the Output Preview, click the save icon. In the Library, select both Des2 and Des4 by holding the Ctrl key. Click the plots icon and select Stopping Boundaries. As expected from our discussion above, the boundary values for Des2 (Lan-DeMets, OF) and for Des4 (Wang-Tsiatis, ∆ = 0) are very similar. Close the chart before continuing.

More charts

Select Des3 in the Library, click the plots icon, and then click Power vs. Treatment effect (δ). Click the radio button for Standardized under X-Axis Scale. By scrolling from left to right with the vertical line cursor, one can observe the power for various values of the effect size. Close this chart, and with Des3 selected, click the plots icon again. Then click Expected Sample Size. Click the radio button for Standardized under X-Axis Scale. The following chart appears:

By scrolling from left to right with the vertical line cursor we can observe how the expected sample size decreases as the effect size increases. Close this chart before continuing.

Unequally spaced analysis time points

In the above designs, we have assumed that analyses were equally spaced. This assumption can be relaxed if you know when interim analyses are likely to be performed (e.g., for administrative reasons). In either case, departures from this assumption are allowed during the actual interim monitoring of the study, but sample size requirements will be more accurate if allowance is made for this knowledge. With Des3 selected in the Library, right-click and select Edit Design. Under Spacing of Looks in the Boundary Info tab, click the Unequal radio button. The column titled Info. Fraction can be edited to modify the relative spacing of the analyses. The information fraction refers to the proportion of the maximum (yet unknown) sample size. By default, this table displays equal spacing, but suppose that the two interim analyses will be performed with 0.25 and 0.5 of the maximum sample size. After entering these new information fraction values, click Recalc to recompute the cumulative alpha spent and the efficacy boundary values, and then click Compute.
Select Des5 in the Output Preview and click the save icon to save it in the Library for now.

Arbitrary amounts of error probability to be spent at each analysis

Another feature of East is the possibility to specify arbitrary amounts of cumulative error probability to be used at each look. This option can be combined with the option of unequal spacing of the analyses. With Des5 selected in the Library, click the edit icon on the Library toolbar. Under the Boundary Info tab, select Interpolated for the Spending Function. In the column titled Cum. α Spent, enter 0.005 for the first look and 0.03 for the second look, click Recalc, and then Compute. Select Des6 in the Output Preview and click the save icon. From the Library, select Des5 and Des6 by holding the Ctrl key. Click the plots icon, and select Stopping Boundaries. The following chart will be displayed.

The advantage of Des6 over Des5 is the more conservative boundary (less type-1 error probability spent) at the first look. Close these charts before continuing.

Computing power for a given sample size

East can compute the achieved power, given the other design parameters such as sample size. Select Des6 in the Library, right-click, and select Edit Design. On the Design Parameters tab, click the radio button for Power. You will notice that the field for power will contain the word “Computed”. You may now enter a value for the sample size: enter 250, and click Compute. As expected, the achieved power is less than 0.9, namely 0.781. To delete this design, click Des7 in the Output Preview, and click the delete icon in the toolbar. East will display a warning to make sure that you want to delete the selected row. Click Yes to continue.
Spending function boundaries for early stopping in favor of H0 or H1

So far we have considered only efficacy boundaries, which allow for early stopping in favor of the alternative. It may be of interest, in addition, to consider futility boundaries, which allow for early stopping when there is lack of evidence against the null hypothesis. Select Des2 in the Library and click the edit icon. On the Boundary Info tab, you can select from one of several types of futility boundaries, such as from a spending function, or by conditional power. Note that some of these options are available for one-sided tests only. Select Spending Functions under Boundary Family. Select PK for the Parameter, and leave all other default settings. See the updated values of the stopping boundaries populated in the table below.

On the Boundary Info tab, you may also like to click the corresponding icons to view plots of the error spending functions, or stopping boundaries, respectively.

Click Compute, and with Des7 selected in the Output Preview, click the save icon. To view the design details, double-click Des7 in the Library. Because not all the type-2 error is spent at the final look, this trial has a chance of ending early if the null hypothesis is true. This is demonstrated by the low expected sample size under the null (209 patients), compared to those of the other designs considered so far. Close the Output window before continuing.

Before continuing to the next section, we will save the current workbook, and open a new workbook. Select Wbk1 in the Library and right-click, then click Save. Next, click the [icon] button, click New, and then Workbook. A new workbook, Wbk2, should appear in the Library.
Delete all designs from the Output Preview before continuing.

Creating multiple designs

To create more than one design from the Input, one simply enters multiple values in any of the highlighted input fields. Multiple values can be entered in two ways. First, one can enter a comma-separated list (e.g., “0.8, 0.9”). Second, one can use colon notation (e.g., “7:9:0.5”) to specify a range of values, where “a:b:c” is read as from ‘a’ to ‘b’ in step size ‘c’.

Suppose that we wished to explore multiple variations of Des7. With Des7 selected in the Library, right-click and select Edit Design. In the Design Parameters tab of the Input, enter multiple values for the Power(1-β) (0.8, 0.9) and Std.Deviation(σ) (7 : 9 : 0.5) and click Compute:

We have specified 10 designs here, from the combination of 2 distinct values of the power and 5 distinct values of the standard deviation. To view all 10 designs on the screen, click the maximize icon to maximize the Output Preview. The designs within the Output Preview can be sorted in ascending or descending order, according to one of the column variables. For example, if you click once on the column titled Sample Size, the designs will be sorted (from top to bottom) in ascending order of the total sample size.

In addition, you may wish to filter and select designs that meet certain criteria. Click the filter icon on the Output Preview toolbar, and in the filter criterion box, select only those designs for which the maximum sample size is less than or equal to 400, as follows:

From the remaining designs, select Des8 in the Output Preview, and click the save icon. You will be asked to nominate the workbook in which this design should be saved. Select Wbk2 and click OK.
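The comma and colon notations described in this section can be illustrated with a few lines of code. The parser below is a hypothetical sketch written for this explanation; it is not East's own input handling. It expands each field into a list of values and forms every combination, which is how 2 values of power and 5 values of the standard deviation give 10 designs.

```python
from itertools import product


def expand_values(spec):
    """Expand 'a, b, c' or 'start:stop:step' into a list of floats."""
    spec = spec.strip()
    if ":" in spec:
        start, stop, step = (float(part) for part in spec.split(":"))
        count = int(round((stop - start) / step)) + 1
        # Rounding guards against floating-point drift in the generated grid.
        return [round(start + i * step, 10) for i in range(count)]
    return [float(part) for part in spec.split(",")]


powers = expand_values("0.8, 0.9")
std_devs = expand_values("7 : 9 : 0.5")

# One design per combination of power and standard deviation.
designs = list(product(powers, std_devs))
print(len(designs), std_devs)
```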
Accrual and dropout information

More realistic assumptions regarding the patient accrual process (namely, accrual rate, response lag, and probability of dropout) can be incorporated into the design stage. First, the accrual of patients may be estimated to occur at some known rate. Second, because the primary outcome measure is change in body weight from baseline to end of first year, the response lag is known to be 1 year. Finally, due to the long-term nature of the study, it is estimated that a small proportion of patients is likely to drop out over the course of the study.

With Des8 selected in the Library, click the edit icon. Click Include Options in the top right hand corner of the Input, and then click Accrual/Dropout Info. A new tab should appear to the right of Design Parameters and Boundary Info. Click on this Accrual/Dropout tab, and enter the following information as shown below: the accrual rate is 100 patients per year, the response lag is 1 year, and the probability that a patient drops out before completing the study is 0.1. A plot of the predicted accruals and completers over time can be generated by clicking the plot icon.

Click Compute to generate the design. Select Des18 in the Output Preview, and click the save icon. Select Wbk2 and click OK. Double-click Des18 in the Library. The output details reveal that in order to ensure that data can be observed for 153 completers by the second look, one needs to have accrued 255 subjects. Close this Output window before continuing.

Select individual looks

With Des8 selected in Wbk2, click the edit icon. In the look details table of the Boundary Info tab, notice that there are ticked checkboxes under the columns Stop for Efficacy and Stop for Futility.
East gives you the flexibility to remove one of the stopping boundaries at certain looks. For example, untick the checkbox in the first look under the Stop for Futility column, and click Recalc. Click the plot icon to view the new boundaries. Notice that the futility boundary does not begin until the second look.

Simulation of the Orlistat trial

Suppose you now wish to simulate Des4 in Wbk1. Select Des4 in the Library, and click the simulation icon on the Library toolbar. Alternatively, right-click on Des4 and select Simulate. A new Simulation worksheet will appear. Click on the Response Generation Info tab, and input the following values: Mean control = 6; Mean Treatment = 6; (Common) Std. Deviation = 8. In other words, we are simulating from a population in which there is no true difference between the control and treatment means. This simulation will allow us to check the type-1 error rate when using Des4.

Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1. With Sim1 selected in the Output Preview, click the save icon, then double-click Sim1 in the Library. The simulation output details will be displayed in the upper pane. In the Overall Simulation Result table, notice that the percentage of times the upper efficacy stopping boundary was crossed is largely consistent with a type-1 error of 5%. The exact values of your simulations may differ, depending on your seed.

Right-click Sim1 in the Library and click Edit Simulation. In the Response Generation Info tab, enter 9 for Mean Treatment. Leave all other values, and click Simulate. With Sim2 selected in the Output Preview, click the save icon, then double-click Sim2 in the Library.
Notice that the percentage of times the efficacy stopping boundary was crossed is largely consistent with 90% power for the original design. Feel free to experiment further with other simulation options before continuing.

10.1.2 Interim monitoring of the Orlistat trial

Suppose we decided to adopt Des2. Select Des2 in the Library, and click [icon] on the Library toolbar. Alternatively, right-click on Des2 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Making Entries in the Interim Monitoring Dashboard

Although the study has been designed assuming three equally spaced analyses, departures from this strategy are permissible using the spending function methodology of Lan and DeMets (1983) and its extension to boundaries for early stopping in favor of H0 proposed by Pampallona, Tsiatis and Kim (2001). At each interim analysis time point, East will determine the amount of type-1 error probability and type-2 error probability that it is permitted to spend based on the chosen spending functions specified in the design. East will then re-compute the corresponding stopping boundaries. This strategy ensures that the overall type-1 error will not exceed the nominal significance level α. We shall also see how East proceeds so as to control the type-2 error probability. Open the Test Statistic Calculator by clicking on the Enter Interim Data button.
Assume that we take the first look after 110 patients (Sample Size (Overall)), with an Estimate of δ of 3 and a Standard Error of Estimate of δ of 1.762. Click OK to continue. East will update the charts and tables in the dashboard accordingly.

For example, the Stopping Boundaries Chart displays recomputed stopping boundaries and the path traced out by the test statistic. The Error Spending Function Chart displays the cumulative error spent at each interim look. The Conditional Power (CP) Chart shows the probability of crossing the upper stopping boundary, given the most recent information. Finally, the RCI (Repeated Confidence Interval) Chart displays repeated confidence intervals (Jennison & Turnbull, 2000).

Repeat the input procedure from above with the second look after 221 patients (Sample Size (Overall)), an Estimate of δ of 2, and a Standard Error of Estimate of δ of 1. Click Recalc and OK to continue.

For the final look, make sure to tick the box Set Current Look as Last. Input the following estimates: 331 patients (Sample Size (Overall)), with an Estimate of δ of 3 and a Standard Error of Estimate of δ of 1. Click Recalc and OK to continue. The upper boundary has been crossed. The dashboard will be updated, and the Final Inference table shows the final outputs. For example, the adjusted p-value is 0.017, consistent with the rejection of the null.

10.1.3 Trial Design Using a t-Test (Single Look)

In Section 10.1.1 the sample size obtained to correctly power the trial relied on an asymptotic approximation for the distribution of a Wald-type statistic. In the single-look setting this statistic is

$$Z = \frac{\hat{\delta}}{\sqrt{\mathrm{var}[\hat{\delta}]}}, \tag{10.7}$$

with

$$\mathrm{var}[\hat{\delta}] = \frac{\hat{\sigma}^2}{n r (1 - r)}. \tag{10.8}$$

In a small single-look trial a more accurate representation of the distribution of Z is obtained by using Student's t-distribution with (n − 1) degrees of freedom.

Consider the Orlistat trial described in Section 10.1.1 where we would like to test the null hypothesis that treatment does not lead to weight loss, H0: δ = 0, against the alternative hypothesis that the treatment does result in a loss of weight, H1: δ > 0. We will now design this same trial in a different manner, using the t-distribution for the test statistic. Start East afresh. Click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. Enter the following design parameters so that the dialog box appears as shown. Remember to select 1-Sided for Trial Type, and enter an Allocation Ratio of 3. These values are the same as those from Des1, except that under Dist. of Test Stat., select t. Then click Compute.

We observe that the required sample size for this study is 327 patients. Contrast this to the 325 patients obtained using the normal distribution in Section 10.1.1.

10.2 Ratio of Means for Independent Data (Superiority)

Let σt and σc denote the standard deviations of the treatment and control group responses, respectively, and let µt and µc denote the corresponding means. It is assumed that the coefficient of variation (CV), defined as the ratio of the standard deviation to the mean, is the same for both groups: σt/µt = σc/µc. Finally, let ρ = µt/µc. For a Superiority trial, the null hypothesis H0: ρ = ρ0 is tested against the two-sided alternative hypothesis H1: ρ ≠ ρ0 or a one-sided alternative hypothesis H1: ρ < ρ0 or H1: ρ > ρ0.

First, click Continuous: Two Samples on the Design tab, and then click Parallel Design: Ratio of Means.
Suppose that we wish to determine the sample size required for a one-sided test to achieve a type-1 error of 0.05 and power of 90% to detect a ratio of means of 1.25. We also need to specify the CV = 0.25. Enter the appropriate design parameters so that the input dialog box appears as below, and click Compute. The computed sample size (42 subjects) is highlighted in yellow.

10.3 Difference of Means for Crossover Data (Superiority)

In a crossover trial, each experimental subject receives two or more different treatments. The order in which each subject receives the treatments depends on the particular design chosen for the trial. The simplest design is a 2×2 crossover trial, where each subject receives two treatments, say A and B. Half of the subjects receive A first and then, after a suitably chosen period of time, cross over to B. The other half receive B first and then cross over to A. The null and alternative hypotheses are the same as for a two-sample test for difference of means for independent data. However, a key advantage of the crossover design is that each subject serves as his/her own control. The test statistic needs to account not only for treatment effects, but also for period and carryover effects.

We will demonstrate this design for a Superiority trial. First, click Continuous: Two Samples on the Design tab, and then click Crossover Design: Difference of Means. Suppose that we wish to determine the sample size required to achieve a type-1 error of 0.05 and power of 90% to detect a difference of means of 75 with standard deviation of the difference of 150.
Enter the appropriate design parameters so that the input dialog box appears as below, and click Compute. The computed sample size (45 subjects) is highlighted in yellow.

10.4 Ratio of Means for Crossover Data (Superiority)

We will demonstrate this design for a Superiority trial. The null hypothesis H0: ρ = ρ0 is tested against the two-sided alternative hypothesis H1: ρ ≠ ρ0 or a one-sided alternative hypothesis H1: ρ < ρ0 or H1: ρ > ρ0.

First, click Continuous: Two Samples on the Design tab, and then click Crossover Design: Ratio of Means. Suppose that we wish to determine the sample size required for a one-sided test to achieve a type-1 error of 0.05 and power of 80% to detect a ratio of means of 1.25 with square root of MSE of 0.3. Enter the appropriate design parameters so that the input dialog box appears as below, and click Compute. The computed sample size (24 subjects) is highlighted in yellow.

10.5 Assurance (Probability of Success)

Assurance, or probability of success, is a Bayesian version of power, which corresponds to the (unconditional) probability that the trial will yield a statistically significant result. Specifically, it is the prior expectation of the power, averaged over a prior distribution for the unknown treatment effect (see O’Hagan et al., 2005). For a given design, East allows you to specify a prior distribution, for which the assurance or probability of success will be computed.

Select Des2 in the Library, and click [icon] on the Library toolbar.
Alternatively, recompute this design with the following inputs: a 3-look design with Lan-DeMets (OF) efficacy-only boundary, Superiority Trial, 1-sided, 0.05 type-1 error, 90% power, allocation ratio = 3, mean control = 6, mean treatment = 9, and standard deviation = 8. Select the Assurance checkbox in the Input window.

Suppose that we wish to specify a Normal prior distribution for the treatment effect δ, with a mean of 3 and a standard deviation of 2. Thus, rather than assuming δ = 3 with certainty, we use this prior distribution to reflect the uncertainty about the true treatment effect. In the Distribution list, click Normal, and in the Input Method list, click E(δ) and SD(δ). Type 3 in the E(δ) box, and type 2 in the SD(δ) box, and then click Compute.

The computed probability of success (0.72) is shown below. Note that for this prior, assurance is less than the specified power (0.9); incorporating the uncertainty about δ has yielded a less optimistic estimate of power. In the Output Preview, right-click the row corresponding to this design, rename the design ID as Bayes1, and save it to the Library.

Return to the input window. Type 0.001 in the SD(δ) box, and click Compute. Such a prior approximates the non-Bayesian power calculation, where one specifies a fixed treatment effect. As shown below, such a prior yields a probability of success that is similar to the specified power.

East also allows you to specify an arbitrary prior distribution through a CSV file. In the Distribution list, click User Specified, and then click Browse... to select the CSV file where you have constructed a prior.
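The Normal-prior assurance described above can be approximated outside East with a short calculation. The following is a minimal sketch under simplifying assumptions that are not part of East: it treats the design as a single-look, fixed-sample one-sided z-test with known σ = 8, a total sample size of 325 (the single-look normal-approximation value quoted in Section 10.1.3), and a treatment allocation fraction r = 0.75 corresponding to an allocation ratio of 3. All function names are illustrative. Because East's 3-look calculation also accounts for the stopping boundaries, its result need not agree exactly with this approximation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def standard_error(sigma, n_total, r):
    """SE of the estimated difference of means, using var = sigma^2 / (n r (1 - r))."""
    return sigma / sqrt(n_total * r * (1.0 - r))

def power(delta, se, alpha):
    """One-sided power of a single-look z-test when the true effect is delta."""
    return STD_NORMAL.cdf(delta / se - STD_NORMAL.inv_cdf(1.0 - alpha))

def assurance_normal_prior(prior_mean, prior_sd, se, alpha):
    """Closed-form average of power() over a Normal(prior_mean, prior_sd) prior."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf((prior_mean / se - z_alpha) / sqrt(1.0 + (prior_sd / se) ** 2))

def assurance_numeric(prior_mean, prior_sd, se, alpha, grid=2001, width=8.0):
    """Numerical check: weight power() by the prior density on a grid, then renormalize."""
    prior = NormalDist(prior_mean, prior_sd)
    lo = prior_mean - width * prior_sd
    step = 2.0 * width * prior_sd / (grid - 1)
    points = [lo + i * step for i in range(grid)]
    weights = [prior.pdf(d) for d in points]
    return sum(w * power(d, se, alpha) for w, d in zip(weights, points)) / sum(weights)

# Assumed inputs: sigma = 8, N = 325, r = 0.75 (allocation ratio 3), one-sided alpha = 0.05
se = standard_error(sigma=8.0, n_total=325, r=0.75)
print(round(assurance_normal_prior(3.0, 2.0, se, alpha=0.05), 3))    # prior SD = 2
print(round(assurance_normal_prior(3.0, 0.001, se, alpha=0.05), 3))  # nearly fixed effect
```

Under these assumptions the first value comes out close to the 0.72 reported by East, and the second is close to the design power of 0.9, which illustrates why a very concentrated prior reproduces the conventional power calculation.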
The CSV file should contain two columns, where the first column lists the grid points for the parameter of interest (in this case, δ), and the second column lists the prior probability assigned to each grid point. For example, we consider a 5-point prior with probability = 0.2 at each point. The prior probabilities can be entered as weights that do not sum to one, in which case East will re-normalize for you.

Once the CSV filename and path have been specified, click Compute to calculate the assurance, which will be displayed in the box below.

As stated in O’Hagan et al. (2005, p.190): “Assurance is a key input to decision-making during drug development and provides a reality check on other methods of trial design.” Indeed, it is not uncommon for assurance to be much lower than the specified power. The interested reader is encouraged to refer to O’Hagan et al. for further applications and discussions on this important concept.

10.6 Predictive Power and Bayesian Predictive Power

Similar Bayesian ideas can be applied to conditional power for interim monitoring. Rather than calculating conditional power for a single assumed value of the treatment effect δ, such as δ̂, we may account for the uncertainty about δ by taking a weighted average of conditional powers, weighted by the posterior distribution for δ. For normal endpoints, East assumes a posterior distribution for δ that results from a diffuse prior distribution, which produces an average power called the predictive power (Lan, Hu, & Proschan, 2009). In addition, if the user specified a normal prior distribution at the design stage to calculate assurance, then East will also calculate the average power, called Bayesian predictive power, for the corresponding posterior.
We will demonstrate these calculations for the design renamed as Bayes1 earlier. In the Library, right-click Bayes1 and click Interim Monitoring, then click [icon] (Show/Hide Columns) in the toolbar of the IM Dashboard. In the Show/Hide Columns window, make sure to show the columns for CP (Conditional Power), Predictive Power, Bayes Predictive Power, Posterior Distribution of δ Mean, and Posterior Distribution of δ SD, and click OK. The following columns will be displayed in the main grid of the IM Dashboard.

Assume that we observed interim data after 110 patients, with an estimate of δ = 1 and a standard error of the estimate = 0.7. Enter these values in the Test Statistic Calculator by clicking Enter Interim Data, and click OK. The IM Dashboard will be updated. In particular, notice the differing values for CP and the Bayesian measures of power.

11 Nonparametric Superiority Two Sample

The Wilcoxon-Mann-Whitney nonparametric test is a commonly used test for the comparison of two distributions when the observations cannot be assumed to come from normal distributions. It is used when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. For the Wilcoxon-Mann-Whitney test, East supports single-look superiority designs only.

11.1 Wilcoxon-Mann-Whitney Test

Let X1, ..., Xnt be the nt observations from the treatment (T) with distribution function Ft, and Y1, ..., Ync be the nc observations from the control (C) with distribution function Fc. Ft and Fc are assumed to be continuous with corresponding densities ft and fc, respectively. The primary objective of the Wilcoxon-Mann-Whitney test is to investigate whether there is a shift of location, which indicates the presence of a treatment effect.
Let θ represent the treatment effect. Then we test the null hypothesis H0: θ = 0 against the two-sided alternative H1: θ ≠ 0 or a one-sided alternative hypothesis H1: θ < 0 or H1: θ > 0. Let U denote the number of pairs (Xi, Yj) such that Xi < Yj, so that

$$U = \sum_{i=1}^{n_t} \sum_{j=1}^{n_c} I(X_i, Y_j),$$

where I(a, b) = 1 if a < b and I(a, b) = 0 if a ≥ b. Then U/(nc nt) is a consistent estimator of

$$p = P(X < Y) = \int_{-\infty}^{\infty} F_t(y)\, f_c(y)\, dy = \int_0^1 F_t[F_c^{-1}(u)]\, du. \tag{11.1}$$

The power is approximated using the asymptotic normality of U and depends on the value of p, and thus depends on Fc and Ft. In order to find the power for a given sample size or to find the sample size for a given power, we must specify p. However, this is often a difficult task. If we are willing to specify Fc and Ft, then p can be computed. East computes p assuming that Fc and Ft are normal distributions with means µc and µt and a common standard deviation σ, by specifying the values of the difference in the means and the standard deviation. With this assumption, equation (11.1) results in

$$p = \Phi\left(\frac{\mu_t - \mu_c}{\sqrt{2}\,\sigma}\right). \tag{11.2}$$

Using the results of Noether (1987), with nt = rN, the total sample size for an α level two-sided test to have power 1 − β for a specified value of p is approximated by

$$N = \frac{(z_{\alpha/2} + z_{\beta})^2}{12\, r (1 - r) (p - 0.5)^2}.$$

11.2 Example: Designing a single look superiority study

Based on a pilot study of an anti-seizure medication, we want to design a 12-month placebo-controlled study of a treatment for epilepsy in children. The primary efficacy variable is the percent change from baseline in the number of seizures in a 28-day period. The mean percent decrease was 2 for the control and 8 for the new treatment, with an estimated standard deviation of 25. We plan to design the study to test the null hypothesis H0: θ = 0 against H1: θ ≠ 0.
We want to design a study that would have 90% power at µc = 2 and µt = 8 under H1 and maintain the type I error at 5%.

11.2.1 Designing the study

Click Continuous: Two Samples on the Design tab and then click Parallel Design: Wilcoxon-Mann-Whitney. This will launch a new window. The upper pane of this window displays several fields with default values. Select 2-Sided for Test Type and enter 0.05 for Type I Error. Select Individual Means for Input Method and then specify Mean Control (µc) as 2 and Mean Treatment (µt) as 8. Specify Std. Deviation as 25. Click Compute. The upper pane should now appear as below.

The required sample size for this design is shown as a row in the Output Preview, located in the lower pane of this window. The computed total sample size (772 subjects) is highlighted in yellow. This design has the default name Des 1 and results in a total sample size of 772 subjects in order to achieve 90% power. The probability displayed in the row is 0.567, which indicates the approximate probability P[X < Y] assuming X ∼ N(8, 25²) and Y ∼ N(2, 25²). This is in accordance with equation (11.2).

Select this design by clicking anywhere along the row in the Output Preview and click [icon] in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary. In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design. According to this summary, the study needs a total of 772 subjects. Of these 772 subjects, 386 will be allocated to the treatment group and the remaining 386 will be allocated to the control group.
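The values reported for Des 1 can be checked by hand from equation (11.2) and the Noether (1987) approximation. The sketch below is illustrative only: the function names are hypothetical, and rounding the total sample size up to the next integer is an assumption about how the displayed value is obtained rather than something stated in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def prob_from_normal_shift(mean_control, mean_treatment, sigma):
    """Equation (11.2): p = Phi((mu_t - mu_c) / (sqrt(2) * sigma))."""
    return STD_NORMAL.cdf((mean_treatment - mean_control) / (sqrt(2.0) * sigma))

def noether_total_sample_size(p, alpha, power, r=0.5):
    """Noether approximation for a two-sided level-alpha test, with r = n_t / N."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_beta = STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (12.0 * r * (1.0 - r) * (p - 0.5) ** 2)

# Des 1 inputs: mu_c = 2, mu_t = 8, sigma = 25, two-sided alpha = 0.05, power = 90%
p = prob_from_normal_shift(mean_control=2.0, mean_treatment=8.0, sigma=25.0)
n_total = ceil(noether_total_sample_size(p, alpha=0.05, power=0.90))  # assumed rounding up
print(round(p, 3), n_total)
```

The same two functions can be reused with other means or standard deviations to see how strongly the required sample size reacts to the assumed value of p.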
Since the sample size is inversely proportional to (p − 0.5)², it is sensitive to mis-specification of p (see equation (11.1)). The results of the pilot study included several subjects who worsened over the baseline, and thus the difference in the means might not be an appropriate basis for determining p. To obtain a more appropriate value of p, we have several alternative approaches. We can further examine the results of the pilot study after exclusion of some of the extreme values, which will decrease the standard deviation and provide a difference in the means that may be a more reasonable measure of the difference between the distributions. The difference in the medians may also be a more reasonable measure of the difference between the distributions, especially when used with a decreased standard deviation. The median percent decrease was 10 for the control and 18 for the new treatment, with an estimated standard deviation of 25.

Create a new design by selecting Des 1 in the Library, and clicking [icon] on the Library toolbar. In the Input, change the Mean Control (µc) and Mean Treatment (µt) to 10 and 18, respectively. Click Compute to generate output for Des 2. To compare Des 1 and Des 2, select both rows in Output Preview using the Ctrl key, and click the [icon] icon in the Output Preview toolbar. Both designs will be displayed in the Output Summary pane. The sample size required for Des 2 is only 438 subjects as compared to 772 subjects in Des 1.

Now we consider decreasing the standard deviation to 20 to lessen the impact of the extreme values. Select Des 2 in the Output Preview, and click the [icon] icon in the toolbar. In the Input, change the Std. Deviation to 20. Click Compute to generate output for this design.
Select all the rows in Output Preview and click [icon] in the Output Preview toolbar to see them in the Output Summary pane. This design results in a total sample size of 283 subjects in order to attain 90% power.

12 Normal Non-inferiority Two-Sample

In a non-inferiority trial, the goal is to establish that an experimental treatment is no worse than the standard treatment, rather than attempting to establish that it is superior. A therapy that is demonstrated to be non-inferior to the current standard therapy for a particular indication might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a non-inferiority margin. The amount by which the mean response on the experimental arm is worse than the mean response on the control arm must fall within this margin in order for the claim of non-inferiority to be sustained. In this chapter, we show how East supports the design and interim monitoring of such experiments, with a normal endpoint.

12.1 Difference of Means

12.1.1 Trial design

Consider the design of an antihypertension study comparing an ACE inhibitor to a new AII inhibitor. Let µc be the mean value of a decrease in systolic blood pressure level (in mmHg) for patients in the ACE inhibitor (control) group and µt be the mean value of a decrease in blood pressure level for patients in the AII inhibitor (treatment) group. Let δ = µt − µc be the treatment difference. We want to demonstrate that the AII inhibitor is non-inferior to the ACE inhibitor. For this example, we will consider a non-inferiority margin equal to one-third of the mean response in the control group. From historical data, µc = 9 mmHg and therefore the non-inferiority margin is 3 mmHg.
Accordingly, we will design the study to test the null hypothesis of inferiority H0: δ ≤ −3 against the one-sided non-inferiority alternative H1: δ > −3. The test is to be conducted at a significance level (α) of 0.025 and is required to have 90% power at δ = 0. We assume that σ², the variance of the patient response, is the same for both groups and is equal to 100.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Parallel Design: Difference of Means.

Single-look design

In the input window, select Noninferiority for Design Type. The effect size can be specified in one of three ways by selecting different options for Input Method: (1) individual means and common standard deviation, (2) difference of means and common standard deviation, or (3) standardized difference of means. We will use the Individual Means method. Select Individual Means for Input Method, specify the Mean Control (µc) as 9 and the Noninferiority margin (δ0) as −3, and specify the Std. Deviation (σ) as 10. Specify 0 for Difference in Means (δ1). The upper pane should appear as below.

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (467 subjects) is highlighted in yellow. This design has the default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click [icon]. In the Output Preview toolbar, click [icon] to save this design to Wbk1 in the Library. If you hover the cursor over Des 1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des 1 selected in the Library, click [icon] on the Library toolbar, and then click Power vs Treatment Effect (δ).
The resulting power curve for this design will appear. You can save this chart to the Library by clicking Save in Workbook. In addition, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

12.1.2 Three-Look Design

Create a new design by selecting Des 1 in the Library, and clicking [icon] on the Library toolbar. In the Input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary Info should appear. Click this tab to reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979).

Click Compute to generate output for Des 2. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking [icon]. To compare Des 1 and Des 2, select both rows in the Output Preview using the Ctrl key and click [icon]. Both designs will be displayed in the Output Summary.

The maximum sample size with Des 2 is 473, which is only a slight increase over the fixed sample size in Des 1. However, the expected sample size with Des 2 is 379 patients under H1, a saving of almost 100 patients. In order to see the stopping probabilities, double-click Des 2 in the Library. The clear advantage of this sequential design resides in the high probability of stopping by the second look, if the alternative is true, with a sample size of 315 patients, which is well below the requirements for a fixed sample study (467 patients). Close the Output window before continuing.

Examining stopping boundaries and spending functions

You can plot the boundary values of Des 2 by clicking [icon] on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear.

You can choose a different Boundary Scale from the corresponding drop-down box. The available boundary scales include: Z scale, Score Scale, δ Scale, δ/σ Scale and p-value scale. To plot the error spending function for Des 2, select Des 2 in the Library, click [icon] in the toolbar, and then click Error Spending. The following chart will appear.

The above spending function is according to Lan and DeMets (1983) with O’Brien-Fleming flavor, and for one-sided tests has the following functional form:

$$\alpha(t) = 2 - 2\,\Phi\left(\frac{z_{\alpha/2}}{\sqrt{t}}\right)$$

Observe that very little of the total type-1 error is spent early on, but more is spent rapidly as the information fraction increases, and reaches 0.025 at an information fraction of 1. Feel free to explore other plots by clicking the [icon] icon in the Library toolbar. Close all charts before continuing. To obtain the tables used to generate these plots, click the [icon] icon.

Select Des 2 in the Library, and click [icon] on the Library toolbar.
In the Boundary Info tab, change the Boundary Family from Spending Functions to Wang-Tsiatis. The Wang-Tsiatis (1989) power boundaries are of the form

$$c(t_j) = C(\Delta, \alpha, K)\, t_j^{\Delta}, \qquad j = 1, 2, \ldots, K,$$

where ∆ is a shape parameter that characterizes the boundary shape and C(∆, α, K) is a positive constant. The choice ∆ = 0 will yield the classic O’Brien-Fleming stopping boundary, whereas ∆ = 0.5 will yield the classic Pocock stopping boundary. Other choices of the parameter in the range −0.5 to 0.5 are also permitted. Accept the default parameter 0 and click Compute to obtain the sample size. A new row will be added to the Output Preview with design name Des 3. Select all three rows in Output Preview using the Ctrl key and click [icon]. All three designs will be displayed in the Output Summary.

Note that the total sample size and the expected sample size under H1 for Des 3 are close to those for Des 2. This is expected because the Wang-Tsiatis power family with shape parameter 0 yields the classic O’Brien-Fleming stopping boundaries. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking [icon] on the Output Preview toolbar.

Select Des 2 in the Library, and click [icon] on the Library toolbar. In the Boundary Info tab, change the Spending Function from Lan-DeMets to Rho Family. The Rho spending function was first published by Kim and DeMets (1987) and was generalized by Jennison and Turnbull (2000). It has the following functional form:

$$\alpha(t) = \alpha\, t^{\rho}, \qquad \rho > 0.$$

When ρ = 1, the corresponding stopping boundaries resemble the Pocock stopping boundaries. When ρ = 3, the boundaries resemble the O’Brien-Fleming boundaries. Larger values of ρ yield increasingly conservative boundaries.
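The two spending functions discussed in this section can be tabulated directly from their formulas. The sketch below is illustrative only (the function names are hypothetical and no East output is reproduced): it evaluates the Lan-DeMets O’Brien-Fleming-type function and the Rho family at the information fractions of three equally spaced looks, for a one-sided α of 0.025.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lan_demets_of(t, alpha):
    """Lan-DeMets O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{alpha/2} / sqrt(t))."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * STD_NORMAL.cdf(z / sqrt(t))

def rho_family(t, alpha, rho):
    """Rho-family spending: alpha * t**rho, with rho > 0."""
    return alpha * t ** rho

alpha = 0.025
for look in (1, 2, 3):
    t = look / 3.0  # information fraction for equally spaced looks
    print(look, round(lan_demets_of(t, alpha), 5), round(rho_family(t, alpha, rho=2.0), 5))
```

Both functions spend the full α at an information fraction of 1. At the first look the Rho family with ρ = 2 spends more of the type-1 error than the Lan-DeMets (OF) function, which is one way to see why a less conservative early boundary raises the chance of early stopping.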
Specify the parameter (ρ) as 2, and click Compute. A new row will be added to the Output Preview with design name Des 4. Select all four rows in Output Preview using the Ctrl key and click [icon]. All the designs will be displayed in the Output Summary.

Observe that Des 4 requires a total sample size of 14 more subjects than Des 2. The expected sample size under H1 for Des 4 is only 351 patients, compared to 379 patients for Des 2 and 467 patients for Des 1. Save Des 4 to the Library by selecting the corresponding row in the Output Preview and clicking [icon].

12.1.3 Simulation

Select Des 4 in the Library, and click [icon] in the toolbar. Alternatively, right-click on Des 4 and select Simulate. A new window for simulation will appear. Click on the Response Generation Info tab, and specify: Mean control = 9; Mean Treatment = 9; SD Control = 10. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1.

Select Sim 1 in the Output Preview and click [icon]. Double-click Sim 1 in the Library. The simulation output details will be displayed. The upper efficacy stopping boundary was crossed in around 90% of the 10,000 simulated trials, which is consistent with the power of 90%. The exact result of the simulations may differ slightly, depending on the seed.

12.1.4 Interim Monitoring

Select Des 4 in the Library, and click [icon] on the Library toolbar. Alternatively, right-click on Des 4 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections.
The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Although the study has been designed assuming three equally spaced analyses, departures from this strategy are permissible using the spending function methodology of Lan and DeMets (1983) and its extension to boundaries for early stopping in favor of H0 proposed by Pampallona, Tsiatis and Kim (2001). At each interim analysis time point, East will determine the amount of type-1 error probability and type-2 error probability that it is permitted to spend based on the spending functions specified in the design. East will then re-compute the corresponding stopping boundaries. This strategy ensures that the overall type-1 error does not exceed the nominal significance level α.

Let us take the first look after accruing 200 subjects. The test statistic at look j for testing non-inferiority is given by

$Z_j = \dfrac{\hat{\delta}_j - \delta_0}{SE(\hat{\delta}_j)}$

where δ̂j and δ0 denote the estimated treatment difference and the non-inferiority margin, respectively, and SE denotes the standard error. Suppose that at the first look (j = 1) we have observed δ̂1 = 2.3033 and SE(δ̂1) = 2.12132. With δ0 = −3, the value of the test statistic at the first look would be Z1 = (2.3033 + 3)/2.12132, or about 2.5.

To pass these values to East, click Enter Interim Data to open the Test Statistic Calculator. Enter the following values: 200 for Cumulative Sample Size, 2.3033 as Estimate of δ and 2.12132 as Standard Error of Estimate of δ. Click Recalc, and then OK. The value of the test statistic is 2.498, which is very close to the stopping boundary 2.634.
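The hand calculation of the first-look statistic can be checked with a short Python sketch. The function name is hypothetical; it simply evaluates (δ̂ − δ0)/SE for the values quoted above and is not East's Test Statistic Calculator.

```python
def noninferiority_z(estimate, margin, std_error):
    """Wald statistic for non-inferiority: (estimate - margin) / standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (estimate - margin) / std_error


# First-look values quoted in the text: estimate 2.3033, margin -3, SE 2.12132.
z1 = noninferiority_z(estimate=2.3033, margin=-3.0, std_error=2.12132)
print(f"Z at look 1: {z1:.3f}")  # close to 2.5, as in the hand calculation
```

The same function can be reused at later looks by passing the updated estimate and standard error.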
The lower bound of the 97.5% repeated confidence interval (RCI) for δ is -3.29.

Click the icon in the Conditional Power chart located in the lower part of the dashboard. The conditional power at the current effect size 2.303 is over 99.3%.

Suppose we take the next interim look after accruing 350 subjects. Enter 350 for Cumulative Sample Size, 2.3033 for Estimate of δ and 1.71047 for Standard Error of Estimate of δ. Click Recalc and OK to update the charts and tables in the dashboard.

Now the stopping boundary is crossed and the following dialog box appears. Click Stop. The dashboard will now include the following table.

The adjusted confidence interval and p-value are calculated according to the approach proposed by Tsiatis, Rosner and Mehta (1984) and its later extension by Kim and DeMets (1987). The basic idea here is to search for the confidence bounds such that the p-value under the alternative hypothesis just becomes statistically significant.

12.1.5 Trial Design Using a t-Test (Single Look)

In Section 12.1 the sample size is obtained from an asymptotic approximation of the distribution of the test statistic

$\dfrac{\hat{\delta} - \delta_0}{\sqrt{\operatorname{var}[\hat{\delta}]}}$

If the study under consideration is small, this asymptotic approximation may be poor. Using Student’s t-distribution with (n − 1) degrees of freedom, we may better size the trial to have appropriate power to reject H0. In East, this can be done by specifying the distribution of the test statistic as t. We shall illustrate this by designing the study described in Section 12.1 that aims to demonstrate that the AII inhibitor is non-inferior to the ACE inhibitor.
Select Des 1 from the Library. Click the corresponding icon from the toolbar. Change the Test Statistic from Z to t. The entries for the other fields need not be changed. Click Compute. East will add an additional row to the Output Preview labeled Des 5. The required sample size is 469. Select the rows corresponding to Des 1 and Des 5 and click the corresponding icon. This will display both designs in the Output Summary.

Des 5, which used the t distribution, requires us to commit a combined total of 469 patients to the study, up from 467 in Des 1, which used the normal distribution. The extra patients are needed to compensate for the extra variability due to estimation of var[δ̂].

12.2 Ratio of Means

Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the coefficient of variation CV = σ/µ is the same for T and C. Finally, let ρ = µt/µc.

For a non-inferiority trial with ratio of means we define the null hypothesis as

H0 : ρ ≤ ρ0 if ρ0 < 1
H0 : ρ ≥ ρ0 if ρ0 > 1

where ρ0 denotes the noninferiority margin. Consider the case when ρ0 < 1. Now define δ = ln(ρ) = ln(µt) − ln(µc), so the null hypothesis becomes H0 : δ ≤ δ0, where δ0 = ln(ρ0). Since we can translate the ratio hypothesis into a difference hypothesis, we can perform the test for difference as discussed in Section 12.1 on log-transformed data. Here, we need the standard deviation of log-transformed data.
If we are provided with the coefficient of variation (CV) instead, the standard deviation of log-transformed data can be obtained using the relation $\mathrm{sd} = \sqrt{\ln(1 + \mathrm{CV}^2)}$.

12.2.1 Trial design

For illustration, we consider the example cited by Laster and Johnson (2003): a randomized clinical study of a new anti-hypertensive therapy known to produce fewer side-effects than a standard therapy but expected to be almost 95% as effective (ρ1 = 0.95). To accept the new therapy, clinicians want a high degree of assurance that it is at least 80% as effective in lowering blood pressure as the standard agent. Accordingly we plan to design the study to test:

H0 : µt/µc ≤ 0.8 against H1 : µt/µc > 0.8

Reductions in seated diastolic blood pressure are expected to average 10 mmHg (= µc) with standard therapy, with a standard deviation of 7.5 mmHg (= σc). Therefore, the CV in the standard therapy is 7.5/10 = 0.75. We also assume that the CVs in both therapies are equal. We need to design a study that would have 90% power at ρ1 = 0.95 under H1 and maintain a one-sided type I error of 5%.

12.2.2 Designing the study

Start East afresh. Click Continuous: Two Samples, under the Design tab, and then click Parallel Design: Ratio of Means.

In the input window, select Noninferiority for Design Type. Select Individual Means for Input Method and then specify the Mean Control (µc) as 10, Noninferiority Margin (ρ0) as 0.8 and Ratio of Means (ρ1) as 0.95. Specify 0.75 for Coeff. Var. The upper pane should appear as below:

Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed total sample size (636 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design by clicking anywhere along the row in the Output Preview and click the corresponding icon.
Some of the design details will be displayed in the Output Summary.

In the Output Preview toolbar, click the corresponding icon to save this design to Wbk1 in the Library. Double-click on Des 1 in the Library to see the details of the design.

Unequal allocation ratio

Since the profile of the standard therapy is well established and comparatively little is known about the new therapy, you want to put more subjects on the new therapy. You can do this by specifying an allocation ratio greater than 1. Suppose you want 50% more subjects on the new therapy than on the standard one. Then we need to specify the allocation ratio (nt/nc) as 1.5.

Create a new design by selecting Des 1 in the Output Preview, and clicking the corresponding icon on the Output toolbar. In the Input, change the Allocation Ratio from 1 to 1.5. Click Compute to obtain the sample size for this design. A new row will be added labeled Des 2. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Select both rows in Output Preview using the Ctrl key and click the corresponding icon.

t distribution test statistic

Create a new design by selecting Des 2 in the Output, and clicking the corresponding icon on the Output toolbar. In the Input, change the Test Statistic from Z to t. Click Compute to obtain the sample size for this design. A new row will be added labeled Des 3.

A sample size of 664 will be needed, which is very close to the sample size of 662 obtained in Des 2 under the normal distribution.

Plotting

With Des 2 selected in the Library, click the corresponding icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.
You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Feel free to explore other plots as well. Once you have finished, close all charts before continuing.

12.2.3 Simulation

Select Des 2 in the Library, and click the corresponding icon in the toolbar. Alternatively, right-click on Des 2 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean Control = 10; Mean Treatment = 9.5; CV of Data Control = 0.75. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1.

Select Sim 1 in the Output Preview and click the corresponding icon. Double-click on Sim 1 in the Library. The simulation output details will be displayed.

Out of 10,000 simulations, close to 90% are rejected for non-inferiority. Therefore, the simulation result verifies that the design attains 90% power. The simulation result might vary depending on the starting seed value chosen.

12.3 Difference of Means in Crossover Designs

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. Subjects in sequence group 1 receive the test drug (T) formulation in the first period, have their outcome variable X recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2 and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group    Period 1   Washout   Period 2
1 (TC)   Test       -         Control
2 (CT)   Control    -         Test

The resulting data are commonly analyzed using a linear model.
The response yijk in period j on subject k in sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects γ1 and γ2. The fixed effects model can be displayed as:

Group    Period 1            Washout   Period 2
1 (TC)   µ + τt + π1 + γ1    -         µ + τc + π2 + γ1
2 (CT)   µ + τc + π1 + γ2    -         µ + τt + π2 + γ2

Let µt = µ + τt and µc = µ + τc denote the means of the observations from the test and control formulations, respectively, and let MSE denote the mean-squared error.

In a noninferiority trial, we test H0 : δ ≤ δ0 against H1 : δ > δ0 if δ0 < 0, or H0 : δ ≥ δ0 against H1 : δ < δ0 if δ0 > 0, where δ0 indicates the noninferiority margin. East uses the following test statistic to test the above null hypothesis:

$T_L = \dfrac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_0}{\sqrt{\dfrac{\hat{\sigma}^2}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

where ȳij is the mean of the observations from group i and period j, and σ̂² is the estimate of the error variance. TL follows Student’s t distribution with (n1 + n2 − 2) degrees of freedom.

12.3.1 Trial Design

Consider a 2 × 2 crossover trial between a Test drug (T) and a Reference drug (C) where noninferiority needs to be established in terms of some selected treatment response. Let µt and µc denote the means for the Test and Reference drugs, respectively. Let δ = µt − µc be the difference in averages. The noninferiority margin was set at -3. Accordingly we plan to design the study to test:

H0 : µt − µc ≤ −3 against H1 : µt − µc > −3

For this study, we consider µc = 21.62 ng.h/mL and µt = 23.19 ng.h/mL under H1. Further, we assume the mean squared error (MSE) would be 2.5. We want to design a study that would have 90% power at δ1 = 23.19 − 21.62 = 1.57 under H1. We want to perform this test at a one-sided 0.025 level of significance.
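To make the crossover test statistic concrete, the sketch below evaluates T_L from the four cell means, the group sizes and the error-variance estimate. The function name and all numerical inputs are hypothetical illustration values, not data from the example trial, and the code is not East's implementation.

```python
import math


def crossover_noninferiority_t(cell_means, n1, n2, sigma2_hat, margin):
    """Return (T_L, degrees of freedom) for a 2 x 2 crossover design.

    cell_means[i][j] is the mean response in sequence group i+1, period j+1.
    """
    (y11, y12), (y21, y22) = cell_means
    # Estimated treatment difference: half the contrast of the four cell means.
    estimate = (y11 - y12 - y21 + y22) / 2.0
    std_error = math.sqrt(sigma2_hat / 2.0 * (1.0 / n1 + 1.0 / n2))
    return (estimate - margin) / std_error, n1 + n2 - 2


# Hypothetical cell means and variance estimate, with the margin of -3 from the text.
t_stat, df = crossover_noninferiority_t(
    cell_means=[[23.0, 21.0], [21.5, 23.5]],
    n1=5, n2=5, sigma2_hat=6.25, margin=-3.0,
)
print(f"T_L = {t_stat:.4f} on {df} degrees of freedom")
```

The statistic would then be compared with the appropriate quantile of Student's t distribution with n1 + n2 − 2 degrees of freedom.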
Start East afresh. First, click Continuous: Two Samples on the Design tab, and then click Crossover Designs: Difference of Means.

In the input window, select Noninferiority for Design Type. Select Individual Means for Input Method and then specify the Mean Control (µc) as 21.62 and Mean Treatment (µt) as 23.19. Enter the Type I Error (α) as 0.025. Select Sqrt(MSE) from the drop-down list and enter 2.5. Finally, enter the Noninferiority Margin (δ0) as −3. The upper pane should appear as below:

Click Compute. The sample size required for this design is highlighted in yellow. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Double-click Des 1 in the Library. This will display the design details.

Only 9 subjects are required for Des 1 to establish non-inferiority with 90% power.

12.4 Ratio of Means in Crossover Designs

12.4.1 Trial Design

We consider the same anti-hypertensive therapy example discussed in Section 12.2, but this time we will assume that the data have come from a crossover design. We wish to test the following hypotheses:

H0 : µt/µc ≤ 0.8 against H1 : µt/µc > 0.8

We want the study to have at least 90% power at ρ1 = 0.95 and to maintain a one-sided type I error of 5%. As before, we will consider CV = 0.75 for both treatment arms.

Start East afresh. First, click Continuous: Two Samples under the Design tab, and then click Crossover Design: Ratio of Means.

In the input window, select Noninferiority for Design Type. Select Individual Means for Input Method and then specify the Noninferiority Margin (ρ0) as 0.8, Mean Control (µc) as 10, and Mean Treatment (µt) as 9.5.
Using the relationship between CV (= 0.75) and the standard deviation of log-transformed data mentioned in Section 12.2, we have the standard deviation for log-transformed data as 0.45. Specify 0.45 for Sqrt. of MSE Log. The upper pane should appear as below:

Click Compute. The sample size required for this design is highlighted in yellow in the Output Preview pane. Save this design in the current workbook by selecting the corresponding row in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Select Des 1 in the Library and click the corresponding icon. This will display the design details.

In general, a crossover design requires fewer subjects compared to its parallel design counterpart, and may be preferred whenever it is feasible.

13 Normal Equivalence Two-Sample

In many cases, the goal of a clinical trial is neither superiority nor non-inferiority, but equivalence. In Section 13.1, the problem of establishing equivalence with respect to the difference of the means of two normal distributions using a parallel-group design is presented. The corresponding problem of establishing equivalence with respect to the log ratio of means is presented in Section 13.2. For the crossover design, the problem of establishing equivalence with respect to the difference of the means is presented in Section 13.3, and with respect to the log ratio of means in Section 13.4.

13.1 Difference in Means

In some experimental situations, we want to show that the means of two normal distributions are “close”. For example, a test formulation of a drug (T) and the control (or reference) formulation of the same drug (C) are considered to be bioequivalent if the rate and extent of absorption are similar.
Let µt and µc denote the means of the observations from the test and reference formulations, respectively, and let σ² denote the common variance of the observations. The goal is to establish that δL < µt − µc < δU, where δL and δU are a-priori specified values used to define equivalence.

The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed in this section for a parallel-group study. Let δ = µt − µc denote the true difference in the means. The null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using TOST. Here we perform the following two tests together:

Test1: H0L : δ ≤ δL against H1L : δ > δL at level α
Test2: H0U : δ ≥ δU against H1U : δ < δU at level α

H0 is rejected in favor of H1 at level α if and only if both H0L and H0U are rejected. Note that this is the same as rejecting H0 in favor of H1 at level α if the (1 − 2α) 100% confidence interval for δ is completely contained within the interval (δL, δU).

Let N be the total sample size and let µ̂t and µ̂c denote the estimates of the means of T and C, respectively. Let δ̂ = µ̂t − µ̂c denote the estimated difference, with standard error se(δ̂). We use the following two test statistics to apply Test1 and Test2, respectively:

$T_L = \dfrac{\hat{\delta} - \delta_L}{se(\hat{\delta})}, \qquad T_U = \dfrac{\hat{\delta} - \delta_U}{se(\hat{\delta})}$

TL and TU are assumed to follow Student’s t-distribution with (N − 2) degrees of freedom under H0L and H0U, respectively. H0L is rejected if TL ≥ t1−α,(N−2), and H0U is rejected if TU ≤ tα,(N−2).

The null hypothesis H0 is rejected in favor of H1 if TL ≥ t1−α,(N−2) and TU ≤ tα,(N−2), or in terms of confidence intervals: reject H0 in favor of H1 at level α if

$\delta_L + t_{1-\alpha,(N-2)}\, 2\hat{\sigma}/\sqrt{N} < \hat{\delta} < \delta_U + t_{\alpha,(N-2)}\, 2\hat{\sigma}/\sqrt{N}$    (13.1)
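The TOST decision and its confidence-interval form can be sketched in a few lines of Python. The function name and the inputs are hypothetical; in particular, the critical value 1.657 is an approximate upper 5% point of Student's t with 124 degrees of freedom, supplied by hand as an assumption because the Python standard library has no t quantile function.

```python
def tost_decision(delta_hat, std_error, t_crit, lower, upper):
    """Apply the two one-sided tests and the equivalent confidence-interval check.

    t_crit is the positive critical value t_{1-alpha, N-2}; by symmetry the
    lower quantile t_{alpha, N-2} equals -t_crit.
    """
    t_lower = (delta_hat - lower) / std_error   # statistic for Test1
    t_upper = (delta_hat - upper) / std_error   # statistic for Test2
    reject_h0 = t_lower >= t_crit and t_upper <= -t_crit
    # (1 - 2*alpha) confidence interval for delta.
    ci = (delta_hat - t_crit * std_error, delta_hat + t_crit * std_error)
    ci_inside = lower <= ci[0] and ci[1] <= upper
    return reject_h0, ci_inside, ci


# Hypothetical estimates; limits of -1.3 and 1.3, t_crit is an assumed approximation.
for estimate in (0.2, 0.9):
    reject, inside, ci = tost_decision(estimate, 0.392, 1.657, -1.3, 1.3)
    print(estimate, reject, inside, tuple(round(x, 3) for x in ci))
```

For each input the two flags agree, which is the equivalence between the pair of one-sided tests and the confidence-interval rule.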
We see that decision rule (13.1) is the same as rejecting H0 in favor of H1 if the (1 − 2α) 100% confidence interval for δ is entirely contained within the interval (δL, δU).

The above inequality (13.1) cannot hold if $4 t_{1-\alpha,(N-2)}\, \hat{\sigma}/\sqrt{N} \ge (\delta_U - \delta_L)$, in which case H0 is not rejected in favor of H1. Thus, we assume that $4 t_{1-\alpha,(N-2)}\, \hat{\sigma}/\sqrt{N} < (\delta_U - \delta_L)$. The impact of this assumption was examined by Bristol (1993a).

The power or sample size of such a trial design is determined for a specified value of δ, denoted δ1, for a single-look study only. The choice δ1 = 0, i.e. µt = µc, is common. For a specified value of δ1, the power is given by

$\Pr(\text{Reject } H_0) = \tau_\nu(-t_{\alpha,\nu} \mid \Delta_2) - \tau_\nu(t_{\alpha,\nu} \mid \Delta_1)$    (13.2)

where ν = N − 2 and ∆1 and ∆2 are non-centrality parameters given by ∆1 = (δ1 − δL)/se(δ̂) and ∆2 = (δ1 − δU)/se(δ̂), respectively. tα,ν denotes the upper α × 100% percentile from a Student’s t distribution with ν degrees of freedom. τν(x|∆) denotes the distribution function of a non-central t distribution with ν degrees of freedom and non-centrality parameter ∆, evaluated at x. Here 1 − τν(tα,ν|∆1) is the probability of rejecting H0L and τν(−tα,ν|∆2) is the probability of rejecting H0U; under the assumption above at least one of the two is always rejected, so the probability that both are rejected is their sum minus one.

Since we don’t know the sample size N ahead of time, we cannot characterize the bivariate t-distribution. Thus solving for sample size must be performed iteratively by equating the formula (13.2) to the power 1 − β.

13.1.1 Trial design

Consider the situation where we need to establish equivalence between a test formulation of capsules (T) and the marketed capsules (C). The response variable is the change from baseline in total symptom score. Based on the studies conducted during the development program, it is assumed that µc = 6.5. Based on this value, equivalence limits were set as −δL = δU = 1.3 (= 20% of µc). We assume that the common standard deviation is σ = 2.2. We want to have 90% power at µt = µc.

Start East afresh.
Click Continuous: Two Samples on the Design tab and then click Parallel Design: Difference of Means. This will launch a new window. The upper pane of this window displays several fields with default values. Select Equivalence for Design Type, and Individual Means for Input Method. Enter 0.05 for Type I Error. Specify both Mean Control (µc) and Mean Treatment (µt) as 6.5. We have assumed σ = 2.2. Enter this value for Std. Deviation (σ). Also enter −1.3 for Lower Equivalence Limit (δL) and 1.3 for Upper Equivalence Limit (δU). The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (126 subjects) is highlighted in yellow.

This design has default name Des 1. Select this design and click the corresponding icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled Output Summary.

A total of 126 subjects must be enrolled in order to achieve the desired 90% power under the alternative hypothesis. Of these 126 subjects, 63 will be randomized to the test formulation, and the remaining 63 to the marketed formulation.

In the Output Preview, select Des 1 and click the corresponding icon in the toolbar to save this design to Wbk1 in the Library.

Suppose that this sample size is not economically feasible and we want to examine the power for a total sample size of 100. Select Des 1 in the Library, and click the corresponding icon on the Library toolbar. In the Input, click the radio button for Power, and enter Sample Size (n) as 100.

Click Compute. This will add a new row to the Output Preview, and the calculated power is highlighted in yellow. We see that a power of 80.3% can be achieved with 100 subjects.
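The two power figures above can be roughly cross-checked with a normal approximation to the TOST power, written here in Python using only the standard library. This is a sketch under stated assumptions: it treats σ as known, replaces the t quantile by a normal quantile, assumes equal allocation, and combines the two one-sided rejection probabilities as their sum minus one. East's calculation is based on the t distribution, so small differences from the reported values are expected. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def tost_power_normal(n_total, sigma, delta1, lower, upper, alpha):
    """Normal-approximation power of TOST for a two-arm parallel design."""
    std_error = 2.0 * sigma / math.sqrt(n_total)   # equal allocation assumed
    z = NormalDist().inv_cdf(1.0 - alpha)
    cdf = NormalDist().cdf
    # P(reject H0U) minus P(fail to reject H0L).
    power = cdf((upper - delta1) / std_error - z) - cdf(z - (delta1 - lower) / std_error)
    return max(power, 0.0)


# Design inputs from the text: sigma = 2.2, limits -1.3 and 1.3, delta1 = 0, alpha = 0.05.
for n in (126, 100):
    p = tost_power_normal(n, sigma=2.2, delta1=0.0, lower=-1.3, upper=1.3, alpha=0.05)
    print(f"N = {n}: approximate power = {p:.3f}")
```

Because the approximation ignores the estimation of σ, it tends to be slightly optimistic relative to the t-based values.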
Suppose we want to see how the design parameters such as power, sample size and treatment effect are interrelated. To visualize any particular relationship for Des 1, first select Des 1 from the Library and then click the corresponding icon in the toolbar. You will see a list of available options. To plot power against sample size, click Power vs Sample Size.

Feel free to explore other plots and the options available with them. Close the charts before continuing.

13.1.2 Simulation

We wish to make sure that Des 1 has the desired power of 90%, and maintains the type I error of 5%. This examination can be conducted using simulation. Select Des 1 in the Library, and click the corresponding icon in the toolbar. Alternatively, right-click Des 1 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab. We will first simulate under H1. Leave the default values as below, and click Simulate.

Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1. Select Sim 1 in the Output Preview and click the corresponding icon. Now double-click on Sim 1 in the Library. The simulation output details, including the table below, will be displayed.

Observe that out of the 10,000 simulated trials, the null hypothesis was rejected around 90% of the time. (Note: The numbers on your screen might differ slightly because you might be using a different starting seed for your simulations.)

Next we will simulate from a point that belongs to the null hypothesis. Consider µc = 6.5 and µt = 7.8. Select Sim 1 in the Library and click the corresponding icon. Go to the Response Generation Info tab in the upper pane and specify: Mean Control (µc) = 6.5 and Mean Treatment (µt) = 7.8.

Click Simulate to start the simulation.
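What such a simulation does can be sketched outside East with a small Monte Carlo routine in Python. This is an illustrative approximation only: the function name, the seed, the reduced number of replications and the hard-coded critical value 1.657 (an approximate upper 5% point of Student's t with 124 degrees of freedom) are all assumptions, and the routine makes no claim to reproduce East's simulation engine.

```python
import math
import random


def simulate_tost(mu_c, mu_t, sigma, n_per_arm, lower, upper, t_crit, n_sims, seed):
    """Fraction of simulated two-arm trials in which TOST declares equivalence."""
    rng = random.Random(seed)

    def mean_and_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    rejections = 0
    for _ in range(n_sims):
        control = [rng.gauss(mu_c, sigma) for _ in range(n_per_arm)]
        treatment = [rng.gauss(mu_t, sigma) for _ in range(n_per_arm)]
        m_c, v_c = mean_and_var(control)
        m_t, v_t = mean_and_var(treatment)
        pooled_var = (v_c + v_t) / 2.0                 # equal group sizes
        std_error = math.sqrt(2.0 * pooled_var / n_per_arm)
        t_lower = (m_t - m_c - lower) / std_error
        t_upper = (m_t - m_c - upper) / std_error
        if t_lower >= t_crit and t_upper <= -t_crit:
            rejections += 1
    return rejections / n_sims


# Design inputs from the text; t_crit, seed and n_sims are assumptions.
common = dict(sigma=2.2, n_per_arm=63, lower=-1.3, upper=1.3, t_crit=1.657,
              n_sims=2000, seed=12345)
print("under H1 (mu_t = 6.5):", simulate_tost(6.5, 6.5, **common))
print("on the H0 boundary (mu_t = 7.8):", simulate_tost(6.5, 7.8, **common))
```

With only 2,000 replications the estimates are noisier than East's 10,000-trial runs, so they should be read as rough checks rather than exact figures.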
Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 2. Select Sim 2 in the Output Preview and click the corresponding icon. Now double-click on Sim 2 in the Library. You can see that when H0 is true, the simulated power is close to the specified type I error of 5%.

13.2 Ratio of Means

For some pharmacokinetic parameters, the ratio of the means is a more appropriate measure of the distance between the treatments. Let µt and µc denote the means of the observations from the test formulation (T) and the reference (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the coefficient of variation CV = σ/µ is the same for T and C. Finally, let ρ = µt/µc.

The goal is to establish that ρL < ρ < ρU, where ρL and ρU are specified values used to define equivalence. In practice, ρL and ρU are often chosen such that ρL = 1/ρU.

The two one-sided tests procedure of Schuirmann (1987) is commonly used for this analysis, and is employed here for a parallel-group study. The null hypothesis H0 : ρ ≤ ρL or ρ ≥ ρU is tested against the two-sided alternative hypothesis H1 : ρL < ρ < ρU at level α, using two one-sided tests. Schuirmann (1987) proposed working this problem on the natural logarithm scale. Thus, we are interested in the parameter δ = ln(ρ) = ln(µt) − ln(µc), and the null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using two one-sided t-tests. Here δL = ln(ρL) and δU = ln(ρU).

Since we have translated the ratio hypothesis into a difference hypothesis, we can perform the test for difference as discussed in Section 13.1. Note that we need the standard deviation for log-transformed data.
However, if we are provided with information on CV instead, the standard deviation of log-transformed data can be obtained using the relation $\mathrm{sd} = \sqrt{\ln(1 + \mathrm{CV}^2)}$.

13.2.1 Trial design

Suppose that the logarithm of area under the curve (AUC), a pharmacokinetic parameter related to the efficacy of a drug, is to be analyzed to compare the two formulations of a drug. We want to show that the two formulations are bioequivalent by showing that the ratio of the means satisfies 0.8 < µt/µc < 1.25. Thus ρL = 0.8 and ρU = 1.25. Also, based on previous studies, it is assumed that the coefficient of variation is CV = 0.25.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Parallel Design: Ratio of Means. This will launch a new window. The upper pane of this window displays several fields with default values. Select Equivalence for Trial Type, and enter 0.05 for the Type I Error. For the Input Method, specify Ratio of Means. Enter 1 for Ratio of Means (ρ1), 0.8 for Lower Equivalence Limit (ρL) and 1.25 for Upper Equivalence Limit (ρU). Specify 0.25 for Coeff. Var. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed total sample size (55 subjects) is highlighted in yellow.

In the Output Preview toolbar, click the corresponding icon to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design. Close this output window before continuing.

Plotting

With Des 1 selected in the Library, click the corresponding icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design will appear.
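The log-scale quantities behind this ratio-of-means design (the equivalence limits δL = ln ρL and δU = ln ρU, and the standard deviation derived from the CV) can be reproduced with a short Python sketch. The function name is hypothetical and the code only evaluates the two relations stated in this section.

```python
import math


def log_scale_inputs(rho_lower, rho_upper, cv):
    """Translate ratio-scale equivalence limits and a CV to the log scale."""
    delta_lower = math.log(rho_lower)
    delta_upper = math.log(rho_upper)
    sd_log = math.sqrt(math.log(1.0 + cv ** 2))   # sd = sqrt(ln(1 + CV^2))
    return delta_lower, delta_upper, sd_log


# Values from the bioequivalence example: limits 0.8 and 1.25, CV = 0.25.
d_lower, d_upper, sd_log = log_scale_inputs(0.8, 1.25, 0.25)
print(f"delta_L = {d_lower:.4f}, delta_U = {d_upper:.4f}, sd(log) = {sd_log:.4f}")
```

Because 0.8 = 1/1.25, the two log-scale limits are symmetric about zero, which is the usual reason for choosing ρL = 1/ρU.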
You can export this chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... Feel free to explore other charts. Close all charts before continuing.

13.2.2 Simulation

Suppose you suspect that the CV will be smaller than 0.25, e.g., 0.2. Select Des 1 in the Library, and click the corresponding icon in the toolbar. Click on the Response Generation Info tab and change C.V. of Data Control to 0.20.

Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1. Select Sim 1 in the Output Preview and click the corresponding icon. Now double-click on Sim 1 in the Library. The simulation output details will be displayed in the upper pane. Observe that out of 10,000 simulated trials, the null hypothesis was rejected over 98% of the time. (Note: The numbers on your screen might differ slightly depending on the starting seed.)

13.3 Difference of Means in Crossover Designs

Crossover trials are widely used in clinical and medical research. The crossover design is often preferred over a parallel design, because in the former each subject receives all the treatments and thus acts as their own control. As a result, a crossover design requires fewer subjects. In this section, we show how East supports the design and simulation of such experiments with the difference of means as the endpoint.

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups (or treatment sequences).
Subjects in sequence group 1 receive the test drug (T) formulation in the first period, have their outcome variable X recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2 and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group    Period 1   Washout   Period 2
1 (TC)   Test       -         Control
2 (CT)   Control    -         Test

The resulting data are commonly analyzed using a linear model. The response yijk on the kth subject in period j of sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects γ1 and γ2. The fixed effects model can be displayed as:

Group    Period 1            Washout   Period 2
1 (TC)   µ + τt + π1 + γ1    -         µ + τc + π2 + γ1
2 (CT)   µ + τc + π1 + γ2    -         µ + τt + π2 + γ2

Let µt = µ + τt and µc = µ + τc denote the means of the observations from the test and control formulations, respectively, and let MSE denote the mean-squared error of the log data obtained from fitting the model. This is nothing other than the MSE from a crossover ANOVA model for the 2 × 2 design (2 periods and 2 sequences).

In an equivalence trial, the goal is to establish δL < µt − µc < δU, where δL and δU are specified values used to define equivalence. In practice, δL and δU are often chosen such that δL = −δU.

The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed here for a crossover study. Let δ = µt − µc denote the true difference in the means. The null hypothesis H0 : δ ≤ δL or δ ≥ δU is tested against the two-sided alternative hypothesis H1 : δL < δ < δU at level α, using TOST.
Here we perform the following two tests together:

Test 1: H_0L: δ ≤ δ_L against H_1L: δ > δ_L at level α
Test 2: H_0U: δ ≥ δ_U against H_1U: δ < δ_U at level α

H_0 is rejected in favor of H_1 at level α if and only if both H_0L and H_0U are rejected. Note that this is the same as rejecting H_0 in favor of H_1 at level α if the (1 − 2α)100% confidence interval for δ is completely contained within the interval (δ_L, δ_U).

East uses the following test statistics to test the above two null hypotheses:

\[ T_L = \frac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_L}{\sqrt{\frac{MSE}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]

and

\[ T_U = \frac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_U}{\sqrt{\frac{MSE}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]

where ȳ_ij is the mean of the observations from group i and period j. At δ = δ_L and δ = δ_U, respectively, T_L and T_U follow Student's t distribution with (n_1 + n_2 − 2) degrees of freedom.

The power of the test (i.e., the probability of declaring equivalence) depends on the true value of µ_t − µ_c. The sample size (or power) is determined at a specified value of this difference, denoted δ_1. The choice δ_1 = 0, i.e., µ_t = µ_c, is common. Note that the power and the sample size depend only on δ_L, δ_U, δ_1, and √MSE.

13.3.1 Trial design

Standard 2 × 2 crossover designs are often recommended in regulatory guidelines to establish bioequivalence of a generic drug with an off-patent brand-name drug. Consider a 2 × 2 bioequivalence trial between a Test drug (T) and a Reference drug (C) where equivalence needs to be established in terms of the pharmacokinetic parameter Area Under the Curve (AUC). Let µ_t and µ_c denote the average AUC for the Test and Reference drugs, respectively. Let δ = µ_t − µ_c be the difference. To establish average bioequivalence, the calculated 90% confidence interval of δ should fall within pre-specified bioequivalence limits. The bioequivalence limits are set at −3 and 3.
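The two TOST statistics and the rejection rule defined above can be sketched in a few lines of Python. This is only an illustration of the formulas, not East's implementation: the function name, the cell means, and the MSE value are hypothetical, and the critical value t_crit (the upper-α quantile of Student's t with n_1 + n_2 − 2 degrees of freedom) is assumed to be looked up by the caller, for example from a t table.

```python
import math

def tost_crossover(ybar, mse, n1, n2, delta_l, delta_u, t_crit):
    """Return (T_L, T_U, df, equivalent) for a 2x2 crossover TOST.

    ybar[i][j] is the mean for sequence group i+1 in period j+1.
    t_crit is the upper-alpha quantile of Student's t with n1 + n2 - 2
    degrees of freedom, supplied by the caller (assumption).
    """
    # Estimate of mu_t - mu_c from the four cell means
    estimate = (ybar[0][0] - ybar[0][1] - ybar[1][0] + ybar[1][1]) / 2.0
    # Standard error: sqrt(MSE/2 * (1/n1 + 1/n2))
    se = math.sqrt(mse / 2.0 * (1.0 / n1 + 1.0 / n2))
    t_l = (estimate - delta_l) / se
    t_u = (estimate - delta_u) / se
    df = n1 + n2 - 2
    # Equivalence is declared only if both one-sided nulls are rejected
    equivalent = (t_l > t_crit) and (t_u < -t_crit)
    return t_l, t_u, df, equivalent

# Hypothetical cell means and MSE, chosen only for illustration
ybar = [[23.0, 21.5], [21.7, 23.4]]
t_l, t_u, df, equivalent = tost_crossover(
    ybar, mse=6.25, n1=27, n2=27, delta_l=-3.0, delta_u=3.0, t_crit=1.675
)
print(round(t_l, 3), round(t_u, 3), df, equivalent)
```

The value 1.675 is an approximate table value for α = 0.05 and 52 degrees of freedom; with a different sample size the caller would need to supply the matching quantile.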
With these limits, we plan to design the study to test:

H_0: µ_t − µ_c ≤ −3 or µ_t − µ_c ≥ 3
against
H_1: −3 < µ_t − µ_c < 3

For this study, we consider µ_c = 21.62 ng.h/mL and µ_t = 23.19 ng.h/mL under H_1. Further, we assume that the square root of the mean squared error (MSE) from ANOVA would be 2.5. We wish to design a study that would have 90% power at δ_1 = 23.19 − 21.62 = 1.57 under H_1.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Crossover Design: Difference of Means. This will launch a new window. The upper pane displays several fields with default values. Select Equivalence for Design Type, and Individual Means for Input Method. Enter 0.05 for Type I Error. Specify the Mean Control (µ_c) as 21.62 and Mean Treatment (µ_t) as 23.19. Select Sqrt(MSE) from the drop-down list and specify it as 2.5. Also specify the Lower Equiv. Limit (δ_L) and Upper Equiv. Limit (δ_U) as −3 and 3, respectively. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (54 subjects) is highlighted in yellow. This design has the default name Des 1. Select this design and click the Output Summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary. In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design. Close the output window before continuing.

13.3.2 Simulation

Select Des 1 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click Des 1 and select Simulate. A new Simulation window will appear.
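Before running the simulation, the power of Des 1 can be cross-checked by hand. The sketch below uses a normal approximation to the TOST power, which is an assumption made here for simplicity; East's own computation is based on the t distribution, so the numbers need not agree exactly. The function name is hypothetical, and Sqrt(MSE) = 2.5 is taken as entered in the dialog box, with the 54 subjects split equally between the two sequence groups.

```python
import math
from statistics import NormalDist

def tost_power_normal(delta1, delta_l, delta_u, sqrt_mse, n1, n2, alpha=0.05):
    """Approximate TOST power for a 2x2 crossover (normal approximation)."""
    std_normal = NormalDist()
    # Standard error of the estimated difference: sqrt(MSE/2 * (1/n1 + 1/n2))
    se = sqrt_mse * math.sqrt(0.5 * (1.0 / n1 + 1.0 / n2))
    z = std_normal.inv_cdf(1.0 - alpha)
    power = (std_normal.cdf((delta_u - delta1) / se - z)
             + std_normal.cdf((delta1 - delta_l) / se - z) - 1.0)
    return max(power, 0.0)

# Design point under H_1: delta_1 = 23.19 - 21.62 = 1.57
power_h1 = tost_power_normal(1.57, -3.0, 3.0, 2.5, 27, 27)
# Boundary of H_0: delta = 3 (Mean Treatment = 24.62)
power_h0 = tost_power_normal(3.0, -3.0, 3.0, 2.5, 27, 27)
print(round(power_h1, 3), round(power_h0, 3))
```

Under this approximation the rejection probability is about 0.91 at the design point and 0.05 at the boundary δ = 3, which is in line with the 90% power requested for the design.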
Click on the Response Generation Info tab, and specify: Mean control = 21.62; Mean Treatment = 23.19; Sqrt(MSE) = 2.5. Leave the other default values and click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1. Select Sim 1 in the Output Preview and click the save icon. Now double-click on Sim 1 in the Library. The simulation output details will be displayed. Notice that the number of rejections was close to 90% of the 10,000 simulated trials. The exact result of the simulations may differ slightly, depending on the seed.

The simulation we have just done was under H_1. We now wish to simulate from a point that belongs to H_0. Right-click Sim 1 in the Library and select Edit Simulation. Go to the Response Generation Info tab in the upper pane and specify: Mean control = 21.62; Mean Treatment = 24.62; Sqrt(MSE) = 2.5. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 2. Select Sim 2 in the Output Preview and click the save icon. Now double-click on Sim 2 in the Library. The simulation output details will be displayed. Notice that the upper efficacy stopping boundary was crossed in very close to 5% of the 10,000 simulated trials, as expected at the boundary of H_0 (µ_t − µ_c = 3) with α = 0.05. The exact result of the simulations may differ slightly, depending on the seed.

13.4 Ratio of Means in Crossover Designs

Often in crossover designs, an equivalence hypothesis is tested in terms of the ratio of means. These types of trials are very popular for establishing bioavailability or bioequivalence between two formulations in terms of pharmacokinetic parameters (FDA guideline on BA/BE studies for orally administered drug products, 2003).
In particular, the FDA considers two products bioequivalent if the 90% confidence interval of the ratio of two means lies within (0.8, 1.25). In this section, we show how East supports the design and simulation of such experiments with the ratio of means as endpoint.

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. We have already discussed the 2 × 2 crossover design in section 13.3. However, unlike section 13.3, we are interested in the ratio of means.

Let µ_t and µ_c denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively. In an equivalence trial with the ratio of means as endpoint, the goal is to establish ρ_L < ρ < ρ_U, where ρ = µ_t/µ_c and ρ_L and ρ_U are specified values used to define equivalence. In practice, ρ_L and ρ_U are often chosen such that ρ_L = 1/ρ_U.

The null hypothesis H_0: ρ ≤ ρ_L or ρ ≥ ρ_U is tested against the two-sided alternative hypothesis H_1: ρ_L < ρ < ρ_U at level α, using two one-sided tests. Schuirmann (1987) proposed working this problem on the natural logarithm scale. Thus, we are interested in the parameter δ = ln(ρ) = ln(µ_t) − ln(µ_c), and the null hypothesis H_0: δ ≤ δ_L or δ ≥ δ_U is tested against the two-sided alternative hypothesis H_1: δ_L < δ < δ_U at level α, using two one-sided t-tests. Here δ_L = ln(ρ_L) and δ_U = ln(ρ_U).

Since we have translated the ratio hypothesis into a difference hypothesis, we can perform the test for difference as discussed in section 13.1. Note that we need the standard deviation of the log-transformed data. However, if we are provided with information on the CV instead, the standard deviation of the log-transformed data can be obtained using the relation

\[ sd = \sqrt{\ln(1 + CV^2)}. \]

13.4.1 Trial design

Standard 2 × 2 crossover designs are often recommended in regulatory guidelines to establish bioequivalence of a generic drug with an off-patent brand-name drug.
Consider a 2 × 2 bioequivalence trial between a Test drug (T) and a Reference drug (C) where equivalence needs to be established in terms of the pharmacokinetic parameter Area Under the Curve (AUC). Let µ_t and µ_c denote the average AUC for the Test and Reference drugs, respectively. Let ρ = µ_t/µ_c be the ratio of averages. To establish average bioequivalence, the calculated 90% confidence interval of ρ should fall within pre-specified bioequivalence limits. The bioequivalence limits are set at 0.8 and 1.25. Accordingly we plan to design the study to test:

H_0: µ_t/µ_c ≤ 0.8 or µ_t/µ_c ≥ 1.25
against
H_1: 0.8 < µ_t/µ_c < 1.25

For this study, we consider µ_c = 21.62 ng.h/mL and µ_t = 23.19 ng.h/mL under H_1. Further, we assume that the coefficient of variation (CV), or intrasubject variability, is 17%. For a lognormal population, the mean squared error (MSE) from ANOVA of log-transformed data and the CV are related by MSE = ln(1 + CV²). Thus in this case MSE is 0.0285 and its square root is 0.169. We wish to design a study that would have 90% power at ρ_1 = 23.19/21.62 = 1.073 under H_1.

Start East afresh. Click Continuous: Two Samples on the Design tab and then click Crossover Design: Ratio of Means. This will launch a new window. The upper pane displays several fields with default values. Select Equivalence for Design Type, and Individual Means for Input Method. Enter 0.05 for Type I Error. Then specify the Mean Control (µ_c) as 21.62 and Mean Treatment (µ_t) as 23.19. Specify 0.169 for Sqrt. of MSE Log. Also specify the Lower Equiv. Limit (ρ_L) and Upper Equiv. Limit (ρ_U) as 0.8 and 1.25, respectively. The upper pane should appear as below:

Click Compute. The output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (23 subjects) is highlighted in yellow.
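The conversion from CV to the log-scale MSE used for this design can be checked with a short calculation. The sketch below is only a numerical check of the relation MSE = ln(1 + CV²) and of the log-scale equivalence limits; the function name is hypothetical and nothing here reproduces East's sample size computation.

```python
import math

def log_scale_mse_from_cv(cv):
    """MSE of log-transformed data for a lognormal population with the given CV."""
    return math.log(1.0 + cv ** 2)

mse_log = log_scale_mse_from_cv(0.17)
sqrt_mse_log = math.sqrt(mse_log)

# Equivalence limits and design point on the natural-log scale
delta_l = math.log(0.8)
delta_u = math.log(1.25)
delta_1 = math.log(23.19 / 21.62)

print(round(mse_log, 4), round(sqrt_mse_log, 3))
print(round(delta_l, 4), round(delta_u, 4), round(delta_1, 4))
```

Because 0.8 = 1/1.25, the two limits are symmetric about zero on the log scale, which is why the problem reduces to the difference-of-means setting.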
This design has the default name Des 1. Select this design and click the Output Summary icon in the Output Preview toolbar. Some of the design details will be displayed in the upper pane, labeled as Output Summary. In the Output Preview toolbar, click the save icon to save this design to Wbk1 in the Library. Double-click Des 1 in the Library to see the details of the design.

13.4.2 Simulation

Select Des 1 in the Library, and click the simulation icon in the toolbar. Alternatively, right-click Des 1 and select Simulate. A new Simulation window will appear. Click on the Response Generation Info tab, and specify: Mean control = 21.62; Mean Treatment = 23.19; Sqrt. of MSE Log = 0.169. Click Simulate. Once the simulation run has completed, East will add an additional row to the Output Preview labeled as Sim 1. Select Sim 1 in the Output Preview and click the save icon. Now double-click on Sim 1 in the Library. The simulation output details will be displayed. Notice that the number of rejections was close to 90% of the 10,000 simulated trials. The exact result of the simulations may differ slightly, depending on the seed.

14 Normal: Many Means

In this chapter, we illustrate the tests available in East for comparing more than two continuous means.

14.1 One Way ANOVA

In a one-way ANOVA test, we wish to test the equality of means across R independent groups. The two-sample difference of means test for independent data is a one-way ANOVA test for 2 groups. The null hypothesis H_0: µ_1 = µ_2 = ... = µ_R is tested against the alternative hypothesis H_1: µ_i ≠ µ_j for at least one pair (i, j), where i, j = 1, 2, ..., R. Suppose n patients have been allocated randomly to R treatments.
We assume that the data of the R treatment groups come from R normally distributed populations with the same variance σ², and with population means µ_1, µ_2, ..., µ_R.

To design a one-way ANOVA study in East, first click Continuous: Many Samples on the Design tab, and then click Factorial Design: One Way ANOVA. In the upper pane of this window is the input dialog box. Consider a clinical trial with four groups. Enter 4 in Number of Groups(R). The trial is comparing three different doses of a drug against placebo in patients with Alzheimer's disease. The primary objective of the study is to evaluate the efficacy of these three doses, where efficacy is assessed by the difference from placebo in cognitive performance measured on a 13-item cognitive subscale. On the basis of pilot data, the expected mean responses are 0, 1.5, 2.5, and 2 for Groups 1 to 4, respectively. The common standard deviation within each group is σ = 3.5. We wish to compute the required sample size to achieve 90% power with a type-1 error of 0.05. Enter these values into the dialog box as shown below.

Then, click Compute. The design is shown as a row in the Output Preview, located in the lower pane of the window. The computed sample size (203) is highlighted in yellow. Select this row, then click the save icon in the Output Preview toolbar to save this design to Workbook1 in the Library. With Des1 selected in the Library, click the details icon to display the following output. The output indicates that 51 patients per group are necessary to achieve the desired power. Close this output window before continuing.

14.1.1 One Way Contrast

A contrast of the population means is a linear combination of the µ_i's. Let c_i denote the coefficient for population mean µ_i in the linear contrast.
For a single contrast test of many means in a one-way ANOVA, the null hypothesis is H_0: Σ c_i µ_i = 0 versus a two-sided alternative H_1: Σ c_i µ_i ≠ 0, or a one-sided alternative H_1: Σ c_i µ_i < 0 or H_1: Σ c_i µ_i > 0.

With Des1 selected in the Library, click the edit icon. In the input dialog box, click the checkbox titled Use Contrast, and select a two-sided test. Ensure that the means for each group are the same as those from Des1 (0, 1.5, 2.5, and 2). In addition, we wish to test the following contrast: −3, 1, 1, 1, which compares the placebo group with the average of the three treatment groups. Finally, we may enter unequal allocation ratios such as 1, 2, 2, 2, which implies that twice as many patients will be assigned to each treatment group as to the placebo group. Click Compute. The following row will be added to the Output Preview. Given the above contrast and allocation ratios, this study would require a total of 265 patients to achieve 90% power.

14.2 One Way Repeated Measures (Const. Correlation) ANOVA

As with the one-way ANOVA discussed in section 14.1, the repeated measures ANOVA also tests for equality of means. However, in a repeated measures setting, all patients are measured under all levels of the treatment. As the sample is exposed to each condition in turn, the measurement of the dependent variable is repeated. Thus, there is some correlation between observations from the same patient, which needs to be accounted for. The constant correlation assumption means we assume that the correlation between observations from the same patient is constant for all patients. The correlation parameter (ρ) is an additional parameter that needs to be specified in the one way repeated measures study design. Start East afresh.
To design a repeated measures ANOVA study, click Continuous: Many Samples, and click Factorial Design: One Way Repeated Measures (Constant Correlation) ANOVA. A specific type of repeated measures design is a longitudinal study in which patients are followed over a series of time points. As an illustration, we will consider a hypothetical study that investigated the effect of a dietary intervention on weight loss. The endpoint is the decrease in weight (in kilograms) from baseline, measured at four time points: baseline, 4 weeks, 8 weeks, and 12 weeks. For Number of Levels, enter 4. We wish to compute the required sample size to achieve 90% power with a type-1 error of 0.05. The means at each of the four levels are 0, 1.5, 2.5, and 2 for Levels 1, 2, 3, and 4, respectively. Finally, enter σ = 5 and ρ = 0.2, and click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of the window. The computed sample size (330) is highlighted in yellow. Select this row, then click the save icon in the Output Preview toolbar to save this design to Workbook1 in the Library. With Des1 selected in the Library, click the details icon to display the following output. The output indicates that 83 patients per group are necessary to achieve the desired power.

14.3 Two Way ANOVA

In a two-way ANOVA, there are two factors to consider, say A and B. We can design a study to test equality of means across factor A, across factor B, or the interaction between A and B. In addition to the common standard deviation σ, you also need to specify the cell means. For example, consider a study to determine the combined effects of sodium restriction and alcohol restriction on lowering of systolic blood pressure in hypertensive men (Parker et al., 1999). Let Factor A be sodium restriction and Factor B be alcohol restriction.
There are two levels of each factor (restricted vs usual sodium intake, and restricted vs usual alcohol intake), producing four groups. Each patient is randomly assigned to one of these four groups. Start East afresh. Click Continuous: Many Samples, and click Factorial Design: Two-Way ANOVA.

Enter a type-1 error of 0.05. Then enter the following values in the input dialog box as shown below: Number of Factor A Levels as 2, Number of Factor B Levels as 2, Common Std. Dev. as 2, A1/B1 as 0.5, A1/B2 as 4.7, A2/B1 as 0.4, and A2/B2 as 6.9. We will first select Power for A, then click Compute. Leaving the same input values, click Compute after selecting Power for B in the input window. Similarly, click Compute after selecting Power for AB. The Output Preview should now have three rows, as shown below.

In order to achieve at least 90% power to detect a difference across means in factor A, factor B, as well as the interaction, a sample size of 156 patients is necessary (i.e., Des1). Select Des1 in the Output Preview, then click the save icon in the toolbar to save it to Workbook1 in the Library. With Des1 selected in the Library, click the details icon to display the following output. The output indicates that 39 patients per group are necessary to achieve 90% power to test the main effect of A.

15 Multiple Comparison Procedures for Continuous Data

It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. Formal statistical hypothesis tests can be performed to see if there is strong evidence to support clinical claims. Type I error is inflated when one considers the inferences together as a family. Failure to compensate for multiplicities can have adverse consequences.
For example, a drug could be approved when actually it is not better than placebo. Multiple comparison (MC) procedures provide a guard against inflation of type I error due to multiple testing. The probability of making at least one type I error is known as the family wise error rate (FWER). East supports several parametric and p-value based MC procedures. In this chapter we explain how to design a study using a chosen MC procedure that strongly maintains FWER.

In East, one can calculate the power from the simulated data under different MC procedures. With the information on power, one can choose the MC procedure that provides maximum power yet strongly maintains the FWER.

MC procedures included in East strongly control FWER. Strong control of FWER means that the probability of incorrectly rejecting at least one true null hypothesis is at most α, regardless of which and how many of the null hypotheses are true. By contrast, weak control guarantees this only under the assumption that all null hypotheses are true. East supports the following MC procedures for continuous endpoints.

Category        Procedure               Reference
Parametric      Dunnett's Single Step   Dunnett CW (1955)
                Dunnett's Step Down     Dunnett CW and Tamhane AC (1991)
                Dunnett's Step Up       Dunnett CW and Tamhane AC (1992)
P-value Based   Bonferroni              Bonferroni CE (1935, 1936)
                Sidak                   Sidak Z (1967)
                Weighted Bonferroni     Benjamini Y and Hochberg Y (1997)
                Holm's Step Down        Holm S (1979)
                Hochberg's Step Up      Hochberg Y (1988)
                Hommel's Step Up        Hommel G (1988)
                Fixed Sequence          Westfall PH, Krishen A (2001)
                Fallback                Wiens B, Dmitrienko A (2005)

15.1 Parametric Procedures

Assume that there are k arms including the placebo arm. Let n_i be the number of subjects for the i-th treatment arm (i = 0, 1, ..., k − 1). Let N = Σ_{i=0}^{k−1} n_i be the total sample size, where arm 0 refers to placebo.
Let Y_ij be the response from subject j in treatment arm i and y_ij be the observed value of Y_ij (i = 0, 1, ..., k − 1, j = 1, 2, ..., n_i). Suppose that

\[ Y_{ij} = \mu_i + e_{ij} \qquad (15.1) \]

where e_ij ∼ N(0, σ²). We are interested in the following hypotheses:

For the right tailed test: H_i: µ_i − µ_0 ≤ 0 vs K_i: µ_i − µ_0 > 0
For the left tailed test: H_i: µ_i − µ_0 ≥ 0 vs K_i: µ_i − µ_0 < 0

The global null hypothesis is rejected if at least one of the H_i is rejected in favor of K_i after controlling the FWER. Here H_i and K_i refer to the null and alternative hypotheses, respectively, for the comparison of the i-th arm with the placebo arm.

East supports three parametric MC procedures: the single step Dunnett test (Dunnett, 1955), the step-down Dunnett test and the step-up Dunnett test. These procedures make two parametric assumptions: normality and homoscedasticity. Let ȳ_i be the sample mean for treatment arm i and s² be the pooled sample variance for all arms. The test statistic for comparing the treatment effect of arm i with placebo can be defined as

\[ T_i = \frac{\bar{y}_i - \bar{y}_0}{s\sqrt{\frac{1}{n_i} + \frac{1}{n_0}}} \qquad (15.2) \]

Let t_i be the observed value of T_i. These observed values for the k − 1 treatment arms can be ordered as t_(1) ≥ t_(2) ≥ ... ≥ t_(k−1). Detailed formulas to obtain critical boundaries for the single step Dunnett and step-down Dunnett tests are given in Appendix H.

In the single step Dunnett test, the critical boundary remains the same for all the k − 1 individual tests. Let c_α be the critical boundary that maintains an FWER of α and p̃_i be the adjusted p-value associated with the comparison of the i-th arm and the placebo arm. Then for a right tailed test, H_i is rejected if t_i > c_α and for a left tailed test H_i is rejected if t_i < c_α.

Unlike in the single step Dunnett test, the critical boundary does not remain the same for all the k − 1 individual tests in the step-down Dunnett test. Let c_i be the critical boundary and p̃_i be the adjusted p-value associated with the comparison of the i-th arm and the placebo arm.
For a right tailed test H_(i) is rejected if t_(i) > c_i and H_(1), ..., H_(i−1) have already been rejected. For a left tailed test H_(i) is rejected if t_(i) < c_{k−i} and H_(i+1), ..., H_(k−1) have already been rejected.

Unlike the step-down test, the step-up Dunnett procedure starts with the least significant test statistic, i.e., t_(k−1). Let c_i be the critical boundary and p̃_i be the adjusted p-value associated with the comparison of the i-th arm and the placebo arm. The i-th test statistic in order, i.e., t_(i), will be tested if and only if none of H_(i+1), ..., H_(k−1) are rejected. If H_(i) is rejected then stop and reject all of H_(i), ..., H_(1). For a right tailed test, H_(i) is rejected if t_(i) > c_(i) and for a left tailed test H_(i) is rejected if t_(i) < c_(i).

For both the single step Dunnett and step-down Dunnett tests, the global null hypothesis is rejected in favor of at least one right tailed alternative if H_(1) is rejected and in favor of at least one left tailed alternative if H_(k−1) is rejected.

The single step Dunnett test and the step-down Dunnett test can be seen as the parametric versions of the Bonferroni procedure and the Holm procedure, respectively. Parametric tests are uniformly more powerful than the corresponding p-value based tests when the parametric assumption holds or at least approximately holds, especially when there are a large number of hypotheses. Parametric procedures may not control FWER if the standard deviations are different.

15.1.1 Dunnett's single step

Dunnett's Single Step procedure is described below with an example.

Example: Alzheimer's Disease Clinical Trial

In this section, we will use an example to illustrate how to design a study using the MCP module in East.
This is a randomized, double-blind, placebo-controlled, parallel study to assess three different doses (0.3 mg, 1 mg and 2 mg) of a drug against placebo in patients with mild to moderate probable Alzheimer's disease. The primary objective of this study is to evaluate the safety and efficacy of the three doses. The drugs are administered daily for 24 weeks to subjects with Alzheimer's disease who are either receiving concomitant treatment or not receiving any co-medication. The efficacy is assessed by cognitive performance based on the Alzheimer's disease assessment scale (13-item cognitive sub-scale). From previous studies, it is estimated that the common standard deviation of the efficacy measure is 5. It is expected that the dose-response relationship follows a straight line within the dose range of interest. We would like to calculate the power for a total sample size of 200. This will be a balanced study with a one-sided 0.025 significance level to detect at least one dose with a significant difference from placebo. We will show how to simulate the power of such a study using the multiple comparison procedures listed above.

Designing the study

First, click Continuous: Many Samples on the Design tab and then click Multi-Arm Design: Pairwise Comparisons to Control - Difference of Means. This will launch a new window. There is a box at the top with the label Number of Arms. For our example, we have 3 treatment groups plus a placebo. So enter 4 for Number of Arms. Under the Design Parameters tab, there are several fields which we will fill in. First, there is a box with the label Side. Here you need to specify whether you want a one-sided or two-sided test. Currently, only one-sided tests are available. Under it you will see the box with the label Sample Size (n).
For now skip this box and move to the next dropdown box with the label Rejection Region. If left tail is selected, the critical value for the test is located in the left tail of the distribution of the test statistic. Likewise, if right tail is selected, the critical value for the test is located in the right tail of the distribution of the test statistic. For our example, we will select Right Tail. Under that, there is a box with the label Type - 1 Error (α). This is where you need to specify the FWER. For our example, enter 0.025. Now go to the box with the label Total Sample Size. Here we input the total number of subjects, including those in the placebo arm. For this example, enter 200. To the right, there will be a heading with the title Multiple Comparison Procedures. In the parametric grouping, check the box next to Dunnett's single step, as this is the multiple comparison procedure we are illustrating in this subsection. After entering these parameters your screen should now look like this:

Now click on the Response Generation Info tab. You will see a table titled Table of Proportions. In this table we can specify the labels for the treatment arms. You also have to specify the dose level if you want to generate means through a dose-response curve. Since we are comparing placebo and 3 dose groups, enter Placebo, Dose1, Dose2 and Dose3 in the 4 cells in the first column, labeled Arm. The table contains the default mean and standard deviation for each arm, which we will change later. There are two check boxes in this tab above the table. The first is labeled Generate Means through DR Curve. There are two ways to specify the mean response for each arm: 1) generate means for each arm through a dose-response curve or 2) specify the mean directly in the Table of Proportions.
To specify the mean directly, just enter the mean value for each arm in the Mean column of the table. However, in this example, we will generate means through a dose-response curve. In order to do this, check the Generate Means through DR Curve box. Once you check this box you will notice two things. First, an additional column with the label Dose will appear in the table. Here you need to enter the dose levels for each arm. For this example, enter 0, 0.3, 1 and 2 for the Placebo, Dose1, Dose2 and Dose3 arms, respectively. Secondly, you will notice that an additional section appears to the right, which provides the option to generate the mean response from four families of parametric curves: Four Parameter Logistic, Emax, Linear and Quadratic. The technical details about each curve can be found in Appendix H. Here you need to choose the appropriate parametric curve from the drop-down list under Dose Response Curve and then specify the parameters associated with that curve.

For the Alzheimer's disease example, suppose the dose response follows a linear curve with intercept 0 and slope 1.5. To do this, select "Linear" from the dropdown list. To the right of this dropdown box, specify the parameter values of the selected curve family by inputting 0 for Intercept(E0) and 1.5 for Slope(δ). After specifying this, the mean values in the table will be changed accordingly. Here we are generating the means using the following linear dose-response curve:

\[ E(Y \mid Dose) = E_0 + \delta \times Dose \qquad (15.3) \]

For placebo, the mean can be obtained by specifying Dose as 0 in the above equation. This gives the mean for the placebo arm as 0. For arm Dose1, the mean would be 0 + 1.5 × 0.3 = 0.45. Similarly, the means for arms Dose2 and Dose3 will be obtained as 1.5 and 3. You can verify that the values in the Mean column have changed to 0, 0.45, 1.5 and 3 for the four arms, respectively.
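The arm means generated by the linear dose-response curve (15.3) can be reproduced with a few lines of Python. This is a plain evaluation of the equation with the intercept and slope from the example; the function and variable names are hypothetical and are not part of East.

```python
def linear_dose_response(dose, e0, slope):
    """Mean response under the linear curve E(Y | Dose) = E0 + slope * Dose."""
    return e0 + slope * dose

# Dose levels entered in the Dose column of the table
doses = {"Placebo": 0.0, "Dose1": 0.3, "Dose2": 1.0, "Dose3": 2.0}

# Intercept E0 = 0 and slope = 1.5, as specified in the example
means = {arm: linear_dose_response(dose, e0=0.0, slope=1.5)
         for arm, dose in doses.items()}

for arm, mean in means.items():
    print(arm, round(mean, 2))
```

The rounded values match the Mean column described above: 0, 0.45, 1.5 and 3.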
Now click Plot DR Curve to see the plot of means against the dose levels. You will see the linear dose response curve that intersects the Y-axis at 0. Now close this window. The dose response curve generates means, but we still have to specify the standard deviation. The standard deviation for each arm could be either equal or different. To specify a common standard deviation, check the box with the label Common Standard Deviation and specify the common standard deviation in the field next to it. When the standard deviations for different arms are not all equal, they need to be specified directly in the table, in the column labeled Std. Dev. In this example, we are considering a common standard deviation of 5. So check the box for Common Standard Deviation and specify 5 in the field next to it. Now the Std. Dev. column will be updated with 5 for all four arms. As we have finished specifying all the fields in the Response Generation Info tab, it should appear as below.

Click on the Include Options button located in the upper-right corner of the Simulation window and check Randomized Info. This will add an additional tab, Randomization Info. Now click on the Randomization Info tab. The second column of the Table of Allocation displays the allocation ratio of each treatment arm to that of the control arm. The cell for the control arm is always one and is not editable. Only the cells for treatment arms other than control need to be filled in. The default value for each treatment arm is one, which represents a balanced design. For the Alzheimer's disease example, we consider a balanced design and leave the default values for the allocation ratios unchanged.
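To see what the allocation ratios imply for the per-arm sample sizes, the total can be split in proportion to the ratios. The sketch below is a simple proportional split under assumed names; it leaves the results unrounded, since how East handles fractional arm sizes is not described here. The unbalanced case uses a hypothetical total of 210 purely for illustration.

```python
def per_arm_sample_sizes(total_n, allocation_ratios):
    """Split total_n across arms in proportion to the allocation ratios.

    The first ratio belongs to the control arm (always 1 in the table).
    Results are left unrounded (rounding rules are not assumed here).
    """
    total_ratio = sum(allocation_ratios)
    return [total_n * ratio / total_ratio for ratio in allocation_ratios]

# Balanced design from the Alzheimer's disease example: 200 subjects, 4 arms
balanced = per_arm_sample_sizes(200, [1, 1, 1, 1])

# Hypothetical unbalanced design: twice as many subjects per dose arm
unbalanced = per_arm_sample_sizes(210, [1, 2, 2, 2])

print(balanced)
print(unbalanced)
```

With the default ratios, each of the four arms receives 50 of the 200 subjects.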
Your screen should now look like this:

The last tab is Simulation Control Info. Specify 10000 as Number of Simulations and 1000 as Refresh Frequency in this tab. The box labeled Random Number Generator is where you can set the seed for the random number generator. You can either use the clock as the seed or choose a fixed seed (in order to replicate past simulations). The default is the clock and we will use that. The box on the right-hand side is labeled Output Options. This is where you can choose to save summary statistics for each simulation run and/or to save subject-level data for a specific number of simulation runs. To save the output for each simulation, check the box labeled Save summary statistics for every simulation run.

Now click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 1. Note that a simulation node Sim 1 is created in the library. Also note that another node, labeled SummaryStat, is appended to the simulation node; it contains detailed summary statistics for each simulation run. Select Sim 1 in the Output Preview and click the [icon] icon to save the simulation in the library. Now double-click on Sim 1 in the Library. The simulation output details will be displayed in the right pane.

The first section in the output is the Hypothesis section. In our situation, we are testing 3 hypotheses. We are comparing the mean score on the Alzheimer’s disease assessment scale (13-item cognitive sub-scale) for each dose with that of placebo. That is, we are testing the 3 hypotheses:

$$H_1: \mu_1 = \mu_0 \quad \text{vs} \quad K_1: \mu_1 > \mu_0$$
$$H_2: \mu_2 = \mu_0 \quad \text{vs} \quad K_2: \mu_2 > \mu_0$$
$$H_3: \mu_3 = \mu_0 \quad \text{vs} \quad K_3: \mu_3 > \mu_0$$

Here, $\mu_0$, $\mu_1$, $\mu_2$ and $\mu_3$ represent the population mean score on the Alzheimer’s disease assessment scale for the placebo, 0.3 mg, 1 mg and 2 mg dose groups, respectively.
Also, $H_i$ and $K_i$ are the null and alternative hypotheses, respectively, for the $i$-th test.

The Input Parameters section provides the design parameters that we specified earlier. The next section, Overall Power, gives the estimated power based on the simulation. The second line gives the global power, which is about 75%. Global power is the power to reject the global null $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_0$. Thus, a global power of 75% indicates that the global null is rejected in about 75% of the simulated trials. In other words, at least one of $H_1$, $H_2$ and $H_3$ is rejected in about 75% of the simulations. Global power is useful for showing the existence of a dose-response relationship, since a dose-response may be claimed if any of the doses in the study is significantly different from placebo.

The next line displays the conjunctive power. Conjunctive power is the proportion of simulations in which all the $H_i$ that are truly false were rejected. In this example, all the $H_i$ are false. Therefore, for this example, conjunctive power is the proportion of simulations in which all of $H_1$, $H_2$ and $H_3$ were rejected. For this simulation the conjunctive power is only about 2.0%, which means that all of $H_1$, $H_2$ and $H_3$ were rejected in only about 2.0% of the simulations.

Disjunctive power is the proportion of simulations in which at least one of the truly false $H_i$ is rejected. The main distinction between global and disjunctive power is that the former counts any rejection, whereas the latter counts rejections only among those $H_i$ that are false. Since all of $H_1$, $H_2$ and $H_3$ are false here, global and disjunctive power ought to be the same.

The next section gives the marginal power for each hypothesis. Marginal power is the proportion of simulations in which a particular hypothesis is rejected after applying the multiplicity adjustment.
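The four power summaries defined above can be written down compactly in terms of per-run rejection indicators. The sketch below is a minimal Python illustration, assuming one row per simulation run with a 0/1 rejection flag per hypothesis; the function power_summary and the toy data are hypothetical and are not East output. In the toy data, $H_1$ is taken to be true so that global and disjunctive power can differ.

```python
def power_summary(reject_flags, null_is_false):
    """Compute global, disjunctive, conjunctive and marginal power.

    reject_flags  : list of rows, one per simulation run; each row holds a
                    0/1 rejection indicator for every hypothesis.
    null_is_false : list of booleans, True where the null is truly false.
    """
    n_sims = len(reject_flags)
    n_hyp = len(null_is_false)
    false_idx = [i for i, is_false in enumerate(null_is_false) if is_false]

    # Global power: any hypothesis rejected, whether truly false or not
    global_hits = sum(1 for row in reject_flags if any(row))
    # Disjunctive power: at least one truly false hypothesis rejected
    disj_hits = sum(1 for row in reject_flags
                    if any(row[i] for i in false_idx))
    # Conjunctive power: every truly false hypothesis rejected
    conj_hits = sum(1 for row in reject_flags
                    if false_idx and all(row[i] for i in false_idx))
    # Marginal power: rejection rate of each hypothesis separately
    marginal = [sum(row[i] for row in reject_flags) / n_sims
                for i in range(n_hyp)]

    return {
        "global": global_hits / n_sims,
        "disjunctive": disj_hits / n_sims,
        "conjunctive": conj_hits / n_sims,
        "marginal": marginal,
    }


# Toy data: five simulated trials, three hypotheses (H1, H2, H3)
toy_flags = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
toy_null_is_false = [False, True, True]  # H1 true; H2 and H3 false

summary = power_summary(toy_flags, toy_null_is_false)
print(summary)
```

In the third toy run only the true null $H_1$ is rejected, so that run counts toward global power but not toward disjunctive power.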
Based on the simulation results, $H_1$ is rejected in about 3% of the simulations, $H_2$ in about 20% and $H_3$ in a little more than 70%.

Recall that we asked East to save the results of each simulation run. Open this file by clicking on SummaryStat in the library and you will see that it contains 10,000 rows, where each row represents the results of a single simulation. Find the 3 columns labeled Rej Flag 1, Rej Flag 2 and Rej Flag 3. These columns represent the rejection status for $H_1$, $H_2$ and $H_3$, respectively. A value of 1 indicates rejection in that particular simulation; otherwise the null is not rejected. The proportion of 1’s in Rej Flag 1 is the marginal power to reject $H_1$. Similarly, we can find the marginal power for $H_2$ and $H_3$ from Rej Flag 2 and Rej Flag 3, respectively. To obtain the global and disjunctive power, count the number of simulations in which at least one of $H_1$, $H_2$ and $H_3$ has been rejected and divide by the total number of simulations, 10,000. Similarly, to obtain the conjunctive power, count the number of simulations in which all of $H_1$, $H_2$ and $H_3$ have been rejected and divide by 10,000.

Next we will consider an example to show how global and disjunctive power differ from each other. Select Sim 1 in Library and click [icon]. Now go to the Response Generation Info tab and uncheck the Generate Means Through DR Curve box. The table will now have only three columns. Specify Placebo, Dose1, Dose2 and Dose3 in the 4 cells of the first column, labeled Arm, and enter 0, 0, 1 and 1.2 in the 4 cells of the second column, labeled Mean. Here we are generating the response for placebo from the distribution $N(0, 5^2)$, for Dose1 from $N(0, 5^2)$, for Dose2 from $N(1, 5^2)$ and for Dose3 from $N(1.2, 5^2)$.
Now click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 2.

For Sim 2, the global power and disjunctive power are 17.9% and 17.6%, respectively. To understand why, we need to open the saved simulation data for Sim 2. The number of simulations in which at least one of $H_1$, $H_2$ and $H_3$ is rejected is 1790; dividing this by the total number of simulations, 10,000, gives the global power of 17.9%. The number of simulations in which at least one of $H_2$ and $H_3$ is rejected is 1760; dividing this by 10,000 gives the disjunctive power of 17.6%. The exact result of the simulations may differ slightly, depending on the seed.

15.1.2 Dunnett’s step-down and step-up procedures

Dunnett’s step-down procedure is described below using the same Alzheimer’s disease example from section 15.1.1 on Dunnett’s single step. Since the design specification remains the same except that we are using Dunnett’s step-down in place of the single step Dunnett’s test, we can design the simulation in this section with little effort. Select Sim 1 in Library and click [icon]. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Dunnett’s step-down and Dunnett’s step-up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview labeled Sim 3 and Sim 4.

Dunnett’s step-down and step-up procedures have global and disjunctive power of close to 75% and conjunctive power of close to 4%. To see the marginal power for each test, select Sim 3 and Sim 4 in the Output Preview and click the [icon] icon. Now double-click on Sim 3 in the Library.
The simulation output details for Dunnett’s step-down procedure will be displayed in the right pane. The marginal powers for the comparison of Dose1, Dose2 and Dose3 using Dunnett’s step-down procedure are close to 5%, 23% and 74%, respectively. Similarly, one can find the marginal power for the individual tests in Dunnett’s step-up procedure.

15.2 p-value based Procedures

p-value based procedures strongly control the FWER regardless of the joint distribution of the raw p-values, as long as the individual raw p-values are legitimate p-values. Assume that there are $k$ arms including the placebo arm. Let $n_i$ be the number of subjects in the $i$-th treatment arm ($i = 0, 1, \cdots, k-1$), where arm 0 refers to placebo, and let $N = \sum_{i=0}^{k-1} n_i$ be the total sample size. Let $Y_{ij}$ be the response from subject $j$ in treatment arm $i$ and $y_{ij}$ be the observed value of $Y_{ij}$ ($i = 0, 1, \cdots, k-1$; $j = 1, 2, \cdots, n_i$). Suppose that

$$Y_{ij} = \mu_i + e_{ij} \qquad (15.4)$$

where $e_{ij} \sim N(0, \sigma_i^2)$. We are interested in the following hypotheses:

For the right-tailed test: $H_i: \mu_i - \mu_0 \le 0$ vs $K_i: \mu_i - \mu_0 > 0$
For the left-tailed test: $H_i: \mu_i - \mu_0 \ge 0$ vs $K_i: \mu_i - \mu_0 < 0$

The global null hypothesis is rejected if at least one of the $H_i$ is rejected in favor of $K_i$ after controlling the FWER. Here $H_i$ and $K_i$ refer to the null and alternative hypotheses, respectively, for the comparison of the $i$-th arm with the placebo arm.

Let $\bar{y}_i$ be the sample mean for treatment arm $i$, $s_i^2$ be the sample variance for the $i$-th arm and $s^2$ be the pooled sample variance for all arms.
For the unequal variance case, the test statistic for comparing the treatment effect of arm $i$ with placebo is defined as

$$T_i = \frac{\bar{y}_i - \bar{y}_0}{\sqrt{\frac{1}{n_i} s_i^2 + \frac{1}{n_0} s_0^2}} \qquad (15.5)$$

For the equal variance case, one needs to replace $s_i^2$ and $s_0^2$ by the pooled sample variance $s^2$. In both cases, $T_i$ follows a Student’s t distribution; however, the degrees of freedom differ. For the equal variance case the degrees of freedom are $N - k$. For the unequal variance case, the degrees of freedom are subject to the Satterthwaite correction.

Let $t_i$ be the observed value of $T_i$; these observed values for the $k-1$ treatment arms can be ordered as $t_{(1)} \ge t_{(2)} \ge \cdots \ge t_{(k-1)}$. For the right-tailed test the marginal p-value for comparing the $i$-th arm with placebo is calculated as $p_i = P(T > t_i)$ and for the left-tailed test as $p_i = P(T < t_i)$, where $T$ follows a Student’s t distribution. Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(k-1)}$ be the ordered p-values.

15.2.1 Single step MC procedures

East supports three p-value based single step MC procedures: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure.

For the Bonferroni procedure, $H_i$ is rejected if $p_i < \frac{\alpha}{k-1}$ and the adjusted p-value is given as $\min(1, (k-1)p_i)$. For the Sidak procedure, $H_i$ is rejected if $p_i < 1 - (1-\alpha)^{\frac{1}{k-1}}$ and the adjusted p-value is given as $1 - (1-p_i)^{k-1}$. For the weighted Bonferroni procedure, $H_i$ is rejected if $p_i < w_i \alpha$ and the adjusted p-value is given as $\min\left(1, \frac{p_i}{w_i}\right)$. Here $w_i$ denotes the proportion of $\alpha$ allocated to $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. Note that if $w_i = \frac{1}{k-1}$, the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Bonferroni and Sidak procedures

The Bonferroni and Sidak procedures are described below using the same Alzheimer’s disease example from section 15.1.1 on Dunnett’s single step.
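The adjusted p-value formulas for the three single step procedures can be sketched in a few lines of Python. This is a minimal illustration of the formulas above, not East’s implementation; the function names and the raw p-values are hypothetical, and the weights are assumed to sum to 1.

```python
def bonferroni_adjust(pvalues):
    """Adjusted p-value min(1, (k-1) * p_i) for k-1 comparisons."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def sidak_adjust(pvalues):
    """Adjusted p-value 1 - (1 - p_i)^(k-1) for k-1 comparisons."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]


def weighted_bonferroni_adjust(pvalues, weights):
    """Adjusted p-value min(1, p_i / w_i); weights are assumed to sum to 1."""
    # A hypothesis with zero weight can never be rejected, so report 1.0
    return [min(1.0, p / w) if w > 0 else 1.0
            for p, w in zip(pvalues, weights)]


alpha = 0.025
raw_p = [0.30, 0.04, 0.005]  # hypothetical raw p-values for H1, H2, H3

bonf = bonferroni_adjust(raw_p)
sidak = sidak_adjust(raw_p)
wbonf = weighted_bonferroni_adjust(raw_p, [1 / 3, 1 / 3, 1 / 3])

# A hypothesis is rejected when its adjusted p-value falls below alpha
bonf_reject = [p < alpha for p in bonf]
sidak_reject = [p < alpha for p in sidak]
print(bonf, sidak, wbonf)
print(bonf_reject, sidak_reject)
```

With equal weights the weighted Bonferroni values coincide with the Bonferroni values up to floating-point rounding, which mirrors the remark that the two procedures are then equivalent.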
Since the design specification remains the same except that we are using Bonferroni and Sidak in place of the single step Dunnett’s test, we can design the simulation in this section with little effort. Select Sim 1 in Library and click [icon]. Now go to the Design Parameters tab. In the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Bonferroni and Sidak boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview.

The Bonferroni and Sidak procedures have disjunctive and global powers of close to 73% and conjunctive power of about 1.8%. Now select Sim 5 and Sim 6 in the Output Preview using the Ctrl key and click the [icon] icon. This will save Sim 5 and Sim 6 in Wbk1 in Library.

Weighted Bonferroni procedure

As before, we will use the same Alzheimer’s disease example to illustrate the weighted Bonferroni procedure. Select Sim 1 in Library and click [icon]. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Weighted Bonferroni box. Next click on the Response Generation Info tab and look at the Table of Proportions. You will see that an additional column labeled Proportion of Alpha has been added. Here you have to specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default, East distributes the total alpha equally among all tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3 or 0.333. You can specify other proportions as well.
For this example, keep the equal proportion of alpha for each test.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 7.

The weighted Bonferroni MC procedure has global and disjunctive power of 73.7% and conjunctive power of 1.6%. Note that the powers of the weighted Bonferroni procedure are quite close to those of the Bonferroni procedure. This is because the weighted Bonferroni procedure with equal proportions is equivalent to the simple Bonferroni procedure. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim 7 in the Output Preview and click the [icon] icon. This will save Sim 7 in Wbk1 in Library.

15.2.2 Data-driven step-down MC procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests. There are two types of stepwise procedures. One type proceeds in a data-driven order. The other type proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as the tests are significant. The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. In the $i$-th step, $H_{(k-i)}$ is rejected if $p_{(k-i)} \le \frac{\alpha}{i}$ and the procedure goes to the next step. Equivalently, the $j$-th smallest p-value $p_{(j)}$ is compared with $\frac{\alpha}{k-j}$, beginning with $j = 1$.

Holm’s step-down

As before, we will use the same Alzheimer’s disease example to illustrate Holm’s step-down procedure. Select Sim 1 in Library and click [icon]. Now go to the Design Parameters tab.
In the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Holm’s Step-down box.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 8.

Holm’s step-down procedure has global and disjunctive power of 74% and conjunctive power of 4.5%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim 8 in the Output Preview and click the [icon] icon. This will save Sim 8 in Wbk1 in Library.

15.2.3 Data-driven step-up MC procedures

Step-up tests start with the least significant comparison and continue as long as the tests are not significant. The first time a significant comparison occurs, testing stops and all remaining hypotheses are rejected. East supports two such MC procedures: the Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up procedure, in the $i$-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \frac{\alpha}{i}$. In the Hommel step-up procedure, in the $i$-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i}\alpha$ for $j = 1, \cdots, i$. Fixed sequence test and fallback test are the types of tests which proceed in a prespecified order.

Hochberg’s and Hommel’s step-up procedures

Hochberg’s and Hommel’s step-up procedures are described below using the same Alzheimer’s disease example from section 15.1.1 on Dunnett’s single step. Since the design specification remains the same except that we are using the Hochberg and Hommel step-up procedures in place of the single step Dunnett’s test, we can design the simulation in this section with little effort. Select Sim 1 in Library and click [icon]. Now go to the Design Parameters tab.
There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Hochberg’s step-up and Hommel’s step-up boxes.

Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview labeled Sim 9 and Sim 10.

The Hochberg and Hommel procedures have disjunctive and global powers of close to 74%.

15.2.4 Fixed-sequence stepwise MC procedures

In data-driven stepwise procedures, we do not have any control over the order in which the hypotheses are tested. However, sometimes, based on our preference or prior knowledge, we might want to fix the order of the tests a priori. Fixed sequence test and fallback test are the types of tests which proceed in a pre-specified order. East supports both of these procedures.

Assume that $H_1, H_2, \cdots, H_{k-1}$ are ordered hypotheses and the order is pre-specified so that $H_1$ is tested first, followed by $H_2$ and so on. Let $p_1, p_2, \cdots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, for $i = 1, \cdots, k-1$, in the $i$-th step, if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \cdots, H_{k-1}$ and stop.

The fixed sequence testing strategy is optimal when the early tests in the sequence have the largest treatment effects, and it performs poorly when the early hypotheses have small treatment effects or are nearly true (Westfall and Krishen 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.

The fallback test alleviates this undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$ such that $\sum_{i=1}^{k-1} w_i = 1$.
In the fallback procedure, in the $i$-th step ($i = 1, \cdots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained; the first hypothesis $H_1$ is tested at $\alpha_1 = \alpha w_1$. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered, by utilizing the fallback strategy. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Fixed sequence testing procedure

As before, we will use the same Alzheimer’s disease example to illustrate the fixed sequence testing procedure. Select Sim 1 in Library and click [icon]. Now go to the Design Parameters tab. There, in the Multiple Comparison Procedures box, uncheck the Dunnett’s single step box and check the Fixed Sequence box. Next click on the Response Generation Info tab and look at the Table of Proportions. You will see that an additional column labeled Test Sequence has been added. Here you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be performed first, 2 for the test that will be performed next, and so on. By default, East assigns 1 to the first test, 2 to the second test and so on. For now we will keep the default, which means that $H_1$ will be tested first, followed by $H_2$ and finally $H_3$.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 11.

The fixed sequence procedure with the specified sequence has global and disjunctive power of less than 7% and conjunctive power of 5%.
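The decision rules of the fixed sequence and fallback procedures described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not East’s implementation: the function names and the raw p-values are hypothetical, the p-values are supplied in the pre-specified testing order, and the fallback weights are assumed to sum to 1.

```python
def fixed_sequence(pvalues, alpha):
    """Test hypotheses in the given order; stop at the first non-rejection."""
    decisions = []
    stopped = False
    for p in pvalues:
        if not stopped and p < alpha:
            decisions.append(True)
        else:
            stopped = True  # this and all later hypotheses are retained
            decisions.append(False)
    return decisions


def fallback(pvalues, weights, alpha):
    """Fallback test: carry the level forward only after a rejection."""
    decisions = []
    prev_level = 0.0
    prev_rejected = False
    for p, w in zip(pvalues, weights):
        # alpha_i = alpha_{i-1} + alpha * w_i if H_{i-1} was rejected,
        # otherwise alpha_i = alpha * w_i
        level = alpha * w + (prev_level if prev_rejected else 0.0)
        rejected = p < level
        decisions.append(rejected)
        prev_level, prev_rejected = level, rejected
    return decisions


alpha = 0.025
p_ascending = [0.30, 0.04, 0.005]   # hypothetical p-values in order H1, H2, H3
p_descending = [0.005, 0.04, 0.30]  # same p-values in order H3, H2, H1
equal_w = [1 / 3, 1 / 3, 1 / 3]

fs_asc = fixed_sequence(p_ascending, alpha)
fs_desc = fixed_sequence(p_descending, alpha)
fb_asc = fallback(p_ascending, equal_w, alpha)
fb_desc = fallback(p_descending, equal_w, alpha)
print(fs_asc, fs_desc, fb_asc, fb_desc)
```

With these illustrative p-values, the fixed sequence test rejects nothing when the weakest comparison is tested first, while the fallback test still rejects the strongest comparison in either order. Setting the weights to 1, 0, 0 makes the fallback sketch reproduce the fixed sequence decisions.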
The global and disjunctive power are small because the smallest treatment effect is tested first and the magnitude of the treatment effect increases gradually over the remaining tests. For optimal power in the fixed sequence procedure, the early tests in the sequence should have the larger treatment effects. In our case, Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Therefore, to obtain optimal power, $H_3$ should be tested first, followed by $H_2$ and $H_1$.

Select Sim 11 in the Output Preview and click the [icon] icon. Select Sim 11 in Library, click [icon] and go to the Response Generation Info tab. In the Test Sequence column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 12.

Now the fixed sequence procedure with the pre-specified sequence ($H_3$, $H_2$, $H_1$) has global and disjunctive power close to 85% and conjunctive power close to 5%. This example illustrates that the fixed sequence procedure is powerful provided the hypotheses are tested in a sequence of descending treatment effects. The fixed sequence procedure controls the FWER because, for each hypothesis, testing is conditional upon rejecting all hypotheses earlier in the sequence. The exact result of the simulations may differ slightly, depending on the seed. Select Sim 12 in the Output Preview and click the [icon] icon to save it in Library.

Fallback procedure

Again we will use the same Alzheimer’s disease example to illustrate the fallback procedure. Select Sim 1 in Library and click [icon].
In the Multiple Comparison Procedures box of the Design Parameters tab, uncheck the Dunnett’s single step box and check the Fallback box. Next click on the Response Generation Info tab and look at the Table of Proportions. You will see two additional columns labeled Test Sequence and Proportion of Alpha. In the Test Sequence column, you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be performed first, 2 for the test that will be performed next, and so on. By default, East assigns 1 to the first test, 2 to the second test and so on. For now we will keep the default, which means that $H_1$ will be tested first, followed by $H_2$ and finally $H_3$.

In the Proportion of Alpha column, you have to specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default, East distributes the total alpha equally among all the tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3 or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test.

Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 13.

Now we will consider a sequence where $H_3$ is tested first, followed by $H_2$ and $H_1$, because in our case Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Select Sim 13 in the Output Preview and click the [icon] icon. Select Sim 13 in Library, click [icon] and go to the Response Generation Info tab. In the Test Sequence column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3.

Now click Simulate to obtain power.
Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim 14.

Note that the fallback test is more robust to misspecification of the test sequence, whereas the fixed sequence test is very sensitive to the test sequence. If the test order is misspecified, the fixed sequence test performs very poorly.

15.3 Comparison of MC procedures

We have obtained the power (based on simulation) of different MC procedures for the Alzheimer’s disease example from section 15.1.1. The obvious question now is which MC procedure to choose. To compare all the MC procedures, we will perform simulations for all of them under the following scenario.

Treatment arms: placebo, dose1 (dose = 0.3 mg), dose2 (dose = 1 mg) and dose3 (dose = 2 mg), with group means of 0, 0.45, 1.5 and 3, respectively
Common standard deviation: 5
Type I Error: 0.025 (right-tailed)
Number of Simulations: 10000
Total Sample Size: 200
Allocation ratio: 1 : 1 : 1 : 1

For comparability of the simulation results, we have used the same seed for the simulations under all MC procedures. The following output displays the powers under the different MC procedures.

Here we have used equal proportions for the weighted Bonferroni and fallback procedures. For the two fixed-order testing procedures (fixed sequence and fallback), two sequences have been used: ($H_1$, $H_2$, $H_3$) and ($H_3$, $H_2$, $H_1$). As expected, the Bonferroni and weighted Bonferroni procedures provide similar powers. It appears that the fixed sequence procedure with the pre-specified sequence ($H_3$, $H_2$, $H_1$) provides power of close to 85%, which is the maximum among all the procedures.
However, the fixed sequence procedure with the pre-specified sequence ($H_1$, $H_2$, $H_3$) provides power of less than 7%. Therefore, power in the fixed sequence procedure depends heavily on the specified testing sequence, and a misspecification might result in a huge drop in power. For this reason, the fixed sequence procedure may not be an appropriate MC procedure to choose.

Dunnett’s single step, step-down and step-up procedures are next in order after the fixed sequence procedure with the pre-specified sequence ($H_3$, $H_2$, $H_1$). All three procedures attain close to 75% disjunctive power. However, all three procedures assume that the treatment arms have equal variance. Therefore, if homogeneity of variance between the treatment arms is a reasonable assumption, Dunnett’s step-down or single step procedure should be the best option based on these simulation results. When the assumption of equal variance is not met, however, Dunnett’s procedure may not be appropriate, as the type I error might not be strongly controlled.

Next in the list are the fallback procedures; both of them provide a little more than 73% power, which is very close to the power attained by Dunnett’s procedures. Therefore, unlike the fixed sequence procedure, the fallback procedure does not depend much on the order in which the hypotheses are tested. Moreover, it does not require the assumption of equal variance among the treatment arms. For all these reasons, the fallback procedure seems to be the most appropriate MC procedure for the design we are interested in.

Now we will perform the comparison again, this time with unequal variance between the treatment arms. Specifically, we simulate data under the following scenario to examine the type I error rate control of the different procedures.
Treatment arms: placebo, dose1 (dose = 0.3 mg), dose2 (dose = 1 mg) and dose3 (dose = 2 mg), with group means of 0, 0, 0 and 0, respectively
Standard deviation: 5 for placebo, dose1 and dose2; 10 for dose3
Type I Error: 0.025 (right-tailed)
Number of Simulations: 1000000
Total Sample Size: 200
Allocation ratio: 1 : 1 : 1 : 1

The following output displays the type I error rate under the different MC procedures for the unequal variance case.

Note that the Dunnett tests slightly inflate the type I error rate, but all other procedures control the type I error rate below the nominal level of 0.025.

16 Multiple Endpoints-Gatekeeping Procedures

16.1 Introduction

Clinical trials are often designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints, which are divided into hierarchically ordered families. Typically, the primary family of endpoints defines the overall outcome of the trial, provides the basis for the regulatory claim and is included in the product label. The secondary families of endpoints play a supportive role and provide additional information for physicians, patients and payers, and hence are useful for enhancing the product label. Gatekeeping procedures are specifically designed to address this type of multiplicity problem by explicitly taking into account the hierarchical structure of the multiple objectives. The term gatekeeping refers to the hierarchical decision structure in which the higher ranked families serve as gatekeepers for the lower ranked families. The lower ranked families are not tested if the higher ranked families are not passed. Two types of gatekeeping procedures are described in this chapter: the serial gatekeeping procedure and the parallel gatekeeping procedure.
In the next few sections, specific examples will be provided to illustrate how to design trials with each type of gatekeeping procedure. For more information about applications of gatekeeping procedures in a clinical trial setting and a literature review on this topic, please refer to Dmitrienko and Tamhane (2007).

16.2 Simulate Serial Gatekeeping Design

Serial gatekeeping procedures were studied by Maurer, Hothorn and Lehmacher (1995), Bauer et al. (1998) and Westfall and Krishen (2001). Serial gatekeepers are encountered in trials where endpoints are usually ordered from most important to least important. Reisberg et al. (2003) reported a study designed to investigate memantine, an N-methyl-D-aspartate (NMDA) antagonist, for the treatment of Alzheimer’s disease, in which patients with moderate-to-severe Alzheimer’s disease were randomly assigned to receive placebo or 20 mg of memantine daily for 28 weeks. The two primary efficacy variables were: (1) the Clinician’s Interview-Based Impression of Change Plus Caregiver Input (CIBIC-Plus) global score at 28 weeks, and (2) the change from baseline to week 28 in the Alzheimer’s Disease Cooperative Study Activities of Daily Living Inventory modified for severe dementia (ADCS-ADLsev). The CIBIC-Plus measures overall global change relative to baseline and is scored on a seven-point scale ranging from 1 (markedly improved) to 7 (markedly worse). For illustration purposes, we redefine the primary endpoint of clinician’s global assessment score as 7 minus the CIBIC-Plus score, so that a larger value indicates improvement (0 markedly worse and 6 markedly improved). The secondary efficacy endpoints included the Severe Impairment Battery and other measures of cognition, function and behavior. Suppose that the trial is declared successful only if the treatment effect is demonstrated on both endpoints.
If the trial is successful, it is of interest to assess the two secondary endpoints: (1) Severe Impairment Battery (SIB), (2) Mini-Mental State Examination (MMSE). The SIB was designed to evaluate cognitive performance in advanced Alzheimer’s disease. A 51-item scale, it assesses social interaction, memory, language, visuospatial ability, attention, praxis, and construction. The scores range from 0 (greatest impairment) to 100. The MMSE is a 30-point scale that measures cognitive function.

The means of the endpoints for subjects in the control group and the experimental group, and the common covariance matrix, are as follows:

Endpoint       Mean Treatment   Mean Control
CIBIC-Plus           2.6             2.3
ADCS-ADLsev         -2.5            -4.5
SIB                 -6.5           -10
MMSE                -0.4            -1.2

               CIBIC-Plus   ADCS-ADLsev     SIB    MMSE
CIBIC-Plus         1.2          3.6          6.8    1.6
ADCS-ADLsev        3.6         42           38      9.3
SIB                6.8         38          145     17
MMSE               1.6          9.3         17      8

Typically there are no analytical ways to compute the power for gatekeeping procedures. Simulations can be used to assess the operating characteristics of different designs. For example, one could simulate the power for given sample sizes. To start the simulations, click Two Samples in the Design tab and select Multiple Comparisons-Multiple Endpoints to see the following input windows.

At the top of this input window, one needs to specify the total number of endpoints and other input parameters such as Rejection Region, Type I Error and Sample Size. One also needs to select the multiple comparison procedure which will be used to test the last family of endpoints. The type I error specified on this screen is the nominal level of the familywise error rate (FWER), which is defined as the probability of falsely declaring the efficacy of the new treatment compared to control with respect to any endpoint.
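Because the serial gatekeeping example specifies a covariance matrix while later examples use standard deviations and correlations, it can help to unpack the matrix before entering it. The following is a minimal sketch in Python (not part of East); the variable and function names are illustrative assumptions, and the numbers are transcribed from the tables above.

```python
from math import sqrt

# Values transcribed from the tables above.
endpoints = ["CIBIC-Plus", "ADCS-ADLsev", "SIB", "MMSE"]
mean_treatment = [2.6, -2.5, -6.5, -0.4]
mean_control = [2.3, -4.5, -10.0, -1.2]
covariance = [
    [1.2, 3.6, 6.8, 1.6],
    [3.6, 42.0, 38.0, 9.3],
    [6.8, 38.0, 145.0, 17.0],
    [1.6, 9.3, 17.0, 8.0],
]


def sd_and_correlation(cov):
    """Split a covariance matrix into standard deviations and a correlation matrix."""
    size = len(cov)
    sd = [sqrt(cov[i][i]) for i in range(size)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(size)] for i in range(size)]
    return sd, corr


sd, corr = sd_and_correlation(covariance)
# Standardized mean difference (treatment minus control, in SD units) per endpoint.
effect_size = [(t - c) / s for t, c, s in zip(mean_treatment, mean_control, sd)]

for name, s, es in zip(endpoints, sd, effect_size):
    print(f"{name:12s} SD = {s:6.3f}   standardized difference = {es:5.3f}")
```

Under these inputs all pairwise correlations come out close to 0.5, and the standardized differences are all of similar size (roughly 0.27 to 0.31).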
For the Alzheimer’s disease example, CIBIC-Plus and ADCS-ADLsev form the primary family, and the other endpoints, SIB and MMSE, form the secondary family. Suppose that we would like to see the power for a sample size of 250 at a nominal type I error rate of 0.025, using the Bonferroni test for the secondary family. The input window then looks as follows.

Behind the window for Simulation Parameters, there is another tab labeled Response Generation Info. The window for the Response Generation Info tab shown below allows one to specify the underlying joint distribution of the multiple endpoints for the control arm and for the experimental arm. The joint distribution of the endpoints is assumed to be multivariate normal with a common covariance matrix. One also needs to specify which family each endpoint belongs to in the column labeled Family Rank. One can also customize the label for each endpoint. For the Alzheimer’s disease example, the inputs for this window should be specified as follows.

One can specify the number of simulations to be performed in the window labeled Simulation Control Info. By default, 10000 simulations will be performed. One can also save the summary statistics for each simulated trial, or save subject-level data, by checking the appropriate box in the output option area.

To simulate this design, click the Simulate button at the bottom right of the screen to see the preliminary output displayed in the output preview area, as seen in the following screen. All the results displayed in the yellow cells are summary outputs generated from simulations: for example, the actual FWER, the number of families, the conjunctive power for the primary family, and the conjunctive power and disjunctive power for the last family.
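The logic of this serial gatekeeping design can be sketched outside East as follows. This is a simplified illustration and not East's implementation. It assumes known-variance z-tests on the mean differences, equal allocation, both primary endpoints tested at the full level α (both must be significant to open the gate), and a Bonferroni split of α over the two secondary endpoints. The function names and the seed are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_serial_gatekeeping(n_total=250, alpha=0.025, n_sims=20000, seed=1):
    # Order: CIBIC-Plus, ADCS-ADLsev (primary), SIB, MMSE (secondary).
    diff = [0.3, 2.0, 3.5, 0.8]  # treatment minus control means
    cov = [[1.2, 3.6, 6.8, 1.6], [3.6, 42.0, 38.0, 9.3],
           [6.8, 38.0, 145.0, 17.0], [1.6, 9.3, 17.0, 8.0]]
    sd = [sqrt(cov[i][i]) for i in range(4)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(4)] for i in range(4)]
    n_arm = n_total / 2
    # Expected value of each z-statistic (assumption: variances treated as known).
    drift = [d / (s * sqrt(2 / n_arm)) for d, s in zip(diff, sd)]
    low = cholesky(corr)
    crit_primary = NormalDist().inv_cdf(1 - alpha)        # each primary endpoint at alpha
    crit_secondary = NormalDist().inv_cdf(1 - alpha / 2)  # Bonferroni over 2 endpoints
    rng = random.Random(seed)
    gate_open = any_secondary = all_secondary = 0
    for _ in range(n_sims):
        e = [rng.gauss(0, 1) for _ in range(4)]
        z = [drift[i] + sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(4)]
        if z[0] > crit_primary and z[1] > crit_primary:
            gate_open += 1
            hits = (z[2] > crit_secondary) + (z[3] > crit_secondary)
            any_secondary += hits >= 1
            all_secondary += hits == 2
    return {
        "conjunctive_primary": gate_open / n_sims,
        "disjunctive_last": any_secondary / n_sims,
        "conjunctive_last": all_secondary / n_sims,
    }


result = simulate_serial_gatekeeping()
print(result)
```

Because the sketch uses z-tests and its own random numbers, its estimates should be close to, but not identical with, the figures East reports for the same design.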
To view the detailed output, first save the simulation into a workbook in the library by clicking on the tool button; a simulation node then appears in the library, as shown in the following screen. Now double click on the simulation node Sim1 to see the detailed output, as shown in the following screen.

The detailed output summarizes all the main input parameters, such as the multiple comparison procedure used for the last family of endpoints, the nominal type I error level, the total sample size, and the mean values for each endpoint in the control arm and in the experimental arm. It also displays the attained overall FWER, conjunctive power and disjunctive power; the FWER and conjunctive power for each gatekeeper family; and the FWER, conjunctive power and disjunctive power for the last family. The definitions of the different types of power are as follows:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Family except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the last family

Marginal Power: probability of declaring significance on the particular endpoint

For the Alzheimer’s disease example, the conjunctive power, which characterizes the power for the study, is 46.9% for a total sample size of 250. Using the Bonferroni test for the last family, the design has a 40.5% probability (disjunctive power for the last family) of detecting the benefit of memantine with respect to at least one of the two secondary endpoints, SIB and MMSE. It has a 25.1% chance (conjunctive power for the last family) of declaring the benefit of memantine with respect to both of the secondary endpoints.

One can find the sample size to achieve a target power by simulating multiple designs in batch mode. For example, one could simulate a batch of designs for a range of sample sizes from 250 to 500 in steps of 50, as shown in the following window.

Note that a total sample size somewhere between 450 and 500 provides 80% power to detect the mean differences for both primary endpoints, CIBIC-Plus and ADCS-ADLsev, as seen in the following window. To get a more precise sample size for 80% power, one could simulate a batch of designs with the sample size ranging from 450 to 500 in steps of 10. One will notice that a sample size of 480 provides over 80% power to claim significant differences with respect to both primary endpoints.
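The batch search over sample sizes described above can be mimicked with a short loop. The sketch below is an approximation under stated assumptions, not East's algorithm: it uses known-variance z-tests, equal allocation, and both primary endpoints tested at level 0.025, and it looks only at the conjunctive power of the primary family. The function name and seed are illustrative. Reusing the same seed for every sample size gives common random numbers, which makes the estimated power curve smooth in the sample size.

```python
import random
from math import sqrt
from statistics import NormalDist


def conjunctive_power_primary(n_total, alpha=0.025, n_sims=20000, seed=7):
    """Estimated probability that both primary endpoints are significant."""
    diff = (0.3, 2.0)     # CIBIC-Plus and ADCS-ADLsev mean differences
    var = (1.2, 42.0)     # their variances from the covariance matrix
    rho = 3.6 / sqrt(1.2 * 42.0)  # correlation between the two endpoints
    n_arm = n_total / 2
    drift = [d / sqrt(v * 2 / n_arm) for d, v in zip(diff, var)]
    crit = NormalDist().inv_cdf(1 - alpha)
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = drift[0] + e1
        z2 = drift[1] + rho * e1 + sqrt(1 - rho * rho) * e2
        wins += z1 > crit and z2 > crit
    return wins / n_sims


power_by_n = {n: conjunctive_power_primary(n) for n in range(250, 501, 50)}
for n, power in power_by_n.items():
    print(n, round(power, 3))
```

A finer grid, for example `range(450, 501, 10)`, corresponds to the second batch run mentioned above.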
One could compare the multiple designs side by side by clicking on the tool button in the output preview area as follows:

There is a special case where all the endpoints belong to one single family. The software handles this special case in a particular manner: the Intersection-Union test is applied to the single family of endpoints, and the MCP selected for the last family in the Simulation Parameters tab is not applicable. For the Alzheimer’s disease example, if we are only interested in testing the two endpoints (CIBIC-Plus and ADCS-ADLsev) as co-primary endpoints, as indicated by the family rank in the window for Response Generation Info, then the Intersection-Union test will be applied to the two endpoints so that each endpoint is tested at the nominal level α. The detailed output window is slightly different in the case of a single family of endpoints, as seen in the following window.

16.3 Simulate Parallel Gatekeeping Design

Parallel gatekeeping procedures are often used in clinical trials with several primary objectives where each individual objective can characterize a successful trial outcome. In other words, the trial can be declared successful if at least one primary objective is met.

Consider a randomized, double-blind, parallel-group clinical trial to compare two vaccines against the human papilloma virus. Denote by vaccine T the new vaccine and by vaccine C the comparator. The primary objective of this study is to demonstrate that vaccine T is superior to vaccine C for the antigen type 16 or 18, which account for 70% of cervical cancer cases globally. If the new vaccine shows superiority over the comparator with respect to either antigen type 16 or 18, it is of interest to test the superiority of vaccine T to vaccine C for the antigen type 31 or 45.
The two vaccines are compared based on the immunological response, i.e. the number of T-cells in the blood, seven months after the vaccination. Assume that the log transformed data is normally distributed with mean $\mu_{iT}$ or $\mu_{iC}$ (i = 1, 2, 3, 4), where the indices 1, 2, 3 and 4 represent the four antigen types 16, 18, 31 and 45, respectively. The null hypotheses and alternative hypotheses can be formulated as

$H_{i0}: \mu_{iT} - \mu_{iC} \le 0$ vs $H_{i1}: \mu_{iT} - \mu_{iC} > 0$

The parallel gatekeeping test strategy is suitable for this example. The two null hypotheses $H_{10}$ and $H_{20}$ for antigen types 16 and 18 constitute the primary family, which serves as the gatekeeper for the second family of hypotheses, which contains $H_{30}$ and $H_{40}$. Assume that the means and the standard deviations for all four antigen types are as follows:

Table 16.1: Mean response and Standard Deviation

Endpoints   Mean for Vaccine C   Mean for Vaccine T   Standard Deviation
Type 16          4                    4.57                  0.5
Type 18          3.35                 4.22                  0.5
Type 31          2                    2.34                  0.6
Type 45          1.42                 2                     0.3

Assume that the total sample size is 20 and the one-sided significance level is 0.025. To assess the operating characteristics of the parallel gatekeeping procedures, we first need to open the simulation window for multiple endpoints. To this end, click on the Design menu, choose Two Sample for continuous endpoint and then select Multiple Endpoints from the drop-down list, and the following screen will show up.

At the top of the above screen, one needs to specify the total number of endpoints. The lower part of the above screen is the Simulation Parameters tab, which allows one to specify the important design parameters including the nominal type I error rate, the total sample size and the multiple comparison procedures.
Now select Parallel Gatekeeping and choose Bonferroni for the parallel gatekeeping method. For the last family, select Bonferroni as the multiple testing procedure.

Next to the Simulation Parameters tab are two additional tabs: Response Generation Info and Simulation Control Info. We need to specify the mean responses for each endpoint for both the treatment and control arms, as well as the covariance structure among the endpoints. In addition, we need to specify which family each endpoint belongs to, in the column labeled Family Rank in the same table used for specifying the mean responses.

There are two ways of specifying the covariance structure: Covariance Matrix or Correlation Matrix. If the Correlation Matrix option is selected, one needs to input the standard deviation for each endpoint in the same table used for specifying the mean responses. There is a simpler way to input the standard deviations if all the endpoints share a common standard deviation: check the box for Common Standard Deviation and specify the value of the common standard deviation in the box to the right. One also needs to specify the correlations among the endpoints in the table to the right. Similarly, if all the endpoints have a common correlation, we can just check the box for Common Correlation and specify the value of the common correlation in the box to the right. For the vaccine example, assume the endpoints share a common mild correlation of 0.3. Then the window with completed inputs for generating data looks like the following screen.

In the window for Simulation Control Info, we can specify the total number of simulations, the refresh frequency and the type of random number seed. We can also choose to save the simulation data for more advanced analyses. After specifying all the input parameter values, click on the Simulate button at the bottom right of the window to run the simulations. The progress window will report how many simulations have been completed, as seen in the following screen.

When all the requested simulations have been completed, click on the Close button at the bottom right of the progress report screen. The preliminary simulation summary will show up in the output preview window, where one can see the overall power summary and the power summary for the primary family, as well as the attained overall FWER.

To see the detailed output, we need to save the simulation in the workbook by clicking on the icon at the top of the output preview window. A simulation node will be appended to the corresponding workbook in the library, as seen in the following window.

Next double click on the simulation node in the library and the detailed outputs will be displayed accordingly.

When testing multiple endpoints, the definition of power is not unique. East provides the overall power summary and the power summary for each specific family. In the overall power summary table, the following types of power are provided along with the overall FWER: global power, conjunctive power and disjunctive power, which capture the overall performance of this gatekeeping procedure.
The definitions of the powers are given below:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Families except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the last family

Marginal Power: probability of declaring significance on the particular endpoint

For the vaccine example, we see that the gatekeeping procedure using the Bonferroni test for both the primary family and the secondary family provides 94.49% power to detect the difference in at least one of the two antigen types 16 and 18. It provides 52.19% power to detect the differences in both antigen types.
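To make the conjunctive and disjunctive power definitions concrete, the vaccine example can be sketched as a small simulation. This is a hedged illustration, not East's implementation. It assumes known-variance z-tests, equal allocation, a common correlation of 0.3, Bonferroni testing within each family, and a Bonferroni parallel gatekeeper that passes on a fraction of α equal to the fraction of primary hypotheses rejected. Function and key names are illustrative. With only 10 subjects per arm, a z-test approximation tends to give higher power than a t-test, so the numbers will not reproduce East's output exactly.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_parallel_bonferroni(n_total=20, alpha=0.025, rho=0.3,
                                 n_sims=20000, seed=11):
    # Order: Type 16, Type 18 (primary family), Type 31, Type 45 (secondary family).
    mean_c = [4.0, 3.35, 2.0, 1.42]
    mean_t = [4.57, 4.22, 2.34, 2.0]
    sd = [0.5, 0.5, 0.6, 0.3]
    n_arm = n_total / 2
    drift = [(t - c) / (s * sqrt(2 / n_arm)) for t, c, s in zip(mean_t, mean_c, sd)]
    nd = NormalDist()
    rng = random.Random(seed)
    counts = {"primary_any": 0, "primary_all": 0, "secondary_any": 0, "secondary_all": 0}
    for _ in range(n_sims):
        shared = rng.gauss(0, 1)  # shared factor inducing the common correlation rho
        z = [m + sqrt(rho) * shared + sqrt(1 - rho) * rng.gauss(0, 1) for m in drift]
        p = [1 - nd.cdf(v) for v in z]
        # Primary family: Bonferroni, each hypothesis at alpha / 2.
        n_rej_primary = sum(pv < alpha / 2 for pv in p[:2])
        # Assumption: alpha carried forward is proportional to the number rejected.
        alpha_next = alpha * n_rej_primary / 2
        # Secondary family: Bonferroni at the carried-forward level.
        n_rej_secondary = sum(pv < alpha_next / 2 for pv in p[2:])
        counts["primary_any"] += n_rej_primary >= 1
        counts["primary_all"] += n_rej_primary == 2
        counts["secondary_any"] += n_rej_secondary >= 1
        counts["secondary_all"] += n_rej_secondary == 2
    return {key: value / n_sims for key, value in counts.items()}


power = simulate_parallel_bonferroni()
print(power)
```

Here `primary_any` and `primary_all` play the role of the disjunctive and conjunctive power of the primary family, since both primary endpoints have a true treatment effect in this scenario; the same holds for the secondary family.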
Also note that this gatekeeping procedure provides only 89.55% power to detect the response difference in at least one of the other two antigen types, 31 or 45, and only 12.53% power to detect the differences in both antigen types 31 and 45. The marginal power table displays the probabilities of declaring significance on the particular endpoint after multiplicity adjustment. For example, the power of detecting antigen type 16 is 55.22%.

If it is of interest to assess the robustness of this procedure with respect to the correlation among the different endpoints, we can go back to the input window, change the correlations and run the simulation again. To this end, right click on the Sim1 node in the library and select Edit Simulation from the dropdown list. Next click on the Response Generation Info tab, change the common correlation to 0.5 and click the Simulate button. We can repeat this for a common correlation of 0.8. The following table summarizes the power comparisons under different correlation assumptions. Note that the disjunctive power decreases as the correlation increases, while the conjunctive power increases as the correlation increases.

Table 16.2: Power Comparisons under Different Correlation Assumptions

              Primary Family          Secondary Family        Overall Power
Correlation   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.
0.3           0.9449      0.5219      0.8955      0.1253      0.9449      0.1012
0.5           0.9324      0.5344      0.8867      0.1327      0.9324      0.1192
0.8           0.9174      0.5497      0.8855      0.1413      0.9174      0.1402

There are three available parallel gatekeeping methods: Bonferroni, Truncated Holm and Truncated Hochberg. The multiple comparison procedures applied to the gatekeeper families need to satisfy the so-called separable condition. A multiple comparison procedure is separable if the type I error rate under a partial null configuration is strictly less than the nominal level α. Bonferroni is a separable procedure.
However, the regular Holm and Hochberg procedures are not separable and cannot be applied directly to the gatekeeper families. The truncated versions, obtained by taking convex combinations of the critical constants of the regular Holm/Hochberg procedure and the Bonferroni procedure, are separable and more powerful than the Bonferroni test. The truncation constant controls the degree of conservativeness: a larger value of the truncation constant results in a more powerful procedure. If the truncation constant is set to 1, the procedure reduces to the regular Holm or Hochberg test.

To see this, let’s simulate the design using the truncated Holm procedure for the primary family and the Bonferroni test for the second family, for the vaccine example with common correlation 0.3. Table 16.3 compares the conjunctive power and disjunctive power for each family, and the overall ones, for different truncation parameter values. As the value of the truncation parameter increases, the conjunctive power for the primary family increases and the disjunctive power remains unchanged. Both the conjunctive power and the disjunctive power for the secondary family decrease as we increase the truncation parameter. The overall conjunctive power also increases, but the overall disjunctive power remains the same as the truncation parameter increases.

Table 16.3: Impact of Truncation Constant in Truncated Holm Procedure on Overall Power and Power for Each Family

Truncation    Primary Family          Secondary Family        Overall Power
Constant      Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.
0             0.5219      0.9449      0.1253      0.8955      0.1012      0.9449
0.25          0.5647      0.9449      0.1229      0.8872      0.1065      0.9449
0.5           0.5988      0.9449      0.1212      0.8747      0.1108      0.9449
0.8           0.6327      0.9449      0.1188      0.84        0.115       0.9449

Table 16.4 shows the marginal powers of this design for different truncation parameter values. The marginal powers for the two endpoints in the primary family increase. On the other hand, the marginal powers for the two endpoints in the secondary family decrease.

Table 16.4: Impact of Truncation Constant in Truncated Holm Procedure on Marginal Power

Truncation    Primary Family        Secondary Family
Constant      Type 16    Type 18    Type 31    Type 45
0             0.5522     0.9146     0.127      0.8938
0.25          0.5886     0.921      0.1246     0.8855
0.5           0.6183     0.9254     0.1227     0.8731
0.8           0.6483     0.9293     0.1203     0.8385

Table 16.5 and Table 16.6 display the operating characteristics of the truncated Hochberg test for different truncation constant values. Note that both the conjunctive and disjunctive powers for the primary family increase as the truncation parameter increases. However, the power for the secondary family decreases with larger truncation parameter values. The marginal powers for the primary family and for the secondary family behave similarly. The overall conjunctive and disjunctive powers also increase as we increase the truncation parameter.

Table 16.5: Impact of Truncation Constant in Truncated Hochberg Procedure on Overall Power and Power for Each Family

Truncation    Primary Family          Secondary Family        Overall Power
Constant      Conjunct.   Disjunct.   Conjunct.   Disjunct.   Conjunct.   Disjunct.
0             0.5219      0.9449      0.1253      0.8955      0.1012      0.9449
0.25          0.5652      0.9455      0.1229      0.8877      0.1065      0.9455
0.5           0.6007      0.9468      0.1213      0.8764      0.1109      0.9468
0.8           0.6369      0.9491      0.119       0.8439      0.1152      0.9491

Table 16.6: Impact of Truncation Constant in Truncated Hochberg Procedure on Marginal Power

Truncation    Primary Family        Secondary Family
Constant      Type 16    Type 18    Type 31    Type 45
0             0.5522     0.9146     0.127      0.8938
0.25          0.5892     0.9215     0.1246     0.886
0.5           0.6203     0.9273     0.1228     0.8749
0.8           0.6525     0.9335     0.1205     0.8424

If all the endpoints belong to one single family, the multiple testing procedure selected for the last family (Bonferroni, Sidak, Weighted Bonferroni, Holm’s step down, Hochberg’s step up, Hommel’s step up, Fixed Sequence or Fallback) will be applied for multiplicity adjustment. For example, if all four antigen types in the vaccine example are treated as primary endpoints, as indicated by the family rank in the window for Response Generation Info, and Hochberg’s step up test is selected for the last family in the window for Simulation Parameters, then the regular Hochberg test will be applied to the four endpoints for multiplicity adjustment. The detailed output window is slightly different in the case of a single family of endpoints, as seen in the following window.

17 Continuous Endpoint: Multi-arm Multi-stage (MaMs) Designs

17.1 Design

Consider designing a placebo-controlled, double-blind, randomized trial to evaluate the efficacy, pharmacokinetics, safety and tolerability of a new therapy given as multiple weekly infusions in subjects with a recent acute coronary syndrome. There are four dose regimens to be investigated. The treatment effect is assessed through the change in PAV (percent atheroma volume) from baseline to Day 36 post-randomization, as determined by IVUS (intravascular ultrasound).
The expected changes in PAV for the placebo group and the four dose regimens are 0, 1, 1.1, 1.2 and 1.3, and the common standard deviation is 3. The objective of the study is to find the optimal dose regimen based on the totality of the evidence, including benefit-risk assessment and cost considerations.

To design such a study in East, we first need to invoke the design dialog window. To this end, click on the Design menu at the top of the East window, select Many Samples for continuous type of response and then select Multiple Looks-Group Sequential in the drop-down list, as shown in the following screen shot.

After selecting the design, we will see a dialog window for specifying the main design parameters. At the top of the window, we need to specify the number of arms, including the control arm, and the number of looks. We also need to specify the nominal significance level, the power or sample size, the mean response for each arm, the standard deviation for each arm and the allocation ratio of each arm to the control arm. Suppose we would like to compute the sample size to achieve 90% power at a one-sided 0.025 significance level. After filling in all the inputs, the design dialog window looks as follows:

Now click on the Compute button at the bottom right of the window to see the total sample size. Note that we need 519 subjects. Here the power is the probability of successfully detecting a significant difference for at least one active treatment group compared to the control arm.

Suppose that we would now like to create a group sequential design with interim looks, so that the trial can be terminated early if one or more of the treatment groups demonstrate overwhelming efficacy. To do this, we change the number of looks to 3. Note that another tab appears beside the Test Parameter tab.
This new tab, labeled Boundary, is used to specify the efficacy boundary, the futility boundary and the spacing of looks. Suppose we want to take two interim looks with equal spacing, using the O’Brien-Fleming spending function from Lan-DeMets 1984. The input window looks like the following.

One can view the boundaries in terms of other scales, including the score, δ and p-value scales, by clicking the drop-down box for boundary scale. For example, the δ scale boundary for this study is 2.904, 1.486 and 1.026.

Now click on the Compute button at the bottom right of the window to create the design. Note that the total sample size to achieve 90% power is now 525, compared to 519 for the fixed sample design created earlier. The power here is defined as the probability of successfully detecting, at any look, any active treatment group which is significantly different from the control group.

To view the detailed design output, keep the design in the library and double click the design node.

The first table shows the sample size information, including the maximum sample size if the trial goes all the way to the end and the sample size per arm. It also shows the expected sample size under the global null, where none of the active treatment groups is different from the control group, and the expected sample size under the design alternative specified by the user. The second table displays the look-by-look information, including sample size, cumulative type I error, boundaries, and boundary crossing probabilities under the global null and under the user-specified design alternative. The boundary crossing probability at each look shows the likelihood of at least one active treatment group crossing the boundary at that particular look.
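The cumulative type I error column of the look-by-look table is driven by the spending function. As a rough illustration, the sketch below evaluates the commonly used Lan-DeMets O'Brien-Fleming-type spending function, α(t) = 2(1 − Φ(z_{α/2}/√t)), at three equally spaced looks. It is an assumption that this is the exact form East applies here, and the function name is illustrative. The snippet computes only the α spent at each look; it does not derive the multiplicity-adjusted boundaries.

```python
from statistics import NormalDist


def ld_obf_spent(t, alpha=0.025):
    """Cumulative one-sided alpha spent at information fraction t (0 < t <= 1)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / t ** 0.5))


looks = [1 / 3, 2 / 3, 1.0]
cumulative = [ld_obf_spent(t) for t in looks]
# Alpha newly spent at each look is the increase in the cumulative amount.
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

for t, cum, inc in zip(looks, cumulative, incremental):
    print(f"t = {t:.3f}   cumulative alpha = {cum:.5f}   incremental alpha = {inc:.5f}")
```

Very little α is spent at the first look, which is why the first efficacy boundary is so extreme compared with the final one.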
The third table shows the Z scale boundary.

One can also add a futility boundary to the design by clicking on the drop-down box for the futility boundary family. There are three families of boundaries for futility: Spending Function, p-value and δ, as seen in the following screen.

Now click on the Recalc button to see the cumulative α, efficacy boundary, cumulative β and futility boundary displayed in the boundary table. The futility boundary is non-binding, and the details of its computation are provided in Section J.2. The futility boundary is computed such that the probability of the best performing arm (compared to the control arm) crossing the futility boundary at a given look is equal to the incremental β. For example, the probability of the best performing treatment arm crossing the futility boundary 0.178 at the first look is 0.005 under the design alternative. The probability of the trial staying in the continuation region at the first look but crossing the futility boundary 1.647 at the second look is 0.04, which is the incremental β spent.

Now click on Compute to see the required sample size to achieve 90% power.

Note that we need a larger sample size, 560, to achieve the same target power with the futility boundary than for the design without a futility boundary. However, the expected sample size under H0 with the futility boundary is much smaller than for the design without futility.

One can also build a futility boundary based on δ. For example, one might want to terminate the study if a negative δ is observed.

It can be seen that such a futility boundary is more conservative than the one constructed from the O’Brien-Fleming spending function, in the sense that it terminates the trial early for futility with smaller probability.
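As a cross-check of the fixed sample design at the start of this section (519 subjects for 90% power), the power can be approximated with a one-dimensional integral. The sketch below is not East's algorithm. It assumes a Dunnett-type single-step comparison of each dose against the shared control, known variance, and equal allocation. Conditional on the control-arm noise term, the four test statistics are independent, so the probability that none of them exceeds the critical value reduces to a single numerical integral. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def prob_no_rejection(crit, drifts, half_width=8.0, steps=1600):
    """P(all Z_i <= crit) for Z_i = drift_i + (E_i - E_0) / sqrt(2), E's iid N(0, 1).

    Given the shared control term E_0 = y, each event has probability
    Phi(sqrt(2) * (crit - drift_i) + y); integrate over y with the trapezoidal rule.
    """
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        prod = 1.0
        for d in drifts:
            prod *= STD.cdf(sqrt(2) * (crit - d) + y)
        total += weight * prod * STD.pdf(y)
    return total * h


def dunnett_critical_value(n_active, alpha=0.025):
    """Bisection for the one-sided critical value under the global null."""
    lo, hi = 1.0, 5.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - prob_no_rejection(mid, [0.0] * n_active) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n_total, sd = 519, 3.0
deltas = [1.0, 1.1, 1.2, 1.3]           # mean differences from control
n_arm = n_total / (len(deltas) + 1)     # equal allocation over five arms
drifts = [d / (sd * sqrt(2 / n_arm)) for d in deltas]
crit = dunnett_critical_value(len(deltas))
power = 1 - prob_no_rejection(crit, drifts)
print(f"critical value = {crit:.3f}, power = {power:.3f}")
```

Under these assumptions the computed power should land close to the 90% target, which gives some reassurance that the sample size and the power definition (at least one dose declared significant) are being read correctly.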
17.2 Simulation

A multi-arm multi-stage (MAMS) design is a complex study design with pros and cons. One of the pros is that it saves subjects compared to conducting separate studies to compare each treatment to control. It may also be advantageous in terms of enrolment. One of the cons is that the hurdle for demonstrating statistical significance is higher due to multiplicity. One needs to evaluate the operating characteristics of such designs through intensive simulations in order to assess the pros and cons of using them.

To simulate a MAMS design, select the design node in the library and click on the simulation icon located at the top of the library window.

This will open the simulation dialog window. There are four windows for inputting values for simulation parameters: Test Parameters, Boundary, Response Generation and Simulation Controls. The Test Parameters window provides the total sample size, test statistic and variance type to be used in simulations. The Boundary tab has inputs similar to those for design. The default inputs for the boundary are carried over from the design. One can modify the boundary in simulation mode without having to go back to the design. One can even add a futility boundary.

The next screen is the Response Generation tab, where one needs to specify the underlying mean, standard deviation and allocation ratio for each treatment arm.

The last tab, Simulation Controls, allows one to specify the total number of simulations to be run and to save the intermediate simulation data for further analysis. For example, we can run simulations under the design alternative where the mean differences from control are 1, 1.1, 1.2 and 1.3. After filling in all the inputs, click on the Simulate button at the bottom right of the window.
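The kind of simulation that this dialog sets up can be sketched in a few lines. The following is a simplified stand-in, not East's simulation engine. It assumes the efficacy-only design with 525 subjects, equal allocation, three equally spaced looks, the δ-scale efficacy boundaries quoted in Section 17.1 (2.904, 1.486 and 1.026) applied to the observed mean difference of each dose versus control, and stopping as soon as any dose crosses. The function name and seed are illustrative.

```python
import random
from math import sqrt


def simulate_mams(n_sims=20000, seed=3):
    means = [0.0, 1.0, 1.1, 1.2, 1.3]      # control first, then the four dose regimens
    sd = 3.0
    delta_bounds = [2.904, 1.486, 1.026]   # delta-scale efficacy boundaries, looks 1 to 3
    n_looks = len(delta_bounds)
    n_stage = 525 / (len(means) * n_looks)  # 35 subjects per arm per stage
    rng = random.Random(seed)

    stops = [0] * n_looks                   # look at which each simulated trial ends
    wins = [0] * (len(means) - 1)           # efficacy claims per active arm
    successes = 0
    total_n = 0.0
    for _ in range(n_sims):
        sums = [0.0] * len(means)
        for look in range(n_looks):
            for arm, mu in enumerate(means):
                # The sum of n_stage responses is N(n_stage * mu, n_stage * sd^2).
                sums[arm] += rng.gauss(n_stage * mu, sd * sqrt(n_stage))
            n_cum = n_stage * (look + 1)
            crossed = [(sums[arm] - sums[0]) / n_cum >= delta_bounds[look]
                       for arm in range(1, len(means))]
            if any(crossed) or look == n_looks - 1:
                stops[look] += 1
                total_n += n_cum * len(means)
                successes += any(crossed)
                for i, hit in enumerate(crossed):
                    wins[i] += hit
                break
    return {
        "global_power": successes / n_sims,
        "stop_prob": [s / n_sims for s in stops],
        "expected_n": total_n / n_sims,
        "marginal_power": [w / n_sims for w in wins],
    }


result = simulate_mams()
print(result)
```

The returned quantities correspond loosely to the global power, the look-by-look stopping probabilities, the average sample size and the marginal powers that East reports in its detailed simulation output.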
After the simulation is completed, it will show up in the output preview area. To view the detailed simulation output, save it into the library and double click the simulation node.

The first table in the detailed output shows the overall power including global power, conjunctive power, disjunctive power and FWER. The definitions for different powers are as follows.
Global Power: probability of demonstrating statistical significance on one or more treatment groups
Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective
Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective
FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective
For this example, the global power is about 90%, which confirms the design power. The conjunctive power is about 8%.

The second table, for probability of trial termination at each look, displays the average sample size, information fraction, cumulative α spent, boundary information, and probability of trial termination at each look. For this example, the chance of terminating the trial at the very first look is less than 3%. The trial has about a 55% chance of stopping early by the second look. The average sample size for the trial is about 424, shown in the last entry of the average sample size column.

In a MAMS design, when the trial stops for efficacy, there might be one or more treatments crossing the efficacy boundary. Such information is valuable in some situations.
For example, when multiple dose options are desired for patients with different demographic characteristics, it might be beneficial to approve multiple doses on the product label, which will give physicians the option to prescribe the appropriate dose for a specific patient. In this case, we are interested not only in the overall power of the study but also in the power of claiming efficacy on more than one dose group. Such information is summarized in the third table. This table shows the probability of demonstrating significance on a specific number of treatments at each look and across all looks. For example, the trial has about 90% overall power. Of that 90%, the trial shows significance on exactly one treatment with 39% probability, on two treatments with 26% probability, on three treatments with 17% probability and on all four treatments with about 8.5% probability. The table also shows this breakdown look by look.

The fourth table summarizes the marginal power for each treatment group look by look and across all looks. For example, the trial has a marginal power of 29% for demonstrating efficacy for Treatment 1, 38% for Treatment 2, 49% for Treatment 3 and 60% for Treatment 4. The detailed efficacy outcome table, as seen in the following screen, provides further efficacy details pertinent to treatment identities. For example, the trial has about 3.77% probability of demonstrating efficacy only on Treatment 1, 1.34% for both Treatment 1 and 2, and 1.7% for Treatment 1, 2 and 3. It has 8.5% probability of showing significance on all four treatments.

17.2.1 Futility Stopping and Dropping the Losers
In the simulation mode, the futility boundary can be used in two different ways. It can be used to terminate the trial early if the best performing treatment isn’t doing well.
It can also be used to drop arms which are futile along the way and continue only those treatments which are performing well. The two options can be accessed through the two radio buttons below the boundary table as seen in the following screen.

Suppose that we would like to incorporate a conservative futility boundary so that we terminate the trial if all δs are negative at any interim look. We would specify the futility boundary as in the following screen.

Suppose we want to see how often the trial will be terminated early for futility if none of the treatments are effective. Click on the Simulate button at the bottom right of the window to start the simulation. The detailed output is shown below. Note that the trial has about 20% probability of stopping early for futility at the very first look and a little more than 9% chance of stopping for futility at the second look. The average sample size is about 437 compared to 523 for the design without a futility boundary. Under the design alternative, there is a very small probability (less than 0.5%) of terminating the trial early for futility, as seen from the following table.

For big companies, a more aggressive futility boundary might be desirable so that trials for treatments with small effect can be terminated early and resources can be deployed to other programs. Suppose that a futility boundary based on δ = 0.5 is to be used. Under the global null hypothesis, there is almost a 70% chance that the trial stops early for futility.
The average sample size for the study is about 316 compared to 437 for the design with futility based on δ of zero.

The other use of the futility boundary is to drop those arms which are ineffective along the way. Such a design would be more efficient if strong heterogeneity among the treatment arms is anticipated. Suppose that two of the four treatment regimens have relatively smaller treatment effects. For example, the mean differences from control might be 0.1, 0.1, 1.2, 1.3. Without applying any futility, the trial has about 85% power and an average sample size of 437. If we drop those doses which cross the futility boundary based on δ of 0.5, the trial has about 82% power and an average sample size of 328.

From the table for probability of trial termination at each look, we can see that the trial has about an 8% chance of stopping early at the first interim look, of which a little more than 2% is for efficacy and about 5% is for futility. The trial has a 46% chance of stopping early at the second look, with about 45% for efficacy and less than 2% for futility. From the table for additional details of probability of trial termination at each look, we can see that the trial has a 2.78% chance of stopping for efficacy at the first look, of which 2.55% is the probability that the trial demonstrates significance on only one treatment. At the second look, the trial has about 45% probability of stopping early for efficacy, of which 29% is for significance on one treatment, 15% on two treatments and less than 1% on three or four treatments. This design has marginal power of about 50% to detect significance on Treatment 3 and more than 60% on Treatment 4. Treatment 1 and Treatment 2 each have a 70% chance of being terminated at look 1 for futility.
The marginal probability of futility stopping for each treatment counts those simulated trials in which the particular treatment crosses the futility boundary, but it doesn’t count those trials in which the particular treatment falls into the continuation region.

The second table in the above screen shows the probability of demonstrating significance on a specific number of treatments. However, it doesn’t provide information on the likelihood of showing efficacy on specific treatment combinations. Such information is provided in the table for detailed efficacy outcomes. For example, the trial has about 20% probability of success with Treatment 3 only, 32% with Treatment 4 only, and 30% with both Treatment 3 and Treatment 4.

17.2.2 Interim Treatment Selection
It might be desirable to select promising dose/treatment groups and drop ineffective or unsafe groups after reviewing the interim data. In general, there is no analytical approach to evaluate such a complex design. East provides the option to evaluate such adaptive designs through intensive simulations. The treatment selection option can be incorporated by clicking on the icon located on the top bar of the main simulation dialog window. The treatment selection window looks as follows. It takes several inputs from the user. The first input is the drop-down box for specifying the look position for performing treatment selection. The next input is the drop-down box for the treatment effect scale.
There is a list of available treatment effect scales, as seen in the following screen, including Wald Statistic, Estimated Mean, Estimated δ etc. East provides three different dose/treatment selection rules: (1) Select best r treatments, (2) Select treatments within ε of the best treatment, (3) Select treatments greater than threshold ζ, where r, ε, ζ are inputs from the user. For the same example, suppose we select the two best treatments at the second interim look. The inputs are as follows:

Now click on the Simulate button to run simulations. When the simulation is done, save it into the library and view the detailed output as in the following screen. We can see that the trial has about 85% overall power to detect significance on at least one treatment group with an average sample size of 400 (Overall Powers). It has about 50% probability of stopping early by the second look (Probability of Trial Termination at Each Look). From the third table (Additional Details of Probability of Trial Termination at Each Look), it can be seen that the trial has about 52% power to show significance on only one treatment, 33% probability on two treatments, and less than 1% probability on three or four treatments. Marginally, Treatment 3 has a 53% chance of success and Treatment 4 has a 66% chance of success. When we select the two best treatments, the sample size for the selected two treatments remains the same as the designed one.
However, we can reallocate the remaining sample size from the dropped groups to the selected arms to gain more power. If the sample size for the dropped arms is reallocated to the selected arms, the efficacy stopping boundary for the remaining looks has to be recomputed in order to preserve the type I error. This can be achieved by checking the box for Reallocating remaining sample size to selected arm on the Treatment Selection tab as seen in the following window.

The simulation output is shown in the following screen. Note that the power of the study is almost 92%, in exchange for a higher average sample size of 436, compared to the design without sample size reallocation (85% power and 400 average sample size). Also, with sample size reallocation, the study has a higher power (43%) of demonstrating significance on both Treatment 3 and Treatment 4 compared to the design without sample size reallocation, which has 33% power.

18 Two-Stage Multi-arm Designs using p-value combination

18.1 Introduction
In the drug development process, identification of promising therapies and inference on selected treatments are usually performed in two or more stages. The procedure we will be discussing here is an adaptive two-stage design that can be used when multiple treatments are to be compared with a control. This allows integration of both stages within a single confirmatory trial while controlling the multiple type-I error level. After the interim analysis in the first stage, the trial may be terminated early or continued with a second stage, where the set of treatments may be reduced due to lack of efficacy or presence of safety problems with some of the treatments.
This procedure in East is highly flexible with respect to stopping rules and selection criteria and also allows re-estimation of the sample size for the second stage. Simulations show that the method may be substantially more powerful than classical one-stage multiple treatment designs with the same total sample size because the second stage sample size is focused on evaluating only the promising treatments identified in the first stage. This procedure is available for continuous as well as discrete endpoint studies. The current chapter deals with continuous endpoint studies only; discrete endpoint studies are handled similarly.

18.2 Study Design
This section explores the different design options available in East with the help of an example.

18.2.1 Introduction to the Study
Consider designing a placebo controlled, double blind, randomized trial to evaluate the efficacy, pharmacokinetics, safety and tolerability of a New Chemical Entity (NCE) given as multiple weekly infusions in subjects with a recent acute coronary syndrome. There are four dose regimens to be investigated. The treatment effect is assessed through the change in PAV (percent atheroma volume) from baseline to Day 36 post-randomization, as determined by IVUS (intravascular ultrasound). The expected changes in PAV for the placebo group and the four dose regimens are 0, 1, 1.1, 1.2, 1.3 and the common standard deviation is 3. The objective of the study is to find the optimal dose regimen based on the totality of the evidence including benefit-risk assessment and cost considerations.

18.2.2 Methodology
This is a randomized, double-blind, placebo-controlled study conducted in two parts using a 2-stage adaptive design.
In Stage 1, approximately 250 eligible subjects will be randomized equally to one of four treatment arms (NCE [doses: 1, 2.5, 5 or 10 mg]) or matching placebo (that is, 50 subjects per group). After all subjects in Stage 1 have completed the treatment period or discontinued earlier, an interim analysis will be conducted to
1. compare the means of each dose group
2. assess safety within each dose group and
3. drop the less efficient doses
Based on the interim analysis, Stage 2 of the study will either continue with additional subjects enrolling into 1/2/3 arms (placebo and 1/2/3 favorable, active doses) or the study will be halted completely if unacceptable toxicity has been observed.

In this example, we will have the following workflow to cover different options available in East:
1. Start with five arms (4 doses + Placebo)
2. Evaluate the four doses at the interim analysis and, based on the Treatment Selection Rules, carry forward some of the doses to the next stage
3. While we select the doses, also increase the sample size of the trial by using the Sample Size Re-estimation (SSR) tool to improve conditional power if necessary
In a real trial, both the above actions (early stopping as well as sample size re-estimation) will be performed after observing the interim data.
4. See the final design output in terms of different powers and probabilities of selecting particular dose combinations
5. See the early stopping boundaries for efficacy and futility on the adjusted p-value scale
6. Monitor the actual trial using the Interim Monitoring tool in East.

Start East. Click Design tab, then click Many Samples in the Continuous category, and then click Multiple Looks- Combining p-values test.
This will bring up the input window of the design with some default values. Enter the inputs as discussed below.

18.2.3 Study Design Inputs
The four doses of the treatment (1mg, 2.5mg, 5mg, 10mg) will be compared with the Placebo arm based on their treatment means. Preliminary sample size estimates are provided to achieve an overall study power of at least 90% at an overall, adequately adjusted 1-sided type-1 error (alpha) level of 2.5%, after taking into account all interim and final hypothesis tests. Note that we always use 1-sided alpha since dose-selection rules are usually 1-sided.

In Stage 1, 250 subjects are initially planned for enrollment (5 arms with 50 subjects each). Following an interim analysis conducted after all subjects in Stage 1 have completed the treatment period or discontinued earlier, an additional 225 subjects will be enrolled into three arms for Stage 2 (placebo and two active doses). So we start with a total of 250 + 225 = 475 subjects.

The multiplicity adjustment methods available in East to compute the adjusted p-value (the p-value corresponding to the global null) are Bonferroni, Sidak, and Simes. For discrete endpoint test, Dunnett Single Step is not available since we will be using Z-statistic. Let us use the Bonferroni method for this example.

The p-values obtained from the two stages can be combined by using the “Inverse Normal” method. In the “Inverse Normal” method, East first computes the weights as follows:

$$ w^{(1)} = \sqrt{\frac{n^{(1)}}{n}} \tag{18.1} $$

and

$$ w^{(2)} = \sqrt{\frac{n^{(2)}}{n}} \tag{18.2} $$

where $n^{(1)}$ and $n^{(2)}$ are the total sample sizes corresponding to Stage 1 and Stage 2 respectively and $n$ is the total sample size. East displays these weights by default, but they are editable and the user can specify any other weights as long as

$$ \left(w^{(1)}\right)^2 + \left(w^{(2)}\right)^2 = 1 \tag{18.3} $$

The final p-value is given by

$$ p = 1 - \Phi\left( w^{(1)} \Phi^{-1}\left(1 - p^{(1)}\right) + w^{(2)} \Phi^{-1}\left(1 - p^{(2)}\right) \right) \tag{18.4} $$

The weights specified on this tab will be used for p-value computation: $w^{(1)}$ is used for data before the interim look and $w^{(2)}$ for data after the interim look. Thus, according to the sample sizes planned for the two stages in this example, the weights are calculated as $\sqrt{250/475}$ and $\sqrt{225/475}$. Note: These weights are updated by East once we specify the first look position as 250/475 in the Boundary tab. So leave these as default values for now. Set the Number of Arms as 5 and enter the rest of the inputs as shown below:

We can certainly have early stopping boundaries for efficacy and/or futility. But generally, in designs like this, the objective is to select the best dose(s) and not stop early. So for now, select the Boundary tab and set both the boundary families to “None”. Also, set the timing of the interim analysis as 0.526, which corresponds to observing the data on 250 subjects out of 475. Enter 250/475 as shown below. Notice the updated weights on the Test Parameters tab.

The next tab is Response Generation, which is used to specify the true underlying means of the individual dose groups and the initial allocation from which to generate the simulated data. One can also generate the mean response for all the arms using a dose-response curve such as 4PL, Emax, Linear or Quadratic. This can be done by checking the box for Generate Means through DR Curve and entering appropriate parameters for the DR model selected.
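Returning to the inverse normal combination in equations (18.1) to (18.4), the arithmetic can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in this section, not East's implementation; the function names `inverse_normal_weights` and `combine_p_values` are hypothetical, and the stage-wise p-values are assumed to be the already adjusted p-values, strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: Phi is .cdf, Phi^{-1} is .inv_cdf


def inverse_normal_weights(n_stage1, n_stage2):
    """Default weights of equations (18.1) and (18.2)."""
    n_total = n_stage1 + n_stage2
    return sqrt(n_stage1 / n_total), sqrt(n_stage2 / n_total)


def combine_p_values(p1, p2, w1, w2):
    """Final p-value of equation (18.4) for stage-wise p-values p1 and p2.

    The weights must satisfy equation (18.3); p1 and p2 must lie
    strictly between 0 and 1 (inv_cdf is undefined at 0 and 1).
    """
    if abs(w1 ** 2 + w2 ** 2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1^2 + w2^2 = 1")
    z = w1 * _STD_NORMAL.inv_cdf(1.0 - p1) + w2 * _STD_NORMAL.inv_cdf(1.0 - p2)
    return 1.0 - _STD_NORMAL.cdf(z)


# Weights for the example in this chapter: 250 subjects in Stage 1, 225 in Stage 2.
w1, w2 = inverse_normal_weights(250, 225)
```

With these inputs the weights are roughly 0.725 and 0.688. Because the weights are fixed before the interim look, the same `w1` and `w2` would be used even if the Stage 2 sample size is later changed.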
For this example, we will use the given means and standard deviation and not generate them using a DR curve. Make sure the means are 0, 1, 1.1, 1.2, 1.3 and SD is 3.

Before we update the Treatment Selection tab, go to the Simulation Control Parameters tab where we can specify the number of simulations to run, the random number seed and whether to save the intermediate simulation data. For now, enter the inputs as shown below and keep all other inputs as default.

Click on the Treatment Selection tab. This tab is used to select the scale on which the treatment-wise effects are computed. The treatment effect scale is required for selecting treatments for the second stage; the control treatment is not considered for selection and is always carried into the second stage. The list under Treatment Effect Scale allows you to set the selection rules on different scales. Select Estimated δ from this list. It means that all the selection rules we specify on this tab will be in terms of the estimated value of the treatment effect, δ, i.e., the difference from placebo. Here is a list of all available treatment effect scales: Estimated Mean, Estimated δ, Estimated δ/σ, Test Statistic, Conditional Power, Isotonic Mean, Isotonic δ, Isotonic δ/σ. For more details on these scales, refer to the Appendix K chapter on this method.

The next step is to set the treatment selection rules for the second stage.

Select Best r Treatments: The best treatment is defined as the treatment having the highest or lowest mean effect. The decision is based on the rejection region. If it is “Right-Tail” then the highest is taken as best. If it is “Left-Tail” then the lowest is taken as best. Note that the rejection region does not affect the choice of treatment based on conditional power.

Select treatments within ε of Best Treatment: Suppose the treatment effect scale is Estimated δ. If the best treatment has a treatment effect of δb and ε is specified as 0.1, then all the treatments which have a δ of δb − 0.1 or more are chosen for Stage 2.

Select treatments greater than threshold ζ: The treatments whose treatment effect is greater than or less than the user-specified threshold (ζ), according to the rejection region, are selected. But if the treatment effect scale is chosen as conditional power, the selected treatments are always those greater than the threshold.

Use R for Treatment Selection: If you wish to define any customized treatment selection rules, it can be done by writing an R function for those rules to be used within East. This is possible due to the R Integration feature in East. Refer to the appendix chapter on R Functions for more details on syntax and use of this feature. A template file for defining treatment selection rules is also available in the subfolder RSamples under your East installation directory. For more details on using R to define treatment selection rules, refer to section O.10.

Selecting multiple doses (arms) for Stage 2 would be more effective than selecting just the best one. For this example, select the first rule Select Best r treatments and set r = 2, which indicates that East will select the best two doses for Stage 2 out of the four. We will leave the Allocation Ratio after Selection as 1 to yield equal allocation between the control and selected doses in Stage 2.

Click the Simulate button to run the simulations. When the simulations are over, a row gets added in the Output Preview area. Save this row to the Library by clicking the icon in the toolbar. Rename this scenario as Best2. Double click it to see the detailed output.

The first table in the detailed output shows the overall power including global power, conjunctive power, disjunctive power and FWER.
The definitions for different powers are as follows:
Global Power: probability of demonstrating statistical significance on one or more treatment groups
Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective
Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective
FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective
For our example, there is 88% global power, which is the probability that this design rejects any null hypothesis, where each null hypothesis states that the true mean at a dose equals that of control. Also shown are conjunctive and disjunctive power, as well as the Family Wise Error Rate (FWER).

The Lookwise Summary table summarizes the number of simulated trials that ended with a conclusion of efficacy, i.e., rejected any null hypothesis, at each look. In this example, no simulated trial stopped at the interim analysis with an efficacy conclusion since there were no stopping boundaries, but 8845 simulations yielded an efficacy conclusion via the selected dose after Stage 2. This is consistent with the global power.

The table Detailed Efficacy Outcomes for all 10000 Simulations summarizes the number of simulations for which each individual dose group or pair of doses was selected for Stage 2 and yielded an efficacy conclusion. For example, the pair (2.5mg, 10mg only) was observed to be efficacious in approximately 16% of the trials (1576/10000).
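The four power definitions given above can be made concrete with a short Python sketch that tallies them from simulated trial outcomes. This is a hedged illustration rather than East's own computation: the function name `power_summary`, the input layout (one row of reject/not-reject flags per simulated trial, one column per active arm) and the convention of reporting conjunctive power as 0 when no arm is truly effective are all assumptions made here.

```python
from typing import Dict, Sequence


def power_summary(rejections: Sequence[Sequence[bool]],
                  truly_effective: Sequence[bool]) -> Dict[str, float]:
    """Estimate global, conjunctive and disjunctive power and FWER.

    rejections[s][i] is True if arm i was declared significant in simulation s.
    truly_effective[i] is True if arm i has a real treatment effect.
    """
    n_sims = len(rejections)
    effective = [i for i, flag in enumerate(truly_effective) if flag]
    ineffective = [i for i, flag in enumerate(truly_effective) if not flag]

    # Global: significance on one or more arms, effective or not.
    global_hits = sum(1 for row in rejections if any(row))
    # Conjunctive: significance on all truly effective arms
    # (counted as 0 when there are none; an assumption of this sketch).
    conj_hits = sum(1 for row in rejections
                    if effective and all(row[i] for i in effective))
    # Disjunctive: significance on at least one truly effective arm.
    disj_hits = sum(1 for row in rejections if any(row[i] for i in effective))
    # FWER: significance on at least one truly ineffective arm.
    fwer_hits = sum(1 for row in rejections if any(row[i] for i in ineffective))

    return {
        "global": global_hits / n_sims,
        "conjunctive": conj_hits / n_sims,
        "disjunctive": disj_hits / n_sims,
        "fwer": fwer_hits / n_sims,
    }


# Toy data (not East output): 4 simulated trials, arm 0 effective, arm 1 not.
example = power_summary(
    rejections=[[True, False], [True, True], [False, False], [False, True]],
    truly_effective=[True, False],
)
```

When every arm is truly effective, as under the design alternative of this chapter, global and disjunctive power coincide and the FWER is zero by definition.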
The next table, Marginal Probabilities of Selection and Efficacy, summarizes the number and percent of simulations in which each dose was selected for Stage 2, regardless of whether it was found significant at the end of Stage 2 or not, as well as the number and percent of simulations in which each dose was selected and found significant. Average sample size is also shown. It tells us how frequently the dose (either alone or with some other dose) was selected and efficacious. For example, dose 10mg was selected in approximately 65% of trials and was efficacious in approximately 56% of trials (the sum of 631, 1144, 1576 and 2254 simulations from the previous table).

The advantage of a 2-stage “treatment selection” or “drop-the-loser” design is that it allows the less performing or futile arms to be dropped based on the interim data while still preserving the type-1 error and achieving the desired power. In the Best2 scenario, we dropped two doses (r = 2). Suppose we had decided to proceed to Stage 2 without dropping any doses. In this case, power would have dropped significantly. To verify this in East, click the button on the bottom left corner of the screen. This will take us back to the input window of the last simulation scenario. Go to the Treatment Selection tab, set r = 4, rerun the simulation and save it to the Library. Rename this scenario as All4. Double click it to see the detailed output. We can observe that the power drops from 88% to 78%. That is because the Stage 2 sample size of 225 is being shared among five arms as against three arms in the Best2 case.

Now go back to the Treatment Selection tab and set r = 2 as before. Select one more rule, Select Treatments within ε of Best Treatment, and set the ε value as 0.05. The tab should look as shown below. Also set the Starting Seed on the Simulation Controls tab to 100.
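The selection rules being compared here (best r, within ε of the best, and the threshold rule ζ) can be sketched in Python as follows. This is an illustrative reading of the rule descriptions in this section, not East's code or its R template: the function names are hypothetical, `effects` holds the interim treatment effects of the active arms on the chosen scale (control excluded), and the left-tail behaviour of the ε rule is an assumption obtained by mirroring the right-tail description.

```python
from typing import List, Sequence


def select_best_r(effects: Sequence[float], r: int,
                  right_tail: bool = True) -> List[int]:
    """Indices of the r best arms; highest is best for a right-tail test."""
    order = sorted(range(len(effects)), key=lambda i: effects[i],
                   reverse=right_tail)
    return sorted(order[:r])


def select_within_epsilon(effects: Sequence[float], eps: float,
                          right_tail: bool = True) -> List[int]:
    """Indices of arms whose effect is within eps of the best arm."""
    if right_tail:
        best = max(effects)
        return [i for i, e in enumerate(effects) if e >= best - eps]
    # Left tail: assumed mirror image of the right-tail rule.
    best = min(effects)
    return [i for i, e in enumerate(effects) if e <= best + eps]


def select_beyond_threshold(effects: Sequence[float], zeta: float,
                            right_tail: bool = True) -> List[int]:
    """Indices of arms beyond the threshold zeta in the tail direction."""
    if right_tail:
        return [i for i, e in enumerate(effects) if e > zeta]
    return [i for i, e in enumerate(effects) if e < zeta]


# Hypothetical interim estimates of delta for the four doses (made-up values).
interim_delta = [0.9, 1.25, 1.0, 1.28]
best_two = select_best_r(interim_delta, r=2)
near_best = select_within_epsilon(interim_delta, eps=0.05)
above_cut = select_beyond_threshold(interim_delta, zeta=0.95)
```

The best-r rule always returns exactly r arms, whereas the ε and ζ rules return a data-dependent number of arms, which is why the two scenarios in this comparison can carry different numbers of doses into Stage 2.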
Note that since we have selected two treatment selection rules, East will simulate two different scenarios, one for each rule. As we want to compare the results from these two scenarios, we use the same starting seed. That ensures the same random number generation, so the only difference in results is the effect of the two rules. Save these two scenarios in the Library as r=2 and epsilon=0.05, select them and click the icon in the toolbar to see them side-by-side.

Notice the powers for the two scenarios. The scenario with the rule of δb − 0.05 yields more power than the Best2 scenario. Note that δb is the highest among the simulated δ values for the four doses at the interim look. You can also view the Output Details of these two scenarios. Select the two nodes as before but this time, click the icon in the toolbar. Notice from this comparison that, due to a more general rule based on ε, we can select multiple doses and not just two. At the same time, the marginal probability of selection as well as efficacy for each dose drops significantly.

18.2.4 Simulating under Different Alternatives
Since this is a simulation based design, we can perform sensitivity analyses by changing some of the inputs and observing the effects on the overall power and other output. Let us first make sure that this design preserves the total type-1 error. This can be done by running the simulations under the “Null” hypothesis. Select the last design created, which would be epsilon = 0.05 in the Library, and click the icon. This will take you to the input window of that design. Go to the Response Generation tab and enter the inputs as shown below.
Notice that all the means are 0, which means the simulations will be run under the null assumption. Run the simulations and go to the detailed output by saving the row from Output Preview to the Library. Notice that the global power and the simulated FWER are less than 0.025, which means the overall type-1 error is preserved.

18.3 Sample Size Re-estimation
As seen in the previous scenario, the desired power of approximately 92% is achieved with the sample size of 475 if the initial assumptions (µc = 0, µ1mg = 1, µ2.5mg = 1.1, µ5mg = 1.2 and µ10mg = 1.3) hold true. But if they do not, then the original sample size of 475 may be insufficient to achieve 92% power. Adaptive sample size re-estimation is suited to this purpose. In this approach we start out with a sample size of 475 subjects, but take an interim look after data are available on 250 subjects. The purpose of the interim look is not to stop the trial early but rather to examine the interim data and continue enrolling past the planned 475 subjects if the interim results are promising enough to warrant the additional investment of sample size. This strategy has the advantage that the sample size is finalized only after a thorough examination of data from the actual study rather than through making a large up-front sample size commitment before any data are available. Furthermore, if the sample size may only be increased but never decreased from the originally planned 475 subjects, there is no loss of efficiency due to overruns.

Suppose the mean responses on the five arms are as shown below. Update the Response Generation tab accordingly and also set the seed as 100 in the Simulation Controls tab.
Run 10000 simulations and save the simulation row to the Library by clicking the icon in the toolbar. See the details. Notice that the global power has dropped from 92% to 78%. Let us re-estimate the sample size to achieve the desired power. Add the Sample Size Re-estimation tab by clicking the button. A new tab gets added as shown below.

SSR At: For a K-look group sequential design, one can decide the time at which the conditions for adaptation are to be checked and the actual adaptation is to be carried out. This can be done either at some intermediate look or after some specified information fraction. The possible values of this parameter depend upon the user's choice. The default choice for this design is always Look #, and it is fixed to 1 since this is always a 2-look design.

Target CP for Re-estimating Sample Size: The primary driver for increasing the sample size at the interim look is the desired (or target) conditional power, that is, the probability of obtaining a positive outcome at the end of the trial given the data already observed. For this example we have set the conditional power at the end of the trial to be 92%. East then computes the sample size that would be required to achieve this conditional power.

Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample size is computed at the interim analysis on the basis of the observed data so as to achieve some target conditional power. However, the sample size so obtained will be overruled unless it falls between pre-specified minimum and maximum values. For this example, let us set the multiplier to 2, indicating that we are prepared to double the original sample size if the results are promising. The range of allowable sample sizes is [475, 950].
If the newly computed sample size falls outside this range, it will be reset to the appropriate boundary of the range. For example, if the sample size needed to achieve the desired 90% conditional power is less than 475, the new sample size will be reset to 475. In other words we will not decrease the sample size from what was specified initially. On the other hand, the upper bound of 950 subjects demonstrates that the sponsor is prepared to double the sample size in order to achieve the desired 90% conditional power. But if 90% conditional power requires more than 950 subjects, the sample size will be reset to 950, the maximum allowed.

Promising Zone Scale: One can define the promising zone as an interval based on conditional power, test statistic, or estimated δ/σ. The input fields change according to this choice. The decision to alter the sample size is based on whether the interim value of conditional power / test statistic / δ/σ lies in this interval or not. Let us keep the default scale, which is Conditional Power.

Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size will only be altered if the estimate of CP at the interim analysis lies in a pre-specified range, referred to as the “Promising Zone”. Here the promising zone is 0.30 to 0.90. The idea is to invest in the trial in stages. Prior to the interim analysis the sponsor is only committed to a sample size of 475 subjects. If, however, the results at the interim analysis appear reasonably promising, the sponsor would be willing to make a larger investment in the trial and thereby improve the chances of success. Here we have somewhat arbitrarily set the lower bound for a promising interim outcome to be CP = 0.30. An estimate CP < 0.30 at the interim analysis is not considered promising enough to warrant a sample size increase.
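The range restriction and the promising-zone check described above can be summarized in a small sketch. This is only an illustration in Python, not East's implementation: the function names `clamp_sample_size` and `adapt_sample_size` are hypothetical, and the sample size required for the target conditional power (`n_required`) is taken as a given input rather than computed.

```python
import math

def clamp_sample_size(n_required, n_planned=475, multiplier=2.0):
    """Reset a re-estimated sample size to the range [n_planned, multiplier * n_planned]."""
    n_max = int(multiplier * n_planned)
    return max(n_planned, min(math.ceil(n_required), n_max))

def adapt_sample_size(cp_interim, n_required, n_planned=475,
                      multiplier=2.0, cp_min=0.30, cp_max=0.90):
    """Alter the sample size only if the interim CP lies in the promising zone."""
    if cp_min <= cp_interim < cp_max:
        return clamp_sample_size(n_required, n_planned, multiplier)
    # Outside the promising zone the planned sample size is kept.
    return n_planned

print(adapt_sample_size(0.50, 700))    # promising, within range
print(adapt_sample_size(0.50, 1200))   # promising, capped at the maximum
print(adapt_sample_size(0.20, 1200))   # CP below the zone, no change
```

With the defaults used in this example, a promising interim result that would need 1200 subjects is capped at 950, while an interim CP of 0.20 leaves the sample size at 475.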
It might sometimes be desirable to also specify an upper bound beyond which no sample size change will be made. Here we have set that upper bound of the promising zone at CP = 0.90. In effect we have partitioned the range of possible values for conditional power at the interim analysis into three zones: unfavorable (CP < 0.3), promising (0.3 ≤ CP < 0.9), and favorable (CP ≥ 0.9). Sample size adaptations are made only if the interim CP falls in the promising zone. The promising zone defined on the Test Statistic scale or the Estimated δ/σ scale works similarly.

SSR Function in Promising Zone: The behavior in the promising zone can be defined either by a continuous function or by a step function. The default is continuous, where East accepts the two quantities (Multiplier, Target CP) and re-estimates the sample size depending upon the interim value of CP/test statistic/effect size. The SSR function can be defined as a step function as well. This can be done with a single piece or with multiple pieces. For each piece, define the step function in terms of (i) the interval of CP/test statistic/δ/σ, which depends upon the choice of promising zone scale, and (ii) the value of the re-estimated sample size in that interval. For a single piece, just the total re-estimated sample size is required as an input. If the interim value of CP/test statistic/δ/σ lies in the promising zone, then the re-estimation will be done using this step function.

Let us set the inputs on the Sample Size Re-estimation tab as shown below. Just for comparison, also run the simulations without adaptation. Both scenarios can be run together by entering the two values 1, 2 in the cell for Multiplier. Run 10000 simulations and see the Details.

With Sample Size Re-estimation

Without Sample Size Re-estimation

We observe from the table that the power of the adaptive implementation is approximately 85%, which is an improvement of almost 8% over the non-adaptive design. This increase in power has come at an average cost of 540 − 475 = 65 additional subjects. Next we observe from the Zone-wise Averages table that 1610 of 10000 trials (16%) underwent sample size re-estimation (Total Simulation Count in the “Promising Zone”) and that, of those 1610 trials, 89% were able to reject the global null hypothesis. The average sample size, conditional on adaptation, is 882.

18.4 Adding Early Stopping Boundaries

One can also incorporate stopping boundaries to stop at the interim look for efficacy or futility. The efficacy boundary can be defined on the Adjusted p-value scale, whereas the futility boundary can be on the Adjusted p-value or δ/σ scale. Click the button on the bottom left corner of the screen. This will take you back to the input window of the last simulation scenario. Go to the Boundary tab and set the Efficacy and Futility boundaries to “Adjusted p-value”. These boundaries are for early stopping at look 1. As the note on this tab says: if any one adjusted p-value is ≤ the efficacy p-value boundary, then stop the trial for efficacy. If all the adjusted p-values are > the futility p-value, then stop the trial for futility. Otherwise carry forward all the treatments to the next step of treatment selection. Stopping early for efficacy or futility is a step that is carried out before applying the treatment selection rules.

The simulation output has the same explanation as above, except that the Lookwise Summary table may show some trials stopped at the first look due to efficacy or futility.

18.5 Interim Monitoring with Treatment Selection

Select the simulation node with SSR implementation and click the icon. It will invoke the Interim Monitoring dashboard.
Click the icon to open the Test Statistic Calculator. The “Sample Size” column is filled out according to the originally planned design (50/arm). Enter the data as shown below.

Click Recalc to calculate the test statistic as well as the raw p-values. Notice that the p-values for 1mg and 2.5mg are 0.069 and 0.030 respectively, which are greater than 0.025. We will drop these doses in the second stage. On clicking OK, it updates the dashboard. The overall adjusted p-value is 0.067.

Open the test statistic calculator for the second look and enter the following information, and also drop the two doses 1mg and 2.5mg using the dropdown of “Action”. Click Recalc to calculate the test statistic as well as the raw p-values. On clicking OK, it updates the dashboard. Observe that the adjusted p-value for 10mg crosses the efficacy boundary. It can also be observed in the Stopping Boundaries chart.

The final p-value adjusted for multiple treatments is 0.00353.

19 Normal Superiority Regression

Linear regression models are used to examine the relationship between a response variable and one or more explanatory variables, assuming that the relationship is linear. In this chapter, we discuss the design of three types of linear regression models. In Section 19.1, we examine the problem of testing a single slope in a simple linear regression model involving one continuous covariate. In Section 19.2, we examine the problem of testing the equality of two slopes in a linear regression model with only one observation per subject.
Finally, in Section 19.3, we examine the problem of testing the equality of two slopes in a linear regression repeated measures model, applied to a longitudinal setting.

19.1 Linear Regression, Single Slope

We assume that the observed value of a response variable $Y$ is a linear function of an explanatory variable $X$ plus random noise. For each of the $i = 1, \ldots, n$ subjects in a study,

$$Y_i = \gamma + \theta X_i + \epsilon_i .$$

Here the $\epsilon_i$ are independent normal random variables with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma_\epsilon^2$. We follow Dupont et al. (1998) and emphasize a slight distinction between observational and experimental studies. In an observational study, the values $X_i$ are attributes of randomly chosen subjects and their possible values are not known to the investigator at the time of study design. In an experimental study, a subject is randomly assigned (with possibly different probabilities) to one of the predefined experimental conditions. Each of these conditions is characterized by a certain value of the explanatory variable $X$ that is completely defined at the time of the study design. In both cases the value $X_i$, characterizing either an attribute or the experimental exposure of subject $i$, is a random variable with variance $\sigma_x^2$.

We are interested in testing that the slope $\theta$ is equal to a specified value $\theta_0$. Thus we test the null hypothesis $H_0: \theta = \theta_0$ against the two-sided alternative $H_1: \theta \neq \theta_0$ or a one-sided alternative hypothesis $H_1: \theta < \theta_0$ or $H_1: \theta > \theta_0$.

Let $\hat\theta$ denote the estimate of $\theta$, and let $\hat\sigma_\epsilon^2$ and $\hat\sigma_x^2$ denote the estimates of $\sigma_\epsilon^2$ and $\sigma_x^2$ based on $n$ observations. The variance of $\hat\theta$ is

$$\sigma^2 = \frac{\sigma_\epsilon^2}{n \sigma_x^2} . \tag{19.1}$$

The test statistic is defined as

$$Z = (\hat\theta - \theta_0)/\hat\sigma , \tag{19.2}$$

where

$$\hat\sigma^2 = \frac{\hat\sigma_\epsilon^2}{n \hat\sigma_x^2}$$

is the estimate of the variance of $\hat\theta$ based on $n$ observations. Notice that the test statistic is centered so as to have a mean of zero under the null hypothesis.
We want to design the study so the power is attained when $\theta = \theta_1$. The power depends on $\theta_0$, $\theta_1$, $\sigma_x$, and $\sigma_\epsilon$ through $\theta_0 - \theta_1$ and $\sigma_x / \sigma_\epsilon$.

19.1.1 Trial Design

During the development of medications, we often want to model the dose-response relationship, which may be done by estimating the slope of the regression, where $Y$ is the appropriate response variable and the explanatory variable $X$ is a set of specified doses. Consider a clinical trial involving four doses of a medication. The doses and the randomization of subjects across the doses have been chosen so that the standard deviation $\sigma_x = 9$. Based on prior studies, it is assumed that $\sigma_\epsilon = 15$. If there is no dose response, the slope is equal to 0. Thus we will test the null hypothesis $H_0: \theta = 0$ against a two-sided alternative $H_1: \theta \neq 0$. The study is to be designed to have 90% power at the alternative $\theta_1 = 0.5$ with a type-1 error rate of 5%.

Start East afresh. Click Continuous: Regression on the Design tab and then click Single-Arm Design: Linear Regression - Single Slope. This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and 0.9 for Type I Error (α) and Power, respectively. Enter the values of $\theta_0 = 0$, $\theta_1 = 0.5$, $\sigma_x = 9$, and $\sigma_\epsilon = 15$. Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (119 subjects) is highlighted in yellow.

Des 1 requires 119 subjects in order to attain 90% power. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.
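As a rough cross-check of this example, the variance in (19.1) and the statistic in (19.2) suggest a simple normal-approximation sample size. The sketch below is written in Python purely for illustration; the function name `single_slope_sample_size` is hypothetical, and the formula is the textbook normal approximation, not necessarily the algorithm East uses.

```python
import math
from statistics import NormalDist

def single_slope_sample_size(theta0, theta1, sigma_x, sigma_eps,
                             alpha=0.05, power=0.9, two_sided=True):
    """Normal-approximation n for testing H0: theta = theta0 at theta = theta1.

    Uses Var(theta_hat) = sigma_eps^2 / (n * sigma_x^2) from (19.1).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    n = ((z_alpha + z_beta) * sigma_eps / (sigma_x * (theta1 - theta0))) ** 2
    return math.ceil(n)

# Inputs of the example: theta0 = 0, theta1 = 0.5, sigma_x = 9, sigma_eps = 15
n_approx = single_slope_sample_size(0.0, 0.5, 9.0, 15.0)
print(n_approx)  # 117
```

The approximation gives 117 subjects, slightly below the 119 that East reports for Des 1. The sketch makes no attempt to reproduce whatever refinements account for the difference.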
In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. Now double-click on Des 1 in the Library. You will see a summary of the design.

19.2 Linear Regression for Comparing Two Slopes

In some experimental situations, we are interested in comparing the slopes of two regression lines. The regression model relates the response variable $Y$ to the explanatory variable $X$ using the model

$$Y_{il} = \gamma + \theta_i X_{il} + \epsilon_{il} ,$$

where the error $\epsilon_{il}$ has a normal distribution with mean zero and an unknown variance $\sigma_\epsilon^2$ for Subject $l$ in Treatment $i$, $i = c, t$ and $l = 1, \ldots, n_i$. Let $\sigma_{xc}^2$ and $\sigma_{xt}^2$ denote the variance of the explanatory variable $X$ for control ($c$) and treatment ($t$), respectively.

We are interested in testing the equality of the slopes $\theta_c$ and $\theta_t$. Thus we test the null hypothesis $H_0: \theta_c = \theta_t$ against the two-sided alternative $H_1: \theta_c \neq \theta_t$ or a one-sided alternative hypothesis $H_1: \theta_c < \theta_t$ or $H_1: \theta_c > \theta_t$.

Let $\hat\theta_c$ and $\hat\theta_t$ denote the estimates of $\theta_c$ and $\theta_t$, and let $\hat\sigma_\epsilon^2$, $\hat\sigma_{xc}^2$, and $\hat\sigma_{xt}^2$ denote the estimates of $\sigma_\epsilon^2$, $\sigma_{xc}^2$, and $\sigma_{xt}^2$, based on $n_c$ and $n_t$ observations, respectively. The variance of $\hat\theta_i$ is

$$\sigma_i^2 = \frac{\sigma_\epsilon^2}{n_i \sigma_{xi}^2} .$$

Let $n = n_c + n_t$ and let $r = n_t / n$. Then, the test statistic is

$$Z_j = \frac{n^{1/2} (\hat\theta_t - \hat\theta_c)}{\hat\sigma_\epsilon \left[ \dfrac{1}{(1-r)\hat\sigma_{xc}^2} + \dfrac{1}{r \hat\sigma_{xt}^2} \right]^{1/2}} . \tag{19.3}$$

19.2.1 Trial Design

We want to design the study so the power is attained for specified values of $\theta_c$ and $\theta_t$. The power depends on $\theta_t$, $\theta_c$, $\sigma_{xc}$, $\sigma_{xt}$, and $\sigma_\epsilon$ through $\theta_t - \theta_c$, $\sigma_{xc}/\sigma_\epsilon$, and $\sigma_{xt}/\sigma_\epsilon$.

Suppose that a medication was found to have a response that depends on the level of a certain laboratory parameter. It was decided to develop a new formulation for which this interaction is decreased. The explanatory variable is the baseline value of the laboratory parameter. The study is designed with $\theta_t = 0.5$, $\theta_c = 1$, $\sigma_{xc} = \sigma_{xt} = 6$, and $\sigma_\epsilon = 10$.
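For the parameter values just given, the statistic in (19.3) leads to a simple normal-approximation sample size. The following Python sketch is illustrative only: the function name `two_slope_sample_size` is hypothetical, and equal allocation (r = 0.5), a two-sided 5% test and 90% power are assumed to match the design entered for this example. East's own result may differ slightly, since its algorithm is not reproduced here.

```python
import math
from statistics import NormalDist

def two_slope_sample_size(theta_c, theta_t, sigma_xc, sigma_xt, sigma_eps,
                          r=0.5, alpha=0.05, power=0.9):
    """Normal-approximation total n for a two-sided test of H0: theta_t = theta_c.

    Var(theta_t_hat - theta_c_hat) = (sigma_eps^2 / n) *
        [1 / ((1 - r) * sigma_xc^2) + 1 / (r * sigma_xt^2)],  as in (19.3).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    var_factor = 1.0 / ((1 - r) * sigma_xc ** 2) + 1.0 / (r * sigma_xt ** 2)
    n = z_sum ** 2 * sigma_eps ** 2 * var_factor / (theta_t - theta_c) ** 2
    return math.ceil(n)

n_total = two_slope_sample_size(theta_c=1.0, theta_t=0.5,
                                sigma_xc=6.0, sigma_xt=6.0, sigma_eps=10.0)
print(n_total)  # 467
```

Because the variance factor is smallest at r = 0.5 when the two covariate variances are equal, any unequal allocation passed to this sketch yields a larger total sample size.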
We examine the slopes of the two regressions by testing the null hypothesis $H_0: \theta_t = \theta_c$. Although we hope to decrease the slope, we test the null hypothesis against the two-sided alternative $H_1: \theta_t \neq \theta_c$.

Start East afresh. Click Continuous: Regression on the Design tab and then click Parallel Design: Linear Regression - Difference of Slopes. This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and 0.9 for Type I Error (α) and Power, respectively. Select Individual Slopes for Input Method, and enter the values of $\theta_c = 1$, $\theta_t = 0.5$, $\sigma_{xc} = 6$, $\sigma_{xt} = 6$, and $\sigma_\epsilon = 10$. Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (469) is highlighted in yellow.

Des 1 requires 469 subjects in order to attain 90% power. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.

19.3 Repeated Measures for Comparing Two Slopes

In many clinical trials, each subject is randomized to one of two groups, and responses are collected at various timepoints on the same individual over the course of the trial. In these “longitudinal” trials, we are interested in testing the equality of slopes, or mean response changes per unit time, between the treatment group (t) and the control group (c). A major difficulty associated with designing such studies is the fact that the data are independent across individuals, but the repeated measurements on the same individual are correlated.
The sample size computations then depend on within-subject and between-subject variance components that are often unknown at the design stage. One way to tackle this problem is to use prior estimates of these variance components (also known as nuisance parameters) from other studies, or from pilot data.

Suppose each patient is randomized to either group c or group t. The data consist of a series of repeated measurements on the response variable for each patient over time. Let $M$ denote the total number of measurements, inclusive of the initial baseline measurement, intended to be taken on each subject. These $M$ measurements will be taken at times $v_m$, $m = 1, 2, \ldots, M$, relative to the time of randomization, where $v_1 = 0$. A linear mixed effects model is usually adopted for analyzing such data. Let $Y_{ilm}$ denote the response of subject $l$, belonging to group $i$, at time point $v_m$. Then the model asserts that

$$Y_{clm} = \gamma_c + \theta_c v_m + a_l + b_l v_m + e_{lm} \tag{19.4}$$

for the control group, and

$$Y_{tlm} = \gamma_t + \theta_t v_m + a_l + b_l v_m + e_{lm} \tag{19.5}$$

for the treatment group, where the random effect $(a_l, b_l)'$ is multivariate normal with mean $(0, 0)'$ and variance-covariance matrix

$$G = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix} ,$$

and the $e_{lm}$'s are all iid $N(0, \sigma_w^2)$. In this model, $\sigma_w^2$ denotes the “within-subject” variability, attributable to repeated measurements on the same subject, while $G$ denotes the “between-subjects” variability, attributable to the heterogeneity of the population being studied.

Define $\delta = \theta_t - \theta_c$. We are interested in testing $H_0: \delta = 0$ against the two-sided alternative $H_1: \delta \neq 0$ or against one-sided alternative hypotheses of the form $H_1: \delta > 0$ or $H_1: \delta < 0$.

Let $(\hat\theta_c, \hat\theta_t)$ be the maximum likelihood estimates of $(\theta_c, \theta_t)$, based on an enrollment of $(n_c, n_t)$, respectively. The estimate of the difference of slopes is

$$\hat\delta = \hat\theta_t - \hat\theta_c \tag{19.6}$$

and its standard error is denoted by $se(\hat\delta)$.
The test statistic is the familiar Wald statistic

$$Z = \frac{\hat\delta}{se(\hat\delta)} . \tag{19.7}$$

19.3.1 Trial Design

Consider a trial to compare an analgesic to placebo in the treatment of chronic pain using a 10 cm visual analogue scale (VAS). Measurements are taken on each subject at baseline and once a month for six months. Thus $M = 7$ and $S = 6$. It is assumed from past data that $\sigma_w = 4$ and $\sigma_b = 6$. We wish to test the null hypothesis $H_0: \theta_t = \theta_c$ with a two-sided level-0.05 test having 90% power to detect a 1 cm/month decline in slope, with $\theta_c = 2$ and $\theta_t = 1$ under $H_1$.

Start East afresh. Click Continuous: Regression on the Design tab, and then click Parallel Design: Repeated Measures - Difference of Slopes. This will launch a new input window. Select 2-Sided for Test Type. Enter 0.05 and 0.9 for Type I Error (α) and Power, respectively. Select Individual Slopes for Input Method. Enter the values of $\theta_c = 2$, $\theta_t = 1$, Duration of Follow up (S) = 6, Number of Measurements (M) = 7, $\sigma_w = 4$, and $\sigma_b = 6$. Click Compute. This will calculate the sample size for this design, and the output is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (1538) is highlighted in yellow.

Des 1 requires 1538 completers in order to attain 90% power. Select this design by clicking anywhere along the row in the Output Preview and click the icon. Some of the design details will be displayed in the upper pane, labeled as Output Summary.
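The sample size in this example can be approximated with a standard large-sample argument for random-slope models: with complete data, each subject's fitted slope has variance $\sigma_b^2 + \sigma_w^2 / \sum_m (v_m - \bar v)^2$, and the group slope estimate averages these over subjects. The Python sketch below applies that argument. It is an illustration under stated assumptions (equal allocation, no dropout, measurement times 0, 1, ..., 6 months), the function name `repeated_measures_sample_size` is hypothetical, and it is not claimed to be the algorithm East uses.

```python
import math
from statistics import NormalDist

def repeated_measures_sample_size(delta, sigma_w, sigma_b, times,
                                  alpha=0.05, power=0.9):
    """Approximate total n (both arms, equal allocation) for a two-sided test
    of H0: delta = 0 in the random intercept and slope model (19.4)-(19.5)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    t_bar = sum(times) / len(times)
    sxx = sum((t - t_bar) ** 2 for t in times)
    # Variance of a single subject's least-squares slope
    var_slope = sigma_b ** 2 + sigma_w ** 2 / sxx
    # Var(delta_hat) = var_slope / (n/2) + var_slope / (n/2) = 4 * var_slope / n
    n_total = 4.0 * z_sum ** 2 * var_slope / delta ** 2
    return math.ceil(n_total)

monthly_times = [0, 1, 2, 3, 4, 5, 6]   # baseline plus six monthly visits (M = 7, S = 6)
n_completers = repeated_measures_sample_size(delta=1.0 - 2.0, sigma_w=4.0,
                                             sigma_b=6.0, times=monthly_times)
print(n_completers)  # 1538
```

For this example the approximation gives 1538, in agreement with the value East reports. Note that the between-subject slope variance ($\sigma_b^2 = 36$) dominates the within-subject term ($16/28 \approx 0.57$), so adding more measurements per subject would reduce the required sample size only marginally.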
Volume 3 Binomial and Categorical Endpoints

20 Introduction to Volume 3
21 Tutorial: Binomial Endpoint
22 Binomial Superiority One-Sample
23 Binomial Superiority Two-Sample
24 Binomial Non-Inferiority Two-Sample
25 Binomial Equivalence Two-Sample
26 Binomial Superiority n-Sample
27 Multiple Comparison Procedures for Discrete Data
28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data
29 Two-Stage Multi-arm Designs using p-value combination
30 Binomial Superiority Regression
31 Agreement
32 Dose Escalation

20 Introduction to Volume 3

This volume describes the procedures for discrete (binomial) endpoints applicable to one-sample, two-sample, many-sample, regression and agreement situations. All three types of designs (superiority, non-inferiority and equivalence) are discussed in detail.

Chapter 21 introduces you to East on the Architect platform, using an example clinical trial to test difference of proportions.

Chapter 22 deals with the design and interim monitoring of two types of tests involving binomial response rates that can be described as the superiority one-sample situation. Section 22.1 discusses designs in which an observed binomial response rate is compared to a fixed response rate, possibly derived from historical data. Section 22.2 deals with McNemar’s test for comparing matched pairs of binomial responses. Chapter 38 discusses in detail Simon’s two-stage design.

Chapter 23 discusses the superiority two-sample situation where the aim is to compare independent samples from two populations in terms of the proportion of sampling units presenting a given trait.
East supports the design and interim monitoring of clinical trials in which this comparison is based on the difference of proportions, the ratio of proportions, the odds ratio, or the common odds ratio of the two populations. The four cases are discussed in Sections 23.1, 23.2, 23.3 and 23.4, respectively. Section 23.5 discusses Fisher’s exact test for a single-look design.

Chapter 24 presents an account of designing and monitoring non-inferiority trials in which the non-inferiority margin is expressed as either a difference, a ratio, or an odds ratio of two binomial proportions. The difference is examined in Section 24.1. This is followed by two formulations for the ratio: the Wald formulation in Section 24.2 and the Farrington-Manning formulation in Section 24.3. The odds ratio formulation is presented in Section 24.4.

Chapter 25 gives the details of design and interim monitoring in the equivalence two-sample situation, where the goal is to establish neither superiority nor non-inferiority, but equivalence. Examples of this include showing that an aggressive therapy yields a similar rate of a specified adverse event to the established control, such as the bleeding rates associated with thrombolytic therapy or cardiac outcomes with a new stent.

Chapter 26 details the design and interim monitoring of superiority k-sample experimental situations where there are several binomial distributions indexed by an ordinal variable and where it is required to examine changes in the probabilities of success as the level of the indexing variable changes. Examples of this include the examination of a dose-related presence of a response or a particular side effect, dose-related tumorigenicity, or presence of fetal malformations relative to levels of maternal exposure to a particular toxin, such as alcohol, tobacco, or environmental factors.
Chapter 27 details the Multiple Comparison Procedures (MCP) for discrete data. It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. Multiple comparison (MC) procedures provide a guard against inflation of type I error while testing these multiple hypotheses. East supports several parametric and p-value based MC procedures. This chapter explains how to design a study using a chosen MC procedure that strongly maintains FWER.

Chapter 30 describes how East may be used to design and monitor two-arm randomized clinical trials with a binomial endpoint, while adjusting for the effects of covariates through the logistic regression model. These methods are limited to binary and categorical covariates only. A more general approach, not limited to categorical covariates, is to base the design on statistical information rather than sample size. This approach is further explained in Chapter 59.

Chapter 31 discusses the tests available to check inter-rater reliability. In some experimental situations, to check inter-rater reliability, independent sets of measurements are taken by more than one rater and the responses are checked for agreement. For a binary response, Cohen’s Kappa test for binary ratings can be used to check inter-rater reliability.

Chapter 32 deals with the design, simulation, and interim monitoring of Phase 1 dose escalation trials. One of the primary goals of Phase 1 trials in oncology is to find the maximum tolerated dose (MTD). Sections 32.1, 32.2, 32.3 and 32.4 discuss the four commonly used dose escalation methods: 3+3, Continual Reassessment Method (CRM), modified Toxicity Probability Interval (mTPI) and Bayesian Logistic Regression Model (BLRM).

20.1 Settings

Click the icon in the Home menu to adjust default values in East 6.
The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category “Expected Sample Size”. So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab has the provision of adjusting the paths for storing workbooks, files, and temporary files. These paths will persist through the current and future sessions, even after East is closed. This is also where we need to specify the installation directory of the R software in order to use the R Integration feature of East 6.

The Design Defaults is where the user can change the settings for trial design.

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation

This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.
Type I Error for MCP

If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.

Sample Size Rounding

Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option “Do not round sample size/events”.

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. These can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation.

If the checkbox for “Save summary statistics for every simulation” is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for “Suppress All Intermediate Output” is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for various quantities on East 6 charts.

We suggest that you do not alter the defaults until you are quite familiar with the software.

21 Tutorial: Binomial Endpoint

This tutorial introduces you to East on the Architect platform, using an example clinical trial to test difference of proportions.

21.1 Fixed Sample Design

When you open East, you will see the screen below.

By default, the Design tab in the ribbon will be active.
The items on this tab are grouped under the following categories of endpoints: Continuous, Discrete, Count, Survival, and General. Click Discrete: Two Samples, and then Parallel Design: Difference of Proportions.

The following input window will appear.

By default, the radio button for Sample Size (n) is selected, indicating that it is the variable to be computed. The default values shown for Type I Error and Power are 0.025 and 0.9. Keep the same for this design. Since the default inputs provide all of the necessary input information, you are ready to compute the sample size by clicking the Compute button. The calculated result will appear in the Output Preview pane, as shown below.

This single row of output contains relevant details of the inputs and the computed result of total sample size (and total completers) of 45. Select this row and save it in the Library under a workbook by clicking the icon. Select this node in the Library, and click the icon to display a summary of the design details in the upper pane (known as the Output Summary).

The discussion so far gives you a quick feel of the software for computing the sample size for a single-look design. We will describe further features in an example for a group sequential design in the next section.

21.2 Group Sequential Design for a Binomial Superiority Trial

21.2.1 Study Background

Design objectives and interim results from CAPTURE, a prospective randomized trial of placebo versus Abciximab for patients with refractory unstable angina, were presented at a workshop on clinical trial data monitoring committees (Anderson, 2002). The primary endpoint was reduction in death or MI within 30 days of entering the study. The study was designed for 80% power to detect a reduction in the event rate from 15% on the placebo arm to 10% on the Abciximab arm.
A two-sided test with a type-1 error of 5% was used. We will use this example to illustrate various design, simulation and interim monitoring features of East for studies with binomial endpoints.

Let us modify Des1 to enter the above inputs and create a group sequential design for the CAPTURE trial. Select the node for Des1 in the Library and click the icon. This will take you back to the input window of Des1. Alternatively, you can click the button at the bottom left of the East screen to go to the latest input window.

Select 2-Sided for Test Type, enter 0.05 for Type I Error and 0.8 for Power, and specify the Prop. under Control to be 0.15 and the Prop. under Treatment to be 0.1. Next, change the Number of Looks to 3. You will see a new tab, Boundary Info, added to the input dialog box.

Click the Boundary Info tab, and you will see the following screen. On this tab, you can choose whether to specify stopping boundaries for efficacy, for futility, or for both. For this trial, choose efficacy boundaries only, and leave all other default values. We will implement the Lan-DeMets (O’Brien-Fleming) spending function, with equally spaced looks.

On the Boundary Info tab, click on the icons to generate the following charts.

You can also view these boundaries on different scales, such as the δ scale or the p-value scale. Select the desired scale from the dropdown. Let us see the boundaries on the δ scale.

Click Compute. This will add another row, for Des2, in the Output Preview area. The maximum sample size required under this design is 1384. The expected sample sizes under H0 and H1 are 1378 and 1183, respectively.
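To give a feel for how the Lan-DeMets (O’Brien-Fleming) spending function allocates the type-1 error across three equally spaced looks, here is a minimal sketch. It uses the commonly cited form α(t) = 2 − 2Φ(z_{α/2}/√t), where t is the information fraction. The function name `ld_of_spending` is hypothetical, and applying the formula with 0.025 per side for the two-sided 5% test is our assumption about how the error is split, not a statement of East's internal computation.

```python
from statistics import NormalDist


def ld_of_spending(t, alpha):
    """Cumulative error spent at information fraction t (0 < t <= 1),
    using the Lan-DeMets O'Brien-Fleming-type spending function."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / t ** 0.5))


# Assumption: the two-sided 0.05 test spends 0.025 on each side.
alpha_per_side = 0.025
fractions = [1 / 3, 2 / 3, 1.0]  # three equally spaced looks

cumulative = [ld_of_spending(t, alpha_per_side) for t in fractions]
incremental = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

for t, cum, inc in zip(fractions, cumulative, incremental):
    print(f"t = {t:.3f}  cumulative = {cum:.6f}  incremental = {inc:.6f}")
```

Very little error is spent at the early looks, which is why the maximum sample size of a design of this type is only slightly larger than that of the corresponding fixed-sample design.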
Click the icon in the Output Preview toolbar to save this design to Wbk1 in the Library. Double-click on Des2 to generate the following output.

21.2.2 Creating multiple designs easily

In East, it is easy to create multiple designs by inputting multiple parameter values. In the trial described above, suppose we want to generate designs for all combinations of the following parameter values: Power = 0.8, 0.9, and Difference in Proportions = −0.04, −0.03, −0.02, −0.01. The number of such combinations is 2 × 4 = 8. East can create all 8 designs from a single specification in the input dialog box. Select Des2 and click the icon. Enter the above values in the Test Parameters tab as shown below. The values of Power have been entered as a list of comma-separated values, while Difference in Proportions has been entered as a colon-separated range of values: -0.04 to -0.01 in steps of 0.01.

Now click Compute. East computes all 8 designs, Des3 to Des10, and displays them in the Output Preview as shown below. Click the icon to maximize the Output Preview.

Select Des2 to Des4 using the Ctrl key, and click the icon to display a summary of the design details in the upper pane, known as the Output Summary.

Des2 is already saved in the workbook. We will use this design for simulation and interim monitoring, as described below. Now that you have saved Des2, delete all designs from the Output Preview before continuing, by selecting all designs with the Shift key and clicking the icon in the toolbar.

21.2.3 Simulation

Right-click Des2 in the Library, and select Simulate. Alternatively, you can select Des2 and click the icon.

We will carry out a simulation of Des2 to check whether it preserves the specified power. Click Simulate.
By default, East will execute 10000 simulations with the specified inputs. Close the intermediate window after examining the results. A row labeled Sim1 will be added in the Output Preview.

Click the icon to save this simulation to the Library. A simulation sub-node, Sim1, will be added under the Des2 node. Double-clicking on this node will display the detailed simulation output in the work area.

In 80.46% of the simulated trials, the null hypothesis was rejected. This tells us that the design power of 80% is achieved. Simulation is a tool that can be used to assess the study design under various scenarios.

The next section will explore interim monitoring with this design.

21.2.4 Interim Monitoring

Right-click Des2 in the Library and select Interim Monitoring. Click the icon to open the Test Statistic Calculator. Suppose that after 461 subjects, at the first look, you have observed 34 out of 230 responding on the Control arm and 23 out of 231 responding on the Treatment arm. The calculator computes the difference in proportions as −0.048 and its standard error as 0.031.

Click OK to update the IM Dashboard.

The Stopping Boundaries and Error Spending Function charts on the left:

The Conditional Power and Confidence Intervals charts on the right:

Suppose that after 923 subjects, at the second look, you have observed 69 out of 461 responding on the Control arm and 23 out of 462 responding on the Treatment arm. The calculator computes the difference in proportions as −0.1 and its standard error as 0.019.
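The calculator values quoted for the two looks can be checked by hand. The sketch below assumes that the standard error is the unpooled estimate, sqrt(p_c(1 − p_c)/n_c + p_t(1 − p_t)/n_t); this assumption reproduces the rounded figures in the text, but it is not confirmed as the formula East uses. The function name `diff_and_se` is hypothetical.

```python
from math import sqrt


def diff_and_se(x_c, n_c, x_t, n_t):
    """Difference in proportions (treatment minus control) and its
    unpooled standard error, from responder counts and arm sizes."""
    p_c = x_c / n_c
    p_t = x_t / n_t
    delta_hat = p_t - p_c
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return delta_hat, se


# Look 1: 34/230 on Control, 23/231 on Treatment
look1 = diff_and_se(34, 230, 23, 231)
# Look 2: 69/461 on Control, 23/462 on Treatment
look2 = diff_and_se(69, 461, 23, 462)

for label, (delta_hat, se) in (("Look 1", look1), ("Look 2", look2)):
    print(f"{label}: difference = {delta_hat:.3f}, SE = {se:.3f}, "
          f"Z = {delta_hat / se:.3f}")
```

The ratio of the difference to its standard error is the Wald-type statistic that is compared with the stopping boundary at each look.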
Click Recalc, and then OK to update the IM Dashboard. In this case, a boundary has been crossed, and the following window appears.

Click Stop to complete the trial. The IM Dashboard will be updated accordingly, and a table for Final Inference will be displayed as shown below.

22 Binomial Superiority One-Sample

This chapter deals with the design, simulation, and interim monitoring of two types of tests involving binomial response rates. In Section 22.1, we discuss group sequential designs in which an observed binomial response rate is compared to a fixed response rate, possibly derived from historical data. Section 22.2 deals with McNemar’s test for comparing matched pairs of binomial responses in a group sequential setting.

22.1 Binomial One Sample

In experimental situations where the variable of interest has a binomial distribution, it may be of interest to determine whether the response rate π differs from a fixed value π0. Specifically, we wish to test the null hypothesis H0: π = π0 against the two-sided alternative hypothesis H1: π ≠ π0, or against one-sided alternatives of the form H1: π > π0 or H1: π < π0. The sample size, or power, is determined for a specified value of π that is consistent with the alternative hypothesis, denoted π1.

22.1.1 Trial Design

Consider the design of a single-arm oncology trial in which we wish to determine if the tumor response rate of a new cytotoxic agent is at least 15%. Thus, it is desired to test the null hypothesis H0: π = 0.15 against the one-sided alternative hypothesis H1: π > 0.15. We will design this trial as a one-sided level 0.05 test that achieves 80% power at π = π1 = 0.25.
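For orientation, the textbook normal-approximation sample size for this one-sided, single-look test can be sketched as follows. The formula n = ((z_α·sqrt(π0(1 − π0)) + z_β·sqrt(π1(1 − π1))) / (π1 − π0))², which standardizes by the null variance, is a standard large-sample result; that East uses exactly this expression is an assumption on our part, and the function name `n_single_proportion` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_single_proportion(pi0, pi1, alpha, power):
    """Approximate fixed-sample size for a one-sided test of
    H0: pi = pi0 versus pi = pi1, standardized by the null variance."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(power)
    numerator = (z_alpha * sqrt(pi0 * (1 - pi0))
                 + z_beta * sqrt(pi1 * (1 - pi1)))
    return (numerator / (pi1 - pi0)) ** 2


n_raw = n_single_proportion(pi0=0.15, pi1=0.25, alpha=0.05, power=0.8)
n_subjects = ceil(n_raw)  # round up to a whole number of subjects
print(f"unrounded n = {n_raw:.2f}, rounded up = {n_subjects}")
```

Under these assumptions the formula gives a little over 90 subjects before rounding, so 91 after rounding up.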
Single-Look Design

To begin, click the Design tab, then Single Sample under the Discrete group, and then click Single Proportion.

In the ensuing dialog box, choose the test parameters as shown below. We first consider a single-look design, so leave the Number of Looks at its default value of 1. In the drop-down menu next to Test Type, select 1-Sided. Enter 0.8 for Power. Enter 0.15 in the box next to Prop. Response under Null (π0) and 0.25 in the box next to Prop. Response under Alt (π1).

This dialog box also asks us to specify whether we wish to standardize the test statistic (for performing the hypothesis test of the null hypothesis H0: π = 0.15) with the null or the empirical variance. We will discuss the test statistic and the method of standardization in the next subsection. For the present, select the default radio button Under Null Hypothesis.

Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. The sample size required to achieve the desired 80% power is 91 subjects.

You can select this design by clicking anywhere on the row in the Output Preview. Click the icon to display the design output summary in the upper pane. In the Output Preview toolbar, click the icon to save this design, Des1, to workbook Wbk1 in the Library. If you hover the cursor over the node Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Treatment Effect (δ). The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

Three-Look Design

In order to reach an early decision and enter into comparative trials, let us plan to conduct this single-arm study as a group sequential trial with a maximum of 3 looks. Create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab, Boundary, will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Return to the test parameters by clicking the Test Parameters tab. The dialog box requires us to make a selection in the section labeled Variance of Standardized Test Statistic. We are being asked to specify to East how we intend to standardize the test statistic when we actually perform the hypothesis tests at the various monitoring time points. There are two options: Under Null Hypothesis and Empirical Estimate.
To understand the difference between these two options, let π̂j denote the estimate of π based on nj observations, up to and including the j-th monitoring time point.

Under Null Hypothesis: The test statistic to be used for the interim monitoring is

$$Z_j^{(N)} = \frac{\hat{\pi}_j - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n_j}} . \tag{22.1}$$

Empirical: The test statistic to be used for the interim monitoring is

$$Z_j^{(E)} = \frac{\hat{\pi}_j - \pi_0}{\sqrt{\hat{\pi}_j (1 - \hat{\pi}_j)/n_j}} . \tag{22.2}$$

The choice of variance should not make much of a difference to the type 1 error or power for studies in which the sample size is large. In the present case, however, it might matter. We shall therefore examine both options.

First, we select the Under Null Hypothesis radio button. Click the Compute button to generate output for design Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table. We see that for Des2 the maximum sample size is 91 subjects, with 90 expected under the null hypothesis H0: π = 0.15 and 73 expected when the true value is π = 0.25.

Close the Output window before continuing. The stopping boundary can be displayed by clicking on the icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear.

To examine the error spending function, click the icon on the Library toolbar, and then click Error Spending. The following chart will appear.
To examine the impact of using the empirical variance to standardize the test statistic, select Des2 in the Library, and click the icon on the Library toolbar. In the Variance of Standardized Test Statistic box, now select Empirical Estimate. Next, click Compute. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes Des2 and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the two designs side-by-side:

If we intend to standardize the test statistic with the empirical variance, the maximum sample size needed for 80% power is 119, and the expected sample size is 99 under the alternative hypothesis H1 with π1 = 0.25. The corresponding maximum and expected sample sizes if the null variance is to be used for the standardization are 91 and 73, respectively. Thus, for this configuration of design parameters, it would appear preferable to specify in advance that the test statistic will be standardized by the null variance, since this is the option with the smaller maximum and expected sample size. These results, however, are based on the large sample theory developed in Appendix B. Since the sample sizes in both Des2 and Des3 are fairly small, it would be advisable to verify that the power and type 1 error of both plans are preserved by simulating these designs. We show how to simulate these plans in Section 22.1.2.

In some situations, the sample size is subject to external constraints. Then, the power can be computed for a specified maximum sample size. Suppose that in the above situation, using the observed estimates for the computation of the variance, the total sample size is constrained to be at most 80 subjects. Select Des3 in the Library and click the icon on the Library toolbar.
Change the selections in the ensuing dialog box so that the trial is now designed to compute power for a maximum sample size of 80 subjects, as shown below.

Click the Compute button to generate the output for design Des4. With Des4 selected in the Output Preview, click the icon. In the Library, select the nodes for Des2, Des3, and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the three designs side-by-side:

From this, we can see that Des4 has only 65.5% power.

22.1.2 Trial Simulation

In Section 22.1.1, we created group sequential designs with two different assumptions for the manner in which the test would be standardized at the interim monitoring stage. Under Des2, we assumed that the null variance, and hence the test statistic (22.1), would be used for the interim monitoring. This plan required a maximum sample size of 91 subjects. Under Des3, we assumed that the empirical variance, and hence the test statistic (22.2), would be used for the interim monitoring. This plan required a maximum sample size of 119 subjects. Since the sample sizes for both plans are fairly small and the calculations involved the use of large sample theory, it would be wise to verify the operating characteristics of these two plans by simulation.

Select Des2 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on the Des2 node and select Simulate. A new Simulation worksheet will appear.

Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1.

Select the Sim1 row in the Output Preview and click the icon. Note that some of the simulation output details will be displayed in the upper pane.
Click the icon to save it to the Library. Double-click on the Sim1 node in the Library. The simulation output details will be displayed.

Upon running 10,000 simulations with π = 0.25, we obtain slightly over 80% power, as shown above.

Next, we run 10,000 simulations under H0 by setting π = 0.15 in the choice of simulation parameters. Select Des2 in the Library, and click the icon from the Library toolbar. Under the Response Generation tab, change the Proportion Response to 0.15. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim2.

Select Sim2 in the Output Preview. Click the icon to save it to the Library. Double-click on Sim2 in the Library. The simulation output details will be displayed.

We observe that 7% of these simulations reject the null hypothesis thereby confirming that these boundaries do indeed preserve the type 1 error (up to Monte Carlo accuracy).

Finally, we repeat the same set of simulations for Des3. Select Des3 in the Library, and click the icon from the Library toolbar. Upon running 10,000 simulations with π = 0.25, we obtain 82% power.

However, when we run the simulations under H0: π = 0.15, we obtain a type 1 error of about 3% instead of the specified 5%, as shown below. While this ensures that the type 1 error is preserved, it also suggests that the use of the empirical variance rather than the null variance to standardize the test statistic might be problematic with small sample sizes.

Let us now investigate whether the problem disappears with larger studies. Select Des3 in the Library and click the icon on the Library toolbar.
Change the value of Prop. Response under Alt (π1) from 0.25 to 0.18.

Click Compute to generate the output for Des5. In the Output Preview, we see that Des5 requires a sample size of 1035 subjects. To verify whether the use of the empirical variance will indeed produce the correct type-1 error for this large trial, select Des5 in the Output Preview and click the icon. In the Library, select Des5 and click the icon from the Library toolbar. First, run 10,000 trials with π = 0.15. On the Response Generation tab, change Proportion Response from 0.18 to 0.15. Next, click Simulate. Observe that the type-1 error obtained by simulating Des5 is about 4.4%, an improvement over the corresponding type 1 error obtained by simulating Des3.

Next, verify that a sample size of 1035 suffices for producing 80% power by running 10,000 simulations with π = 0.18.

This example has demonstrated the importance of simulating a design to verify that it does indeed possess the operating characteristics that are claimed for it. Since these operating characteristics were derived by large-sample theory, they might not hold for small sample sizes, in which case the sample size or type-1 error might have to be adjusted appropriately.

22.1.3 Interim Monitoring

Consider interim monitoring of Des3, the design that has 80% power when the empirical estimate of variance is used to standardize the test statistic. Select Des3 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des3 and select Interim Monitoring.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.
These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

At the first interim look, when 40 subjects have enrolled, suppose that the observed response rate is 0.35. Click the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 40. Enter 0.35 in the box next to Estimate of π. In the box next to Standard Error of Estimate of π enter 0.07542. Next click Recalc.

Observe that upon pressing the Recalc button, the test statistic calculator automatically computes the value of the test statistic as 2.652.

Clicking OK results in the following output.

Since our test statistic, 2.652, is smaller than the stopping boundary, 3.185, the trial continues.

At the second interim monitoring time point, after 80 subjects have enrolled, suppose that the estimate π̂ based on all data up to that point is 0.30. Click on the second row in the table in the upper section. Then click the icon. In the box next to Cumulative Sample Size enter 80. Enter 0.30 in the box next to Estimate of π. In the box next to Standard Error of Estimate of π enter 0.05123. Next click Recalc.

Upon clicking OK, we observe that the stopping boundary is crossed and the following message is displayed.

We can conclude that π > 0.15 and terminate the trial. Clicking Stop yields the following output.
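The standard errors and test statistics entered in the calculator above follow directly from equation (22.2). The short sketch below reproduces them; the function name `z_empirical` is hypothetical, and the code is only a hand check, not a description of how East computes these values internally.

```python
from math import sqrt

PI0 = 0.15  # response rate under the null hypothesis


def z_empirical(pi_hat, n, pi0=PI0):
    """Empirical-variance statistic of equation (22.2).
    Returns the standard error of pi_hat and the test statistic."""
    se = sqrt(pi_hat * (1 - pi_hat) / n)
    return se, (pi_hat - pi0) / se


se1, z1 = z_empirical(0.35, 40)   # first interim look
se2, z2 = z_empirical(0.30, 80)   # second interim look

print(f"Look 1: SE = {se1:.5f}, Z = {z1:.3f}")
print(f"Look 2: SE = {se2:.5f}, Z = {z2:.3f}")
```

The first-look values agree with the standard error of 0.07542 and the statistic of 2.652 quoted above, and the second-look standard error agrees with 0.05123.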
22.2 McNemar’s Test

McNemar’s Test is used in experimental situations where paired comparisons are observed. In a typical application, two binary response measurements are made on each subject, perhaps from two different treatments, or from two different time points. For example, in a comparative clinical trial, subjects are matched on baseline demographics and disease characteristics and then randomized, with one subject in the pair receiving the experimental treatment and the other subject receiving the control. Another example is the crossover clinical trial in which each subject receives both treatments. By random assignment, some subjects receive the experimental treatment followed by the control while others receive the control followed by the experimental treatment. Let πc and πt denote the response probabilities for the control and experimental treatments, respectively. The probability parameters for McNemar’s test are displayed in Table 22.1.

Table 22.1: A 2 x 2 Table of Probabilities for McNemar’s Test

                      Experimental
Control               No Response   Response   Total Probability
No Response           π00           π01        1 − πc
Response              π10           π11        πc
Total Probability     1 − πt        πt         1

The null hypothesis H0: πc = πt is tested against the alternative hypothesis H1: πc ≠ πt for the two-sided testing problem, or against the alternative hypothesis H1: πc > πt (or H1: πc < πt) for the one-sided testing problem. Since πt = πc if and only if π01 = π10, the null hypothesis is also expressed as H0: π01 = π10, and is tested against the corresponding one- and two-sided alternatives. The power of this test depends on two quantities:

1. The difference between the two discordant probabilities (which is also the difference between the response rates of the two treatments), δ = π01 − π10 = πt − πc;

2. The sum of the two discordant probabilities, ξ = π10 + π01.

East accepts these two parameters as inputs at the design stage.

We next specify the test statistic to be used during the interim monitoring stage. Suppose we intend to execute McNemar’s test a maximum of K times in a group sequential setting. Let the cumulative data up to and including the j-th interim look consist of N(j) matched pairs arranged in the form of the following 2 × 2 contingency table of counts:

Table 22.2: 2 × 2 Contingency Table of Counts of Matched Pairs at Look j

                      Experimental
Control               No Response   Response   Total
No Response           n00(j)        n01(j)     r0(j)
Response              n10(j)        n11(j)     r1(j)
Total                 c0(j)         c1(j)      N(j)

For a = 0, 1 and b = 0, 1 define

$$\hat{\pi}_{ab}(j) = \frac{n_{ab}(j)}{N(j)} \tag{22.3}$$

Then the sequentially computed McNemar test statistic at look j is

$$Z_j = \frac{\hat{\delta}_j}{\mathrm{se}(\hat{\delta}_j)} \tag{22.4}$$

where

$$\hat{\delta}_j = \hat{\pi}_{01}(j) - \hat{\pi}_{10}(j) \tag{22.5}$$

and

$$\mathrm{se}(\hat{\delta}_j) = \frac{\sqrt{n_{01}(j) + n_{10}(j)}}{N(j)} \tag{22.6}$$

We now show how to use East to design and monitor a clinical trial based on McNemar’s test.

22.2.1 Trial Design

Consider a trial in which we wish to determine whether a transdermal delivery system (TDS) can be improved with a new adhesive. Subjects are to wear the old TDS (control) and the new TDS (experimental) in the same area of the body for one week each. A response is said to occur if the TDS remains on for the entire one-week observation period. From historical data, it is known that the control has a response rate of 85% (πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95). Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 ≈ 0.13. Based on these data, we can fill in all the entries of Table 22.1, as displayed in Table 22.3.
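The arithmetic that completes the table of probabilities from πc, πt and π01 can be sketched as follows. The variable names are ours, and π01 is taken as the rounded value 0.13 used in the text rather than the unrounded product 0.87 × 0.15.

```python
# Marginal response rates from the text
pi_c = 0.85   # control
pi_t = 0.95   # experimental

# Control non-responders who respond on the experimental system,
# rounded to two decimals as in the text (0.87 * 0.15 = 0.1305).
pi01 = 0.13

# Remaining cells follow from the row and column totals of Table 22.1.
pi00 = (1 - pi_c) - pi01   # no response on either system
pi10 = (1 - pi_t) - pi00   # response on control only
pi11 = pi_c - pi10         # response on both systems

delta = pi01 - pi10        # difference of discordant probabilities
xi = pi01 + pi10           # sum of discordant probabilities

print(f"pi00 = {pi00:.2f}, pi01 = {pi01:.2f}, "
      f"pi10 = {pi10:.2f}, pi11 = {pi11:.2f}")
print(f"delta = {delta:.2f}, xi = {xi:.2f}")
```

The resulting δ = 0.10 and ξ = 0.16 are the two design inputs that East asks for.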
Table 22.3: McNemar Probabilities for the TDS Trial

                      Experimental
Control               No Response   Response   Total Probability
No Response           0.02          0.13       0.15
Response              0.03          0.82       0.85
Total Probability     0.05          0.95       1

Although it is expected that the new adhesive will increase the adherence rate, the comparison is posed as a two-sided testing problem, testing H0: πc = πt against H1: πc ≠ πt at the 0.05 level. We wish to determine the sample size needed to have 90% power for the values displayed in Table 22.3. To design this trial, click the Design tab, then Single Sample in the Discrete group, and then click McNemar’s Test for Matched Pairs.

Single-Look Design

First, consider a study with no interim analyses, and 90% power for a two-sided test at α = 0.05. Choose the design parameters as shown below. We first consider a single-look design, so leave the Number of Looks at its default value of 1. Enter 0.9 for Power. As shown in Table 22.3, we must specify δ1 = πt − πc = 0.1 and ξ = π01 + π10 = 0.16.

Click Compute. The design Des1 is shown as a row in the Output Preview located in the lower pane of this window. A total of 158 subjects is required to have 90% power.

You can select this design by clicking anywhere on the row in the Output Preview. Click the icon to display the output summary in the upper pane. In the Output Preview toolbar, click the icon to save this design, Des1, to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Five-Look Design

Now consider the same design with a maximum of 5 looks, using the default Lan-DeMets (O’Brien-Fleming) spending function. Create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar.
Change the Number of Looks from 1 to 5, to generate a study with four interim looks and a final analysis. A new tab, Boundary, will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click Compute to generate output for Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In the Library, select the nodes for both Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the output summary of the two designs side-by-side:

There has been a slight inflation in the maximum sample size, from 158 to 162. However, the expected sample size is 120 subjects if the alternative hypothesis of δ1 = 0.10 and ξ = 0.16 holds. The stopping boundary, spending function, and Power vs. Sample Size charts can all be displayed by clicking on the appropriate icons from the Library toolbar.

22.2.2 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the icon from the Library toolbar.
Alternatively, right-click on Des2 and select Interim Monitoring. A new IM worksheet will appear.

Suppose that the results are to be analyzed after results are available for every 32 subjects. After the first 32 subjects were enrolled, one subject responded on the control arm and did not respond on the treatment arm; four subjects responded on the treatment arm but did not respond on the control arm; 10 subjects did not respond on either treatment; 17 subjects responded on both arms.

This information is sufficient to complete all the entries in Table 22.2 and hence to evaluate the test statistic value. Click the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 32. Enter the values in the table as shown below and click Recalc.

Clicking OK results in the following entry in the first look row.

As you can see, the value of the test statistic, 1.342, is within the stopping boundaries, (−4.909, 4.909). Thus, the trial continues.

The second interim analysis was performed after data were available for 64 subjects. A total of two subjects responded on the control arm and failed to respond on the treatment arm; seven subjects responded on the treatment arm and failed to respond on the control arm; 20 subjects responded on neither arm; 35 subjects responded on both arms.

Click on the second row in the table in the upper section. Then click the icon. Enter the appropriate values in the table as shown below and click Recalc.

Then click OK. This results in the following screen.
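The statistics at these two looks can be reproduced from equations (22.4) to (22.6) using only the discordant counts and the number of pairs. In the sketch below the function name `mcnemar_z` is hypothetical; n01 counts pairs that responded on the treatment arm only, and n10 counts pairs that responded on the control arm only.

```python
from math import sqrt


def mcnemar_z(n01, n10, n_pairs):
    """Sequential McNemar statistic of equations (22.4) to (22.6).

    n01: pairs responding on treatment but not on control
    n10: pairs responding on control but not on treatment
    n_pairs: cumulative number of matched pairs, N(j)
    """
    delta_hat = (n01 - n10) / n_pairs      # equation (22.5)
    se = sqrt(n01 + n10) / n_pairs         # equation (22.6)
    return delta_hat / se                  # equation (22.4)


z_look1 = mcnemar_z(n01=4, n10=1, n_pairs=32)
z_look2 = mcnemar_z(n01=7, n10=2, n_pairs=64)

print(f"Look 1: Z = {z_look1:.3f}")
print(f"Look 2: Z = {z_look2:.3f}")
```

Note that N(j) cancels in the ratio, so the statistic depends only on the discordant counts: Z = (n01 − n10)/sqrt(n01 + n10).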
At the third interim analysis, after 96 subjects were enrolled, a total of two subjects responded on the control arm and failed to respond on the treatment arm; 13 subjects responded on the treatment arm and failed to respond on the control arm; 32 subjects did not respond on either arm; 49 subjects responded on both arms. Click on the third row in the table in the upper section. Then click the icon. Enter the appropriate values in the table as shown below and click Recalc. Then click OK. This results in the following message box.

Clicking on Stop yields the following Interim Monitoring output.

We reject the null hypothesis that δ = 0, based on these data.

22.2.3 Simulation

Des2 can be simulated to examine its properties for different values of the parameters. First, we verify the results under the alternative hypothesis at which the power is to be controlled, namely δ1 = 0.10 and ξ = 0.16. Select Des2 in the Library, and click the icon on the Library toolbar. Alternatively, right-click on Des2 and select Simulate. A new Simulation worksheet will appear. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1.

Select Sim1 in the Output Preview. If you click the icon, you will see some of the simulation output details displayed in the upper pane. Click the icon to save it to the Library. Double-click on Sim1 in the Library. The simulation output details will be displayed as shown below.

The results confirm that the power is about 90%.
To confirm the results under the null hypothesis, set δ1 = 0 in the Response Generation tab in the simulation worksheet and then click Simulate. The results, which confirm that the type-1 error rate is approximately 5%, are given below.

While it is often difficult to specify the absolute difference of the discordant probabilities, δ1, it is even more difficult to specify the sum of the discordant probabilities, ξ. Simulation can be used to examine the effects of misspecification of ξ. Run the simulations again, now with δ1 = 0.10 and ξ = 0.2. The results are given below.

Notice that this provides a power of approximately 81%. Larger values of ξ would further decrease the power. However, values of ξ > 0.2 with δ1 = 0.1 would be inconsistent with the initial assumption of πc = 0.85 and πt = 0.95: with πt = 0.95, at most 5% of subjects can respond on the control arm only, which caps the probability of responding on the treatment arm only at 0.15 and hence ξ at 0.05 + 0.15 = 0.2. Additional simulations for various values of δ and ξ can provide information regarding the consequences of misspecification of the input parameters.

23 Binomial Superiority Two-Sample

In experiments based on binomial data, the aim is to compare independent samples from two populations in terms of the proportion of sampling units presenting a given trait. In medical research, outcomes such as the proportion of patients responding to a therapy, developing a certain side effect, or requiring specialized care would satisfy this definition. East supports the design, simulation, and interim monitoring of clinical trials in which this comparison is based on the difference of proportions, the ratio of proportions, or the odds ratio of the two populations. The three cases are discussed in the following sections.
23.1 Difference of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let δ = πt − πc. Interest lies in testing the null hypothesis that δ = 0 against one-sided and two-sided alternatives.

A special characteristic of binomial designs is the dependence of the variance of a binomial random variable on its mean. Because of this dependence, even if we keep all other test parameters the same, the maximum sample size required to achieve a specified power will be affected by how we intend to standardize the difference of binomial response rates when computing the test statistic at the interim monitoring stage. There are two options for computing the test statistic: use either the unpooled or the pooled estimate of variance for standardizing the observed treatment difference. Suppose, for instance, that at the jth interim look the observed response rate on the treatment arm is π̂tj, and the observed response rate on the control arm is π̂cj. Let ntj and ncj be the number of patients on the treatment and control arms, respectively. Then the test statistic based on the unpooled variance is

$$Z_j^{(u)} = \frac{\hat{\pi}_{tj} - \hat{\pi}_{cj}}{\sqrt{\dfrac{\hat{\pi}_{tj}(1-\hat{\pi}_{tj})}{n_{tj}} + \dfrac{\hat{\pi}_{cj}(1-\hat{\pi}_{cj})}{n_{cj}}}}. \tag{23.1}$$

In contrast, the test statistic based on the pooled variance is

$$Z_j^{(p)} = \frac{\hat{\pi}_{tj} - \hat{\pi}_{cj}}{\sqrt{\hat{\pi}_j(1-\hat{\pi}_j)\left[\dfrac{1}{n_{tj}} + \dfrac{1}{n_{cj}}\right]}}, \tag{23.2}$$

where

$$\hat{\pi}_j = \frac{n_{tj}\hat{\pi}_{tj} + n_{cj}\hat{\pi}_{cj}}{n_{tj} + n_{cj}}. \tag{23.3}$$

It can be shown that $[Z_j^{(p)}]^2$ is the familiar Pearson chi-square statistic computed from all the data accumulated by the jth look.

The maximum sample size required to achieve a given power depends on whether, at the interim monitoring stage, we intend to use the unpooled statistic (23.1) or the pooled statistic (23.2) to determine statistical significance.
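The two standardizations are easy to compare numerically. The following sketch implements equations (23.1) to (23.3) in plain Python and checks that the squared pooled statistic equals the Pearson chi-square statistic of the corresponding 2×2 table. The interim counts are hypothetical, chosen only for illustration, and the function names are not part of East.

```python
import math


def z_unpooled(x_t, n_t, x_c, n_c):
    """Equation (23.1): each arm contributes its own variance estimate."""
    p_t, p_c = x_t / n_t, x_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return (p_t - p_c) / se


def z_pooled(x_t, n_t, x_c, n_c):
    """Equations (23.2) and (23.3): a single pooled response rate is used."""
    p_t, p_c = x_t / n_t, x_c / n_c
    p = (n_t * p_t + n_c * p_c) / (n_t + n_c)
    se = math.sqrt(p * (1 - p) * (1 / n_t + 1 / n_c))
    return (p_t - p_c) / se


def pearson_chi2(x_t, n_t, x_c, n_c):
    """Pearson chi-square for the 2x2 table of arm by response."""
    observed = [[x_t, n_t - x_t], [x_c, n_c - x_c]]
    rows = [n_t, n_c]
    cols = [x_t + x_c, (n_t - x_t) + (n_c - x_c)]
    total = n_t + n_c
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical interim data: 30/300 events on treatment, 45/300 on control.
data = (30, 300, 45, 300)
zu, zp, chi2 = z_unpooled(*data), z_pooled(*data), pearson_chi2(*data)
print(f"unpooled Z = {zu:.4f}, pooled Z = {zp:.4f}, pooled Z^2 = {zp**2:.4f}, chi2 = {chi2:.4f}")
```

With equal arm sizes the two statistics differ only slightly, which is in line with the remark in this chapter that the pooled and unpooled designs differ little under balanced randomization.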
The technical details of the sample size computations for these two options are given in Appendix B, Section B.2.5. The CAPTURE clinical trial is designed in Section 23.1.1 and monitored in Section 23.1.2 under the assumption that the unpooled statistic will be used for interim monitoring. In Section 23.1.3, however, the same trial is re-designed on the basis of the pooled variance. It is seen that the difference in sample size due to the two design assumptions is almost negligible. This is because the CAPTURE trial utilized balanced randomization. We show further in Section 23.1.3 that if the randomization is unbalanced, the difference in sample size based on the two design assumptions can be substantial.

23.1.1 Trial Design

Design objectives and interim results from CAPTURE, a prospective randomized trial of placebo versus Abciximab for patients with refractory unstable angina, were presented at a workshop on clinical trial data monitoring committees (Anderson, 2002). The primary endpoint was reduction in death or MI within 30 days of entering the study. The study was designed for 80% power to detect a reduction in the event rate from 15% on the placebo arm to 10% on the Abciximab arm. A two-sided test with a type-1 error of 5% was used. We will illustrate various design and interim monitoring features of East for studies with binomial endpoints with the help of this example, so that it can serve as a model for designing and monitoring your own binomial studies.

Single Look Design

To begin, click the Design tab, then Two Samples in the Discrete group, and then click Difference of Proportions.

The goal of this study is to test the null hypothesis, H0, that the Abciximab and placebo arms both have an event rate of 15%, versus the alternative hypothesis, H1, that Abciximab reduces the event rate by 5%, from 15% to 10%. It is desired to have a two-sided test with three looks at the data, a type-1 error of α = 0.05 and a power of (1 − β) = 0.8.
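Before turning to the software, the fixed-sample size implied by these inputs can be approximated by hand. The sketch below assumes the textbook normal-approximation formula for a two-sided test based on the unpooled statistic with equal allocation; East's own computation is described in Appendix B and may differ in detail, but this formula gives the same 1366 subjects that East reports for the single-look design in this section. The function name is illustrative.

```python
import math
from statistics import NormalDist


def fixed_sample_total(p_control, p_treatment, alpha=0.05, power=0.80):
    """Total sample size (both arms, 1:1 allocation), unpooled variance.

    Assumed formula: n per arm = (z_{alpha/2} + z_beta)^2 * V / delta^2,
    with V = p_t(1 - p_t) + p_c(1 - p_c).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance = p_treatment * (1 - p_treatment) + p_control * (1 - p_control)
    delta = p_treatment - p_control
    per_arm = (z_alpha + z_beta) ** 2 * variance / delta ** 2
    return math.ceil(2 * per_arm)


total_n = fixed_sample_total(p_control=0.15, p_treatment=0.10)
print(total_n)
```

The group sequential designs considered later inflate this number to pay for the interim looks.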
Choose the test parameters as shown below. We first consider a single-look design, so leave the Number of Looks at its default value of 1. Enter 0.8 for the Power. To specify the appropriate effect size, enter 0.15 for the Prop. Under Control and 0.10 for the Prop. Under Treatment. Notice that you have the option to select the manner in which the test statistic will be standardized at the hypothesis testing stage. If you choose Unpooled Estimate, the standardization will be according to equation (23.1). If you choose Pooled Estimate, the standardization will be according to equation (23.2). For the present, choose the Unpooled Estimate option. The other choice in this dialog box is whether or not to use the Casagrande-Pike-Smith (1978) correction for small sample sizes. This is not usually necessary, as can be verified by the simulation options in East. The dialog box containing the test parameters will now look as shown below.

Next, click the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. The computed sample size (1366 subjects) is highlighted in yellow.

You can select this design Des1 by clicking anywhere on the row in the Output Preview. Now you can click the icon to see the output summary displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design Des1 to Workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs Treatment Effect (δ). The resulting power curve for this design is shown. You can save this chart to the Library by clicking Keep.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

Group Sequential Design

Create a new design by selecting Des1 in the Library and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3 to generate a study with two interim looks and a final analysis. A new Boundary tab will appear. Clicking this tab reveals the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side for the Futility boundary. By default, an efficacy boundary (to reject H0) is selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Boundary tab to see the details of cumulative alpha spent, and the boundary values, in the Look Details table.

Click Compute to generate output for a new design Des2. The 3-look group sequential design displayed in Des2 requires an up-front commitment of up to a maximum of 1384 patients. That is 18 patients more than the fixed sample design displayed in Des1. Notice, however, that under the alternative hypothesis of a 5% drop in the event rate, the expected sample size is only 1183 patients, a saving of 183 patients relative to the fixed sample design (and of 201 patients relative to the maximum of 1384).
The saving in expected sample size arises because the test statistic could cross a stopping boundary at one of the interim looks. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table.

Close the Output window before continuing. The stopping boundary chart can be brought up by clicking the icon on the Library toolbar, and then clicking Stopping Boundaries. The following chart will appear.

Lan-DeMets Spending Function: O’Brien-Fleming Version

Close this chart, click the icon in the Library toolbar, and then click Error Spending. The following chart will appear.

This spending function was proposed by Lan and DeMets (1983), and for two-sided tests has the following functional form:

$$\alpha(t) = 4 - 4\,\Phi\!\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right). \tag{23.4}$$

Notice that hardly any type-1 error is spent in the early stages of the trial but the rate of error spending increases rapidly as the trial progresses. This is reflected in the corresponding stopping boundaries. The upper and lower boundary values are rather wide apart initially (±3.712 standard deviations) but come closer together with each succeeding interim look until at the last look the standardized test statistic crosses the boundary at ±1.993 standard deviations. This is not too far off from the corresponding boundary values, ±1.96, required to declare statistical significance at the 0.05 level for a fixed sample design.
Because its final boundary stays so close to the fixed-sample critical value, the O’Brien-Fleming version is often adopted in preference to other spending functions that spend the type-1 error more aggressively and thereby reduce the expected sample size under H1 by a greater amount.

Lan-DeMets Spending Function: Pocock Version

A more aggressive spending function, also proposed by Lan and DeMets (1983), is PK, which refers to Pocock. This spending function captures the spirit of the Pocock (1977) stopping boundary belonging to the Wang and Tsiatis (1987) power family, and has the functional form

$$\alpha(t) = \alpha \log\bigl(1 + (e - 1)\,t\bigr). \tag{23.5}$$

Select Des2 in the Library, and click the icon on the Library toolbar. On the Boundary tab, change the Parameter from OF to PK, and click Compute to create design Des3. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for both Des2 and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Under Des3, you must make an up-front commitment of up to 1599 patients, considerably more than you would need for a fixed sample design. However, because the type-1 error is spent more aggressively in the early stages, the expected sample size is only 1119 patients.

For now, close this output window, and click the icon on the Library toolbar to compare the two designs according to Power vs. Sample Size. Using the same icon, select Stopping Boundaries. Notice, by moving the cursor from right to left in the stopping boundary charts, that the stopping boundary derived from the PK spending function is approximately flat, requiring ±2.28 standard deviations at the first look, ±2.29 standard deviations at the second look and ±2.30 standard deviations at the third look.
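The two spending functions in equations (23.4) and (23.5) can be evaluated directly. The sketch below computes the cumulative type-1 error spent at three equally spaced looks with α = 0.05, and then converts the error spent at the first look into a two-sided boundary by splitting it equally between the two tails. That conversion is exact only at the first look; boundaries at later looks require the recursive integration performed by East and are not attempted here. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ld_obrien_fleming(t, alpha=0.05):
    """Equation (23.4), two-sided Lan-DeMets O'Brien-Fleming spending."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 4)
    # 4 - 4*Phi(x) equals 4*Phi(-x); the second form avoids cancellation.
    return 4 * STD_NORMAL.cdf(-z / math.sqrt(t))


def ld_pocock(t, alpha=0.05):
    """Equation (23.5), Lan-DeMets Pocock spending."""
    return alpha * math.log(1 + (math.e - 1) * t)


def first_look_boundary(cum_alpha):
    """Two-sided critical value when cum_alpha is all the error spent so far."""
    return STD_NORMAL.inv_cdf(1 - cum_alpha / 2)


fractions = [1 / 3, 2 / 3, 1.0]
of_spent = [ld_obrien_fleming(t) for t in fractions]
pk_spent = [ld_pocock(t) for t in fractions]

for t, a_of, a_pk in zip(fractions, of_spent, pk_spent):
    print(f"t = {t:.3f}: OF spends {a_of:.5f}, PK spends {a_pk:.5f}")

of_first = first_look_boundary(of_spent[0])
pk_first = first_look_boundary(pk_spent[0])
print(f"first-look boundary: OF ±{of_first:.2f}, PK ±{pk_first:.2f}")
```

The first-look values agree with the ±3.712 and ±2.28 read off the stopping boundary charts for Des2 and Des3, and both functions spend the full α = 0.05 at t = 1.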
The stopping boundary derived from the OF spending function, in contrast, requires ±3.71 standard deviations at the first look, ±2.51 standard deviations at the second look and ±1.99 standard deviations at the third look. This translates into a smaller expected sample size under H1 for Des3 than for Des2. This advantage is, however, offset by at least two drawbacks of the stopping boundary derived from the PK spending function: the large up-front commitment of 1599 patients, and the large standardized test statistic of 2.295 (corresponding to a two-sided p value of 0.0217) required at the last look in order to declare statistical significance.

Using the same icon, select Error Spending to compare the two designs graphically in terms of error spending functions. Des3 (PK) spends the type-1 error probability at a much faster rate than Des2 (OF). Close the chart before continuing.

Wang and Tsiatis Power Boundaries

The stopping boundaries generated by the Lan-DeMets OF and PK functions closely resemble the classical O’Brien-Fleming and Pocock stopping boundaries, respectively. These classical boundaries are special cases of a family of power boundaries proposed by Wang and Tsiatis (1987). For a two-sided α level test, using K equally spaced looks, the power boundaries for the standardized test statistic Zj at the j-th look are of the form

$$|Z_j| \ge \frac{C(\Delta, \alpha, K)}{(j/K)^{0.5-\Delta}}. \tag{23.6}$$

The normalizing constant C(∆, α, K) is evaluated by recursive integration so as to ensure that the K-look group sequential test has type-1 error equal to α. Select Des3 in the Library and click the icon on the Library toolbar. On the Boundary tab, change the Boundary Family from Spending Functions to Wang-Tsiatis.
Leave the default value of ∆ as 0 and click Compute to create design Des4. With Des4 selected in the Output Preview, click the icon. In the Library, select both Des2 and Des4 by holding the Ctrl key. Click the icon, and under Select Chart on the right, select Stopping Boundaries. As expected, the boundary values for Des2 (Lan-DeMets, OF) and Des4 (Wang-Tsiatis, ∆ = 0) are very similar. Close the chart before continuing.

The Power Chart and the ASN Chart

East provides some additional tools for evaluating study designs. Select Des3 in the Library, click the icon, and then click Power vs. Treatment Effect (δ). By scrolling from left to right with the vertical line cursor, one can observe the power for various values of the effect size.

Close this chart, and with Des3 selected, click the icon again. Then click Expected Sample Size. The following chart appears:

This chart displays the Expected Sample Size as a function of the effect size and confirms that for Des3 the average sample size is 1566 under H0 (effect size, zero) and 1120 under H1 (effect size, -0.05).

Unequally spaced analysis time points

In the above designs, we have assumed that analyses were equally spaced. This assumption can be relaxed if you know when interim analyses are likely to be performed (e.g. for administrative reasons). In either case, departures from this assumption are allowed during the actual interim monitoring of the study, but sample size requirements will be more accurate if allowance is made for this knowledge.

With Des3 selected in the Library, click the icon. Under Spacing of Looks in the Boundary tab, click the Unequal radio button. The column titled Info. Fraction in the Look Details table can be edited to modify the relative spacing of the analyses. The information fraction refers to the proportion of the maximum (yet unknown) sample size. By default, this table displays equal spacing, but suppose that the two interim analyses will be performed with 0.25 and 0.5 (instead of 0.333 and 0.667) of the maximum sample size. Enter these new information fraction values and click Compute to create design Des5. Select Des5 in the Output Preview and click the icon to save it in the Library for now.

Arbitrary amounts of error probability to be spent at each analysis

Another feature of East is the possibility to specify arbitrary amounts of cumulative error probability to be used at each look. This option can be combined with the option of unequal spacing of the analyses. With Des5 selected in the Library, click the icon on the Library toolbar. Under the Boundary tab, select Interpolated for the Spending Function. In the column titled Cum. α Spent, enter 0.005 for the first look and 0.03 for the second look, and click Compute to create design Des6.

Select Des6 in the Output Preview and click the icon. From the Library, select Des5 and Des6 by holding the Ctrl key. Click the icon, and under Select Chart on the right, select Stopping Boundaries. The following chart will be displayed.

Computing power for a given sample size

When sample size is a given design constraint, East can compute the achieved power, given the other test parameters. Select Des6 in the Library and click the icon. On the Test Parameters tab, click the radio button for Power (1 − β). You will notice that the field for power will contain the word Computed. You may now enter a value for the sample size: 1250, and click Compute.
The following output will appear in the Output Preview in the Des7 row, where, as expected, the achieved power is less than the 0.8 specified for the original design, namely 0.714.

To delete this design, click Des7 in the Output Preview, and click the icon in the Output Preview toolbar. East will display a warning to make sure that you want to delete the selected row. Click Yes to continue.

Stopping Boundaries for Early Rejection of H0 or H1

Although both Des2 and Des3 reduce the expected sample size substantially by rejecting H0 when H1 is true, they are unable to do so if H0 is true. It is, however, often desirable to terminate a study early if H0 is true, since that would imply that the new treatment is no different from the standard treatment. East can produce stopping boundaries that result in early termination under either H0 or H1. Stopping boundaries for early termination if H1 is true are known as efficacy boundaries. They are obtained by choosing an appropriate α-spending function. These boundaries ensure that the type-1 error does not exceed the pre-specified significance level α. East can also construct stopping boundaries for rejecting H1 and terminating early if H0 is true. These stopping boundaries are known as futility boundaries. They are obtained by choosing an appropriate β-spending function. These boundaries ensure that the type-2 error does not exceed β and thereby ensure that the power of the study is preserved at 1 − β despite the possibility of early termination for futility. Pampallona and Tsiatis (1994) have extended the error spending function methodology of Lan and DeMets (1983) so as to spend both α, the type-1 error, and β, the type-2 error, and thereby obtain efficacy and futility boundaries simultaneously.
East provides you with an entire catalog of published spending functions from which you can take your pick for generating both the H0 and H1 boundaries. For various reasons, investigators usually prefer to be very conservative about early stopping for efficacy but are likely to be more aggressive about cutting their losses and stopping early for futility. Suppose then that you wish to use the conservative Lan-DeMets (OF) spending function for early termination to reject H0 in favor of H1, but use a more aggressive spending function for early termination to reject H1 in favor of H0. Possible choices for spending functions to reject H1 that are more aggressive than Lan-DeMets (OF) but not as aggressive as Lan-DeMets (PK) are members of the Rho family (Jennison and Turnbull, 2000) and the Gamma family (Hwang, Shih and DeCani, 1990). For illustrative purposes we will use the Gamma(−1) spending function from the Gamma family.

Select Des2 in the Library and click the icon. For the futility boundary on the Boundary tab, select Spending Functions and then select Gamma Family. Set the Parameter to −1. Also, click on the Binding option to the right. The screen will look like this:

On the Boundary tab, you may click either of the two chart icons to view plots of the error spending functions or the stopping boundaries, respectively.

Observe that the β-spending function (upper, in red) spends the type-2 error substantially faster than the α-spending function (lower, in blue).

These stopping boundaries are known as inner-wedge stopping boundaries. They divide the sample space into three zones corresponding to three possible decisions.
If the test statistic enters the lower blue zone, we terminate the trial, reject H0, and conclude that the new treatment (Abciximab) is beneficial relative to the placebo. If the test statistic enters the upper blue zone, we terminate the trial, reject H0, and conclude that the new treatment is harmful relative to the placebo. If the test statistic enters the center (pink) zone, we terminate the trial, reject H1, and conclude that Abciximab offers no benefit relative to placebo. Assuming that the event rate is 0.15 for the placebo arm, this strategy has a 2.5% chance of declaring benefit and a 2.5% chance of declaring harm when the event rate for the Abciximab arm is also 0.15. Furthermore, this strategy has a 20% chance of entering the pink zone and declaring no benefit when there actually is a substantial benefit with Abciximab, resulting in a drop in the event rate from 0.15 to 0.1. In other words, Des7 has a two-sided type-1 error of 5% and 80% power.

Click Compute and, with Des7 selected in the Output Preview, click the icon. To view the design details, click the icon. Des7 requires an up-front commitment of 1468 patients, but the expected sample size is 1028 patients under H0, and 1164 patients under H1. You may wish to save this output (e.g., in HTML format) or to print it by clicking the corresponding icon. Close the output window before continuing.

Boundaries with Early Stopping for Benefit or Futility

Next suppose you are interested in designing the clinical trial in such a way that you can reach only two conclusions, not three. You wish to demonstrate either that Abciximab is beneficial relative to placebo or that it offers no benefit relative to placebo, but there is no interest in demonstrating that Abciximab is harmful relative to placebo. To design this two-decision trial, select Des7 in the Library and click the icon.
Change the entry in the Test Type cell from 2-Sided to 1-Sided. Check to ensure that the other specifications are the same as in Des7. Click Compute to generate the design, Des8.

The error spending functions are the same, but this time the stopping boundaries divide the sample space into only two zones, as shown below.

If the test statistic enters the lower (blue) zone, the null hypothesis is rejected in favor of concluding that Abciximab is beneficial relative to placebo. The probability of this event under H0 is 0.05. If the test statistic enters the upper (pink) zone, the alternative hypothesis is rejected in favor of concluding that Abciximab offers no benefit relative to placebo. The probability of this event under H1 is 0.2. In other words, Des8 has a one-sided type-1 error rate of 5% and 80% power. Since Des8 precludes the possibility of demonstrating that Abciximab is harmful relative to placebo, it requires far fewer patients. It only requires an up-front commitment of 1156 patients, and the expected sample size is 681 if H0 is true and 892 if H1 is true.

Before continuing to the next section, we will save the current workbook and open a new workbook. Select the workbook node in the Library, click the button in the top left-hand corner, and click Save. Alternatively, select Wbk1 in the Library and right-click, then click Save. This saves all the work done so far to your directory. Next, click the button, click New, and then Workbook. A new workbook, Wbk2, should appear in the Library. Next, close the window to clear all designs from the Output Preview.

Multiple designs for discrete outcomes

East allows the user to easily create multiple designs by specifying a range of values for certain parameters in the design window.
In studies with discrete outcomes, East supports the input of multiple key parameters at once to simultaneously create a number of different designs. For example, suppose that in a multi-look study the user wants to generate designs for all combinations of the following parameter values in a two-sample Difference of Proportions test: Power = 0.8 and 0.9, and Alternative Hypothesis - Prop. under Treatment = 0.4, 0.5 and 0.6. The number of combinations is 2 x 3 = 6.

East creates all combinations using only a single specification under the Test Parameters tab in the design window. As shown below, the values for Power are entered as a list of comma-separated values, while the values of Prop. under Treatment for the alternative hypothesis are entered as a colon-separated range of values, 0.4 to 0.6 in steps of 0.1.

East computes all 6 designs and displays them in the Output Preview window:

East provides the capability to analyze multiple designs in ways that make comparisons between the designs visually simple and efficient. To illustrate this, a selection of a few of the above designs can be viewed simultaneously in both the Output Summary section as well as in the various tables and plots. The following is a subsection of the designs computed from the above example with differing values for number of looks, power and proportion under treatment. Designs are displayed side by side, allowing details to be easily compared. Save these designs in the newly created workbook.
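The way a comma-separated list and a colon-separated range expand into six designs can be illustrated outside East. The sketch below is not East's parser: the exact entry syntax appears only in a screenshot that is not reproduced here, so the `start:stop:step` string, like the helper names, is a hypothetical stand-in chosen for illustration. Decimal arithmetic is used so that the step of 0.1 does not accumulate floating-point error.

```python
from decimal import Decimal
from itertools import product


def parse_list(spec):
    """Comma-separated values, e.g. '0.8, 0.9'."""
    return [Decimal(item.strip()) for item in spec.split(",")]


def parse_range(spec):
    """Colon-separated 'start:stop:step' (assumed notation), stop inclusive."""
    start, stop, step = (Decimal(part.strip()) for part in spec.split(":"))
    values = []
    current = start
    while current <= stop:
        values.append(current)
        current += step
    return values


powers = parse_list("0.8, 0.9")
props_under_treatment = parse_range("0.4:0.6:0.1")

# One design per (power, proportion) combination.
designs = list(product(powers, props_under_treatment))
for index, (power, prop) in enumerate(designs, start=1):
    print(f"design {index}: power = {power}, prop. under treatment = {prop}")
```

Two power values crossed with three proportions give the 2 x 3 = 6 designs described in the text.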
In addition, East allows multiple designs to be viewed simultaneously either graphically or in tabular format: Stopping Boundaries (table), Error Spending (table), Stopping Boundaries (plot), and Power vs. Treatment Effect (plot).

This capability allows the user to explore a greater space of possibilities when determining the best choice of study design.

Select individual looks

With Des8 selected in Wbk1, click the icon. In the Spacing of Looks table of the Boundary tab, notice that there are ticked checkboxes under the columns Stop for Efficacy and Stop for Futility. East gives you the flexibility to remove one of the stopping boundaries at certain looks, subject to the following constraints: (1) both boundaries must be included at the final two looks, (2) at least one boundary, either efficacy or futility, must be present at each look, (3) once a boundary has been selected all subsequent looks must include this boundary as well and (4) the efficacy boundary for the penultimate look cannot be absent.

Untick the checkbox in the first look under the Stop for Futility column. Click Recalc, and click the icon to view the new boundaries. Notice that the futility boundary does not begin until the second look.

Simulation Tool

Let us verify the operating characteristics of Des8 from Wbk1 through simulation. Select Des8 in the Library, and click the icon on the Library toolbar. Alternatively, right-click on Des8 and select Simulate. A new Simulation worksheet will appear.

Let us first verify, by running 10,000 simulated clinical trials, that the type-1 error is indeed 5%.
That is, we must verify that if the event rate for both the placebo and treatment (Abciximab) arms is 0.15, only about 500 of these simulations will reject H0. Click on the Response Generation tab, and change the entry in the cell labeled Prop. Under Treatment from 0.1 to 0.15.

Next, click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1. Select Sim1 in the Output Preview. Click the icon to save it to the Library. Double-click on Sim1 in the Library. The simulation output details will be displayed.

In the Details output, notice that 487 of the 10,000 simulations rejected H0. (This number might vary, depending on the starting seed used for the simulations.) This confirms that the type-1 error is preserved (up to Monte Carlo accuracy) by these stopping boundaries.

Next, run 10,000 simulations under the alternative hypothesis H1 that the event rate for placebo is 0.15 but the event rate for Abciximab is 0.1. Right-click Sim1 in the Library and click Edit Simulation. In the Response Generation tab, enter 0.10 for Prop. Under Treatment. Leave all other values as they are, and click Simulate to create output Sim2. Select Sim2 in the Output Preview and save it to Workbook Wbk1.

In the Overall Simulation Result table, notice that the lower efficacy stopping boundary was crossed in 7996 out of 10,000 simulated trials, which is consistent with 80% power (up to Monte Carlo accuracy) for the original design. Moreover, 393 of these simulations were able to reject the null hypothesis at the very first look. Feel free to experiment further with other simulation options before continuing.
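The phrase "up to Monte Carlo accuracy" can be made concrete with a short calculation. The following is a minimal sketch, not part of East: it assumes the simulated trials are independent, so that the number of rejections is binomial, and the function name `monte_carlo_se` is hypothetical.

```python
import math

def monte_carlo_se(p, n_sims):
    """Standard error of a rejection rate estimated from n_sims independent trials."""
    return math.sqrt(p * (1.0 - p) / n_sims)

n_sims = 10_000
# (label, target rejection rate, rejections reported in the text)
runs = [
    ("type-1 error (Sim1)", 0.05, 487),
    ("power (Sim2)", 0.80, 7996),
]
for label, target, rejections in runs:
    se = monte_carlo_se(target, n_sims)
    deviation = (rejections / n_sims - target) / se
    print(f"{label}: observed {rejections / n_sims:.4f}, target {target:.2f}, "
          f"MC standard error {se:.4f}, deviation {deviation:+.2f} SE")
```

Both observed rates lie within one Monte Carlo standard error of their targets, which is what the text means by the simulations being consistent with 5% type-1 error and 80% power.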
23.1.2 Interim Monitoring

The spending functions discussed above were for illustrative purposes only. They were not used in the actual CAPTURE trial. Instead, the investigators created their own spending function, which is closely approximated by the Gamma spending function of Hwang, Shih and DeCani (1990) with parameter −4.5. The investigators then used this spending function to generate two-sided boundaries for early stopping only to reject H0. Moreover, since it was felt that the trial would enroll patients rapidly, the study was designed for three unequally spaced looks: one interim analysis after 25% enrollment, a second interim analysis after 50% enrollment, and a final analysis after all the patients had enrolled.

To design this trial, select Des2 in the Library and click the icon. In the Boundary tab, in the Efficacy box, set Spending Function to Gamma Family and change the Parameter (γ) to −4.5. In the Futility box, make sure Boundary Family is set to None. Click the radio button for Unequal in the Spacing of Looks box. In the Looks Details table, change the Info. Fraction to 0.25 and 0.50 for Looks 1 and 2, respectively.

Click Compute. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. Select Des9 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des9 and select Interim Monitoring dashboard.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.
Click on the icon to invoke the Test Statistic Calculator. The first interim look was taken after accruing a total of 350 patients, 175 per treatment arm. There were 30 events on the placebo arm and 14 on the Abciximab arm. Based on these data, the event rate for placebo is 30/175 = 0.17143 and the event rate for Abciximab is 14/175 = 0.08. Hence the estimate of δ is 0.08 − 0.17143 = −0.09143. The unpooled estimate of the SE of δ̂ is

$$\mathrm{SE} = \sqrt{\frac{(14/175)(161/175)}{175} + \frac{(30/175)(145/175)}{175}} = 0.035103. \tag{23.7}$$

So the value of the test statistic is

$$\frac{\hat{\delta}}{\mathrm{SE}} = \frac{-0.09143}{0.035103} = -2.60457. \tag{23.8}$$

We will use the test statistic calculator and specify the values of δ̂ and SE in it. The test statistic calculator will then compute the test statistic value and post it into the interim monitoring sheet. This process will ensure that the RCI and final adjusted estimates will be computed using the estimates of δ and SE obtained from the observed data.

Click on the Estimate of δ and Std. Error of δ radio button. Type in (14/175) − (30/175) for Estimate of δ. The Estimate of δ is computed as −0.091429. We can then enter the expression given by (23.7) for the Std. Error of Estimate of δ. Click on Recalc to get the Test Statistic value, then OK to continue.

The top panel of the interim monitoring worksheet displays upper and lower stopping boundaries and upper and lower 95% repeated confidence intervals. The lower stopping boundary for rejecting H0 is −3.239. Since the current value of the test statistic is −2.605, the trial continues. The repeated confidence interval is (−0.205, 0.022). We thus conclude, with 95% confidence, that Abciximab is unlikely to increase the event rate by more than 2.2 percentage points relative to placebo and might actually reduce the event rate by as much as 20.5 percentage points.
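The arithmetic in (23.7) and (23.8) can be checked outside East with a few lines of code. This is only an illustrative sketch of the unpooled difference-of-proportions statistic described above; the function name `unpooled_difference_statistic` is hypothetical and not an East feature.

```python
import math

def unpooled_difference_statistic(events_t, n_t, events_c, n_c):
    """Return (delta_hat, SE, Z) for the difference of two binomial proportions,
    standardized with the unpooled variance estimate."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    delta_hat = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return delta_hat, se, delta_hat / se

# First interim look: 14/175 events on Abciximab, 30/175 on placebo.
delta_hat, se, z = unpooled_difference_statistic(14, 175, 30, 175)
print(f"delta_hat = {delta_hat:.6f}, SE = {se:.6f}, Z = {z:.5f}")
```

The same function can be reused at later looks by passing the cumulative event counts and sample sizes for each arm.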
Now click on the second row in the table in the upper section. Then click the icon. A second interim look was taken after accruing a total of 700 patients, 353 on placebo and 347 on Abciximab. By this time point there were a total of 55 events on the placebo arm and 37 events on the Abciximab arm. Based on these data, the event rate for placebo is 55/353 = 0.15581 and the event rate for Abciximab is 37/347 = 0.10663. Hence the estimate of δ is 0.10663 − 0.15581 = −0.04918. The unpooled estimate of the SE of δ̂ is

$$\mathrm{SE} = \sqrt{\frac{(37/347)(310/347)}{347} + \frac{(55/353)(298/353)}{353}} = 0.02544. \tag{23.9}$$

So the value of the test statistic is

$$\frac{\hat{\delta}}{\mathrm{SE}} = \frac{-0.04918}{0.02544} = -1.9332. \tag{23.10}$$

We will now enter the above values of δ̂ and SE in the test statistic calculator for posting the test statistic value into the interim monitoring sheet. Enter the appropriate values for Cumulative SS and Cumulative Response. Click the Recalc button. The calculator updates the fields for total sample size, δ and SE.

The updated sheet is displayed below.

At this interim look, the stopping boundary for early rejection of H0 is ±2.868 and the 95% repeated confidence interval is still unable to exclude a difference of zero for the two event rates. Thus the study continues.

The Stopping Boundaries chart of the dashboard displays the path traced out by the test statistic in relation to the upper and lower stopping boundaries at the first two interim looks. To expand this chart to full size, click the icon located at the top right of the chart.

This full-sized chart displays stopping boundaries that have been recomputed on the basis of the error spent at each look, as shown on the Error Spending chart located at the bottom left of the dashboard.
To display this full-sized chart, close the current chart and click the icon on the Error Spending chart.

By moving the vertical cursor from left to right on this chart we observe that 0.0012 of the total error was spent by the first interim look and 0.005 of it was spent by the second interim look. Close this chart before continuing.

Although this study was designed for two interim looks and one final look, the data monitoring committee decided to take a third unplanned look after accruing 1050 patients, 532 on placebo and 518 on Abciximab. The error spending function methodology permits this flexibility. Both the timing and number of interim looks may be modified from what was proposed at the design stage. East will recompute the new stopping boundaries on the basis of the error actually spent at each look rather than the error that was proposed to be spent.

There were 84 events on the placebo arm and 55 events on the Abciximab arm. Hence the estimate of δ is 0.1062 − 0.1579 = −0.05171. The unpooled estimate of the SE of δ̂ is 0.02081. So the value of the test statistic is −2.4849. Click the third row of the table in the top portion and then click the icon. Upon entering this summary information, through the test statistic calculator, into the interim monitoring sheet we observe that the stopping boundary is crossed. Press the Stop button and observe the results in the interim monitoring worksheet.

The 95% repeated confidence interval is (−0.103, −0.011) and it excludes 0, thus confirming that the null hypothesis should be rejected.
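The cumulative error-spending values read off the Error Spending chart (0.0012 and 0.005) can be reproduced from the closed form of the Gamma family of Hwang, Shih and DeCani (1990), α(t) = α(1 − e^(−γt))/(1 − e^(−γ)) for γ ≠ 0. The sketch below is not East's implementation; it assumes a total two-sided α of 0.05, γ = −4.5, and that the first two looks fall at information fractions of exactly 0.25 and 0.50. The function name `gamma_family_spent` is hypothetical.

```python
import math

def gamma_family_spent(t, alpha, gamma):
    """Cumulative type-1 error spent at information fraction t (0 <= t <= 1)
    under the Hwang-Shih-DeCani Gamma family with parameter gamma."""
    if gamma == 0:
        return alpha * t  # limiting case: error is spent linearly in t
    return alpha * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))

for t in (0.25, 0.50, 1.00):
    spent = gamma_family_spent(t, alpha=0.05, gamma=-4.5)
    print(f"information fraction {t:.2f}: cumulative error spent {spent:.4f}")
```

With a strongly negative γ almost all of the error is reserved for the final analysis, which is why the early boundaries in this trial are so wide.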
Once the study is terminated, East computes a final p-value, confidence interval and median unbiased point estimate, all adjusted for the multiple looks, using a stagewise ordering of the sample space as proposed by Tsiatis, Rosner and Mehta (1984). The adjusted p-value is 0.016. The adjusted confidence interval for the difference in event rates is (−0.092, −0.010) and the median unbiased estimate of the difference in event rates is −0.051. In general, the adjusted confidence interval produced at the end of the study is narrower than the final repeated confidence interval, although both intervals provide valid coverage of the unknown effect size.

23.1.3 Pooled versus Unpooled Designs

The manner in which the data will be analyzed at the interim monitoring stage should be reflected in the study design. We stated at the beginning of this chapter that the test statistic used to track the progress of a binomial endpoint study could be computed by using either the unpooled variance or the pooled variance to standardize the difference of binomial proportions. The design of the CAPTURE trial in Section 23.1.1 and its interim monitoring in Section 23.1.2 were both performed on the basis of the unpooled statistic. In this section we examine how the design would change if we intended to use the pooled statistic for the interim monitoring. It is seen that the change in sample size is negligible if the randomization is balanced. If, however, an unbalanced randomization rule is adopted, there can be substantial sample size differences between the unpooled and pooled designs.
Consider once more the design of the CAPTURE trial with a maximum of K = 3 looks, stopping boundaries generated by the Gamma family spending function with parameter −4.5, and 80% power to detect a drop in the event rate from 0.15 on the placebo arm to 0.1 on the Abciximab arm using a two-sided level 0.05 test. We now consider the design of this trial on the basis of the pooled statistic. Select Des9 in the Library and click the icon. Then under the Test Parameters tab, in the Specify Variance box, select the radio button for Pooled Estimate.

Click the Compute button to create Des10. Save Des10 to Wbk1. In the Library, select Des9 and Des10 by holding the Ctrl key, and then click on the icon.

It is instructive to compare Des9 with Des10. It is important to remember that Des9 utilized the unpooled design while Des10 utilized the pooled design. When we compare Des9 and Des10 side by side we discover that there is not much difference in terms of either the maximum or expected sample sizes. This is usually the case for balanced designs. If, however, we were to change the value of the Allocation Ratio parameter from 1 to 0.333 (which corresponds to assigning 25% of the patients to treatment and 75% to control), then we would find a substantial difference in the sample sizes of the two plans. In the picture below, Des11 utilizes the unpooled design while Des12 utilizes the pooled design. Notice that because of the unbalanced randomization the unpooled design is able to achieve 80% power with 229 fewer patients than the pooled design.
Specifically, if we decide to monitor the study with the test statistic (23.2) we need to commit a maximum of 1908 patients (Des12), whereas if we decide to monitor the study with the test statistic (23.1) we need to commit a maximum of only 1679 patients (Des11). We can verify by simulation that both Des11 and Des12 produce 80% power under the alternative hypothesis.

After saving Des11 and Des12 in Wbk1, select Des11 in the Library and click the icon. Next, click the Simulate button. The results are displayed below and demonstrate that the null hypothesis was rejected 7710 times in 10,000 trials (77.10%), very close to the desired 80% power.

Next, repeat the procedure for Des12. Observe that once again, the desired power was almost achieved. This time the null hypothesis was rejected 7916 times in 10,000 trials (79.77%), just slightly under the desired 80% power.

The power advantage of the unpooled design over the pooled design gets reversed if the proportion of patients randomized to the treatment arm is 75% instead of 25%. Edit Des11 and Des12, and change the Allocation Ratio parameter to 3. Now the pooled design (Des14) requires a maximum of 1770 patients whereas the unpooled design (Des13) requires a maximum of 1995 patients. This shows that when planning a binomial study with unbalanced randomization, it is important to try both the pooled and unpooled designs and choose the one that produces the same power with fewer patients. The correct choice will depend on the response rates of the control and treatment arms as well as on the value of the fraction assigned to the treatment arm.
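The reversal described above can be illustrated with the standard textbook sample-size formulas for a single-look, two-sided test of a difference of proportions. This is a rough sketch under stated assumptions, not East's algorithm: it ignores the three-look group sequential design, so its totals are somewhat smaller than the East maxima quoted above, and the function name `fixed_sample_size` and its arguments are hypothetical.

```python
from statistics import NormalDist

def fixed_sample_size(pi_c, pi_t, frac_treatment, alpha=0.05, power=0.80, pooled=False):
    """Approximate total sample size for a single-look, two-sided test of
    pi_t - pi_c = 0, with a fraction frac_treatment of patients on treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    f = frac_treatment
    # N times the variance of delta-hat under the alternative (unpooled form).
    v1 = pi_t * (1 - pi_t) / f + pi_c * (1 - pi_c) / (1 - f)
    if pooled:
        # N times the variance used by the pooled statistic under the null.
        pi_bar = f * pi_t + (1 - f) * pi_c
        v0 = pi_bar * (1 - pi_bar) * (1 / f + 1 / (1 - f))
    else:
        v0 = v1
    delta = pi_t - pi_c
    return (z_alpha * v0 ** 0.5 + z_beta * v1 ** 0.5) ** 2 / delta ** 2

for frac in (0.25, 0.75):
    n_unpooled = fixed_sample_size(0.15, 0.10, frac, pooled=False)
    n_pooled = fixed_sample_size(0.15, 0.10, frac, pooled=True)
    print(f"{frac:.0%} on treatment: unpooled {n_unpooled:.0f}, pooled {n_pooled:.0f}")
```

Under these assumptions the unpooled statistic needs fewer patients when 25% are randomized to treatment, and the pooled statistic needs fewer when 75% are, which agrees in direction with Des11 to Des14.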
23.2 Ratio of Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let ρ = πt/πc. We want to test the null hypothesis that ρ = 1 against one- or two-sided alternatives. It is mathematically more convenient to express this hypothesis testing problem in terms of the difference of the (natural) logarithms. Thus we define δ = ln(πt) − ln(πc). On this metric, we are interested in testing H0: δ = 0 against one- or two-sided alternative hypotheses. Let π̂ij denote the estimate of πi based on nij observations from Treatment i, up to and including the j-th look, j = 1, ..., K, i = t, c, where a maximum of K looks are to be taken. Then the estimate of δ at the j-th look is

$$\hat{\delta}_j = \ln(\hat{\pi}_{tj}) - \ln(\hat{\pi}_{cj}) \tag{23.11}$$

with estimated standard error

$$\widehat{se}_j = \left\{ \frac{1 - \hat{\pi}_{tj}}{n_{tj}\hat{\pi}_{tj}} + \frac{1 - \hat{\pi}_{cj}}{n_{cj}\hat{\pi}_{cj}} \right\}^{1/2} \tag{23.12}$$

if we use an unpooled estimate for the variance of δ̂, and estimated standard error

$$\widehat{se}_j = \left\{ \frac{1 - \hat{\pi}_j}{\hat{\pi}_j} \left( n_{tj}^{-1} + n_{cj}^{-1} \right) \right\}^{1/2}, \tag{23.13}$$

where

$$\hat{\pi}_j = \frac{n_{tj}\hat{\pi}_{tj} + n_{cj}\hat{\pi}_{cj}}{n_{tj} + n_{cj}},$$

if we use a pooled estimate for the variance of δ̂.

In general, for any twice-differentiable function h(.), with derivative h'(.), h(π̂ij) is approximately normal with mean h(πi) and variance [h'(πi)]² πi(1 − πi)/nij for large values of nij. Using this asymptotic approximation, the test statistic at the j-th look is

$$Z_j^{(u)} = \frac{\ln(\hat{\pi}_{tj}) - \ln(\hat{\pi}_{cj})}{\left\{ \frac{1 - \hat{\pi}_{tj}}{n_{tj}\hat{\pi}_{tj}} + \frac{1 - \hat{\pi}_{cj}}{n_{cj}\hat{\pi}_{cj}} \right\}^{1/2}}, \tag{23.14}$$

i.e. the ratio of (23.11) and (23.12), if we use an unpooled estimate for the variance of ln(π̂tj) − ln(π̂cj), and

$$Z_j^{(p)} = \frac{\ln(\hat{\pi}_{tj}) - \ln(\hat{\pi}_{cj})}{\left\{ \frac{1 - \hat{\pi}_j}{\hat{\pi}_j} \left( n_{tj}^{-1} + n_{cj}^{-1} \right) \right\}^{1/2}}, \tag{23.15}$$

i.e. the ratio of (23.11) and (23.13), if we use a pooled estimate for the variance of ln(π̂tj) − ln(π̂cj).
23.2.1 Trial Design

Design objectives and interim results were presented from PRISM, a prospective randomized trial of Heparin alone (control arm), Tirofiban alone (monotherapy arm), and Heparin plus Tirofiban (combination therapy arm), at a DIA Workshop on Flexible Trial Design (Snappin, 2003). The composite endpoint was refractory ischemia, myocardial infarction or death within seven days of randomization. The investigators were interested in comparing the two Tirofiban arms to the control arm, with each test being conducted at the 0.025 level of significance (two-sided). It was assumed that the control arm has a 30% event rate. Thus, πt = πc = 0.3 under H0. The investigators wished to determine the sample size to have power of 80% if there was a 25% decline in the event rate, i.e. πt/πc = 0.75. It is important to note that the power of the test depends on πc and πt, not just the ratio, so different values of the pair (πc, πt) with the same ratio will have different solutions.

We will now design a two-arm study that compares the control arm, Heparin, to the combination therapy arm, Heparin plus Tirofiban. First click the Design tab, then Two Samples in the Discrete group, and then click Ratio of Proportions.

We want to determine the sample size required to have power of 80% when πc = 0.3 and ρ = πt/πc = 0.75, using a two-sided test with a type-1 error rate of 0.025.

Single-Look Design - Unpooled Estimate of Variance

First consider a study with only one look and equal sample sizes in the two groups. Select the input parameters as displayed below. We will use the test statistic (23.14) with the unpooled estimate of the variance. Click the Compute button. The design Des1 is shown as a row in the Output Preview, located in the lower pane of this window.
This single-look design requires a combined total of 1328 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Workbook Wbk1 in the Library.

Three-Look Design - Unpooled Estimate of Variance

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new design by selecting Des1 in the Library, and clicking the icon. In the input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets (O'Brien-Fleming) stopping boundary and equal spacing of looks are selected.

Click Compute to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the nodes for both Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Although the maximum sample size has increased from 1328 to 1339, using three planned looks may result in a smaller sample size than that required for the single-look design, with an expected sample size of 1168 subjects under the alternative hypothesis (πc = 0.3, ρ = 0.75), while still ensuring that the power is 80%.
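The single-look sample size of 1328 can be approximated by hand from the log-ratio metric and the unpooled variance in (23.12). The following sketch applies the usual normal-approximation formula N = (z_{α/2} + z_β)² · N·Var(δ̂) / δ², assuming equal allocation; it is not East's documented algorithm, the function name `log_ratio_sample_size` is hypothetical, and rounding up to the next integer is an assumption about how the total is reported.

```python
import math
from statistics import NormalDist

def log_ratio_sample_size(pi_c, rho, alpha, power):
    """Approximate total sample size (equal allocation, one look, two-sided test)
    for testing ln(pi_t) - ln(pi_c) = 0 with the unpooled variance estimate."""
    pi_t = rho * pi_c
    delta = math.log(rho)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # N * Var(delta-hat) when n_t = n_c = N/2, following the form of (23.12).
    unit_variance = 2 * ((1 - pi_t) / pi_t + (1 - pi_c) / pi_c)
    return (z_alpha + z_beta) ** 2 * unit_variance / delta ** 2

n = log_ratio_sample_size(pi_c=0.3, rho=0.75, alpha=0.025, power=0.80)
print(f"unrounded total: {n:.2f}, rounded up: {math.ceil(n)}")
```

The unrounded value comes out just above 1327, so rounding up agrees with the total reported for Des1. The three-look maximum of 1339 includes a group sequential inflation that this sketch does not attempt to model.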
Additional information can also be obtained from Des2. The Lan-DeMets spending function corresponding to the O'Brien-Fleming boundary can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Stopping Boundaries. The following chart will appear:

The alpha-spending function can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Error Spending.

In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table.

Close this window before continuing.

Three-Look Design - Pooled Estimate of Variance

We now consider this design using the statistic (23.15) with the pooled estimate of the variance. Create a new design by selecting Des2 in the Library, and clicking the icon. Under the Test Parameters tab, select the radio button for Pooled Estimate in the Variance of Standardized Test Statistic box. Leave everything else unchanged. Click the Compute button to generate the output for Des3. Save Des3 by selecting it in the Output Preview and clicking the icon. In the Library, select the nodes for Des1, Des2, and Des3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

For this problem, the test statistic (23.14) with the unpooled estimate of the variance requires a smaller sample size than the test statistic (23.15) with the pooled estimate of the variance. Close this window before continuing.

23.2.2 Trial Simulation

Suppose we want to see the impact of πt on the behavior of the test statistic (23.14) with the unpooled estimate of the variance.
First we consider πt = 0.225 as specified by the alternative hypothesis. With Des2 selected in the Library, click the icon. Click on the Simulate button. The results of the simulation will appear under Sim1 in the Output Preview. Select Sim1 in the Output Preview and click the icon to save it to Wbk1. Double-click on Sim1 in the Library to display the results of the simulation. Although the actual values may differ, we see that the power is approximately 80% and the probability of stopping early is about 0.37.

Now we consider πt = 0.25, which will show us the impact if we were too optimistic about the treatment effect. Select Sim1 in the Library and click the icon. Under the Response Generation tab, enter the value of 0.25 next to Prop. Under Treatment (πt1). Click the Simulate button. Although the actual values may differ, we see that the power is approximately 41%.

23.2.3 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the icon from the Library toolbar. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.

Suppose that the results are to be analyzed after results are available for every 450 subjects. Click on the icon in the upper left to invoke the Test Statistic Calculator. Select the radio button to enter δ̂ and its standard error. Enter 450 in the box next to Cumulative Sample Size.
Suppose that after the data were available for the first 450 subjects, 230 subjects were randomized to the control arm (c) and 220 subjects were randomized to the treatment arm (t). Of the 230 subjects in the control arm, there were 65 events; of the 220 subjects in the treatment arm, there were 45 events. In the box next to Estimate of δ enter: ln((45/220)/(65/230)) and then hit Enter. East will compute the estimate of δ. Enter 0.169451 in the box next to Std. Error of δ. Next click Recalc. You should now see the following:

Next, click OK. The following table will appear in the top section of the IM Dashboard.

Note: Click on the icon to hide or unhide the columns of your interest, such as RCI for δ. Keeping all four boxes checked displays the RCI on both scales.

The boundary was not crossed, as the value of the test statistic is −1.911, which is within the boundaries (−4.153, 4.153), so the trial continues.

After data were available for an additional 450 subjects, the second analysis is performed. Suppose that among the 900 subjects, 448 were randomized to control (c) and 452 were randomized to treatment (t). Of the 448 subjects in the control arm, there were 132 events; of the 452 subjects in the treatment arm, there were 90 events.

Click on the second row in the table in the upper section. Then click the icon. Enter 900 in the box next to Sample Size (Overall). Then in the box next to Estimate of δ enter: ln((90/452)/(132/448)). Next hit Enter, then enter 0.119341 in the box next to Std. Error of δ. Click Recalc, then OK. The value of the test statistic is −3.284, which is less than −2.833, the value of the lower boundary, so the following dialog box appears.

Click on Stop to stop any further analyses.
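The standard errors typed into the calculator at the two looks above (0.169451 and 0.119341) come from the unpooled formula (23.12). The following sketch recomputes δ̂, its standard error and the statistic (23.14) from the raw counts; the function name `unpooled_log_ratio_statistic` is hypothetical, and the code is an illustration rather than a description of East's internals.

```python
import math

def unpooled_log_ratio_statistic(events_t, n_t, events_c, n_c):
    """Return (delta_hat, SE, Z) for the log ratio of two binomial proportions,
    using the unpooled standard error of (23.12)."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    delta_hat = math.log(p_t) - math.log(p_c)
    se = math.sqrt((1 - p_t) / (n_t * p_t) + (1 - p_c) / (n_c * p_c))
    return delta_hat, se, delta_hat / se

# Cumulative (events_t, n_t, events_c, n_c) at the two looks described above.
looks = [(45, 220, 65, 230), (90, 452, 132, 448)]
for look, counts in enumerate(looks, start=1):
    delta_hat, se, z = unpooled_log_ratio_statistic(*counts)
    print(f"look {look}: delta_hat = {delta_hat:.6f}, SE = {se:.6f}, Z = {z:.3f}")
```

The statistics come out at about −1.91 and −3.28, in line with the values posted to the dashboard at the first and second looks.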
The Final Inference Table shows that the adjusted point estimate of ln(ρ) is −0.392 (p = 0.001) and the final adjusted 97.5% confidence interval for ln(ρ) is (−0.659, −0.124).

23.3 Odds Ratio of Proportions

Let πt and πc denote the two binomial probabilities associated with the treatment and the control, respectively. Furthermore, let the odds ratio be

$$\psi = \frac{\pi_t/(1 - \pi_t)}{\pi_c/(1 - \pi_c)} = \frac{\pi_t(1 - \pi_c)}{\pi_c(1 - \pi_t)}. \tag{23.16}$$

We are interested in testing H0: ψ = 1 against the two-sided alternative H1: ψ ≠ 1 or against a one-sided alternative H1: ψ < 1 or H1: ψ > 1. It is convenient to express this hypothesis testing problem in terms of the (natural) logarithm of ψ. Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the treatment and the control, respectively, up to and including the j-th look, j = 1, ..., K, where a maximum of K looks are to be made. The difference between treatments at the j-th look is assessed using

$$\hat{\delta}_j = \ln\left(\frac{\hat{\pi}_{tj}}{1 - \hat{\pi}_{tj}}\right) - \ln\left(\frac{\hat{\pi}_{cj}}{1 - \hat{\pi}_{cj}}\right). \tag{23.17}$$

Using the asymptotic approximation presented in Section 23.2, the estimate of the standard error of δ̂j at the j-th look is

$$\widehat{se}_j = \left\{ \frac{1}{n_{tj}\hat{\pi}_{tj}(1 - \hat{\pi}_{tj})} + \frac{1}{n_{cj}\hat{\pi}_{cj}(1 - \hat{\pi}_{cj})} \right\}^{1/2}, \tag{23.18}$$

and the test statistic at the j-th look is the ratio of δ̂j, given by (23.17), and the estimate of the standard error of δ̂j, given by (23.18), namely,

$$Z_j = \frac{\ln(\hat{\pi}_{tj}/(1 - \hat{\pi}_{tj})) - \ln(\hat{\pi}_{cj}/(1 - \hat{\pi}_{cj}))}{\left\{ \frac{1}{n_{tj}\hat{\pi}_{tj}(1 - \hat{\pi}_{tj})} + \frac{1}{n_{cj}\hat{\pi}_{cj}(1 - \hat{\pi}_{cj})} \right\}^{1/2}}. \tag{23.19}$$

23.3.1 Trial Design

Suppose that the response rate for the control treatment is 10% and we hope that the experimental treatment can triple the odds of response (ψ = 3); that is, we desire to increase the response rate to 25%.
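The 25% figure follows from solving the definition (23.16) for πt. As a short worked step (our own rearrangement, using only the symbols defined above), with πc = 0.1 and ψ = 3:

```latex
\pi_t \;=\; \frac{\psi\,\pi_c}{1 - \pi_c + \psi\,\pi_c}
      \;=\; \frac{3 \times 0.1}{1 - 0.1 + 3 \times 0.1}
      \;=\; \frac{0.3}{1.2}
      \;=\; 0.25 .
```

Equivalently, the control odds are 0.1/0.9 = 1/9, tripling them gives 1/3, and odds of 1/3 correspond to a response rate of 1/4.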
Although we hope to increase the odds ratio, we solve this problem using a two-sided testing formulation. The null hypothesis H0: ψ = 1 is tested against the two-sided alternative H1: ψ ≠ 1. The power of the test is computed at specified values of πc and ψ. Note that the power of the test depends on πc and ψ, or equivalently πc and πt, not just the odds ratio. Thus, different values of πc with the same odds ratio will have different solutions.

First, click the Design tab, then click Two Samples in the Discrete group, and then click Odds Ratio of Proportions.

Suppose we want to determine the sample size required to have power of 80% when πc = 0.1 and ψ1 = 3 using a two-sided test with a type-1 error rate of 0.05.

Single-Look Design

First consider a study with only one look and equal sample sizes in the two groups. Enter the appropriate design parameters so that the dialog box appears as shown. Then click Compute.

The design Des1 is shown as a row in the Output Preview, located in the lower pane of this window. This single-look design requires a combined total of 214 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library.

Three-Look Design

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O'Brien-Fleming) stopping boundary. Create a new design by selecting Des1 in the Library, and clicking the icon.
In the input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets (O'Brien-Fleming) stopping boundary and equal spacing of looks are selected.

Click the Compute button to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for both Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks may result in a smaller sample size than that required for the single-look design, with an expected sample size of 186 subjects under the alternative hypothesis (πc = 0.1, ψ = 3), while still ensuring that the power is 80%.

Additional information can also be obtained from Des2. The Lan-DeMets spending function corresponding to the O'Brien-Fleming boundary can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Stopping Boundaries. The following chart will appear:

The alpha-spending function can be viewed by selecting Des2 in the Library, clicking on the icon, and selecting Error Spending.

In order to see the stopping probabilities, as well as other characteristics, select Des2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table.
East displays the stopping boundary, the type-1 error spent and the boundary crossing probabilities under H0 : πc = 0.1, ψ = 1 and the alternative hypothesis H1 : πc = 0.1, ψ = 3. Close this window before continuing.

23.3.2 Trial Simulation

Suppose we want to see the impact of πt on the behavior of the test statistic (23.19). First we consider πt = 0.25 as specified by the alternative hypothesis. With Des2 selected in the Library, click the icon. Next, click the Simulate button. The results of the simulation will appear under Sim1 in the Output Preview. Highlight Sim1 in the Output Preview and click the icon to save it to workbook Wbk1. Double-click on Sim1 in the Library to display the results of the simulation. Although your results may differ slightly, we see that the power is approximately 83% and the probability of stopping early is about 0.39.

Now we consider πt = 0.225, which shows the impact of having been too optimistic about the treatment effect. Select Sim1 in the Library and click the icon. Under the Response Generation tab, enter the value 0.225 next to Prop. Under Treatment (πt). Click Simulate. Although the actual values may differ, we see that the power is approximately 68% and the probability of stopping early is about 0.26.

23.3.3 Interim Monitoring

Consider interim monitoring of Des2. Select Des2 in the Library, and click the icon from the Library toolbar. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.
Suppose that the results are to be analyzed after results are available for every 70 subjects. Click on the icon in the upper left to invoke the Test Statistics Calculator. Enter 70 in the box next to Cumulative Sample Size, and then select the second radio button on the calculator to enter values of δ̂ and its standard error. Suppose that, of the first 70 subjects, 35 were randomized to the control arm (c), of whom 5 experienced a response, and 35 were randomized to the treatment arm (t), of whom 9 experienced a response. These counts give δ̂ = ln[(9/26)/(5/30)] = 0.730888 with standard error (1/9 + 1/26 + 1/5 + 1/30)^{1/2} = 0.618794. In the box next to Estimate of δ enter 0.730888 and in the box next to Std. Error of δ enter 0.618794. Next click Recalc. You should now see the following:

Click OK and the following entry will appear in the top section of the IM Dashboard.

Note: Click on the icon to hide or unhide the columns of your interest.

The boundary was not crossed, as the value of the test statistic (1.181) is within the boundaries (−3.777, 3.777), so the trial continues.

After data were available for an additional 70 subjects, the second analysis was performed. Suppose that among the 140 subjects, 71 were randomized to c and 69 were randomized to t. Click on the second row in the table in the upper section. Then click the icon. Enter 140 in the box next to Cumulative Sample Size. Then in the box next to Estimate of δ enter 1.067841 and in the box next to Std. Error of δ enter 0.414083. Next, click Recalc and then OK.

The test statistic 2.579 exceeds the upper boundary (2.56), so the following screen appears.

Click Stop to halt any further analyses.
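The first-look inputs used above can be reproduced from the arm-level counts. The sketch below is a minimal illustration, assuming the usual large-sample standard error of a log odds ratio from a 2 x 2 table; the function name is hypothetical and is not part of East.

```python
from math import log, sqrt


def log_odds_ratio(resp_t, n_t, resp_c, n_c):
    """Return (delta_hat, se) for ln(odds ratio), treatment versus control."""
    a, b = resp_t, n_t - resp_t   # treatment: responders, non-responders
    c, d = resp_c, n_c - resp_c   # control: responders, non-responders
    delta_hat = log(a / b) - log(c / d)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return delta_hat, se


# First look: 9 of 35 responders on treatment, 5 of 35 on control.
delta_hat, se = log_odds_ratio(resp_t=9, n_t=35, resp_c=5, n_c=35)
print(f"delta_hat={delta_hat:.6f}  se={se:.6f}  Z={delta_hat / se:.3f}")
```

The printed estimate, standard error and test statistic should agree with the values 0.730888, 0.618794 and 1.181 used at the first look. The function does not guard against empty cells, which would need separate handling.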
The Final Inference Table shows that the adjusted point estimate of ln(ψ) is 1.068 (p = 0.01) and the adjusted 95% confidence interval for ln(ψ) is (0.256, 1.879).

23.4 Common Odds Ratio of Stratified Tables

Some experiments are performed with several disjoint groups (strata) within each treatment group. For example, multicenter clinical trials are conducted using several investigator sites. Other situations include descriptive subsets, such as those defined by baseline and demographic characteristics.

Let πtg and πcg denote the two binomial probabilities in Group g, g = 1, . . . , G, for the treatment and control, respectively. It is assumed that the odds ratio

\[ \psi = \frac{\pi_{tg}/(1-\pi_{tg})}{\pi_{cg}/(1-\pi_{cg})} = \frac{\pi_{tg}(1-\pi_{cg})}{\pi_{cg}(1-\pi_{tg})} \tag{23.20} \]

is the same for each group (stratum). The Cochran-Mantel-Haenszel test is used for testing H0 : ψ = 1 against the two-sided alternative H1 : ψ ≠ 1 or against a one-sided alternative H1 : ψ > 1 or H1 : ψ < 1.

Let π̂tjg and π̂cjg denote the estimates of πtg and πcg based on ntjg and ncjg observations in Group g from the treatment (t) and the control (c), respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be taken. Then the estimate of δ = ln(ψ) from the g-th group at the j-th look is

\[ \hat\delta_{jg} = \ln\left(\frac{\hat\pi_{tjg}}{1-\hat\pi_{tjg}}\right) - \ln\left(\frac{\hat\pi_{cjg}}{1-\hat\pi_{cjg}}\right). \]

Then the estimate of δ = ln(ψ) at the j-th look is the average of δ̂jg, g = 1, . . . , G; namely,

\[ \hat\delta_j = \frac{\sum_{g=1}^{G} \hat\delta_{jg}}{G} \]

or, equivalently,

\[ \hat\delta_j = \frac{\sum_{g=1}^{G} \left[ \ln\left(\frac{\hat\pi_{tjg}}{1-\hat\pi_{tjg}}\right) - \ln\left(\frac{\hat\pi_{cjg}}{1-\hat\pi_{cjg}}\right) \right]}{G}. \tag{23.21} \]

The estimate of the standard error of δ̂jg at the j-th look is

\[ \hat{se}_{jg} = \left\{ \frac{1}{n_{tjg}\hat\pi_{tjg}(1-\hat\pi_{tjg})} + \frac{1}{n_{cjg}\hat\pi_{cjg}(1-\hat\pi_{cjg})} \right\}^{1/2}. \]

The estimated variance of δ̂ at the j-th look is the average of the variances of δ̂jg, g = 1, . . . , G.
Thus,

\[ \hat{se}_j = \left\{ \frac{\sum_{g=1}^{G} \hat{se}_{jg}^{2}}{G} \right\}^{1/2}. \tag{23.22} \]

The test statistic used at the j-th look is

\[ Z_j = \frac{\hat\delta_j}{\hat{se}_j}. \tag{23.23} \]

23.4.1 Trial Design

First consider a simple example with two strata, such as males and females, with an equal number of subjects in each stratum and the same response rate of 60% for the control in each stratum. We hope that the experimental treatment can triple the odds of response (ψ = 3). Although we hope to increase the odds ratio, we solve this problem using a two-sided testing formulation. The null hypothesis H0 : ψ = 1 is tested against the two-sided alternative H1 : ψ ≠ 1. The power of the test is computed at specified values of πcg, g = 1, . . . , G, and ψ.

To begin, click the Design tab, then click Two Samples in the Discrete group, and then click Common Odds Ratio for Stratified 2 x 2 Tables.

Suppose that we want to determine the sample size required to have power of 80% when πc1 = πc2 = 0.6 and ψ = 3 using a two-sided test with a type-1 error rate of 0.05.

Single-Look Design - Equal Response Rates

First consider a study with only one look and equal sample sizes in the two groups. Enter the appropriate test parameters so that the dialog box appears as shown. Then click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of this window. This single-look design requires a combined total of 142 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library.

Single-Look Design - Unequal Response Rates

Now, we consider a more realistic clinical trial.
Suppose that males and females respond differently, so that the response rate for males is πc1 = 0.6 and the response rate for females is πc2 = 0.3. First, we consider a study without any interim analyses. Create a new design by selecting Des1 in the Library, and clicking the icon. Change πc2 in the Stratum Specific Input table to 0.3 as shown below.

Click Compute to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

This single-look design requires a combined total of 127 subjects from both treatments in order to attain 80% power.

Three-Look Design - Unequal Response Rates

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Des2 in the Library, and clicking the icon. In the input, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab for Boundary will appear. Click this tab to reveal the stopping boundary parameters. By default, the Lan-DeMets (O’Brien-Fleming) stopping boundary and equal spacing of looks are selected.

Click the Compute button to generate output for Des3. The results of Des3 are shown in the Output Preview window. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Using three planned looks requires an up-front commitment of 129 subjects, a slight increase over the single-look design, which required 127 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 111 subjects under the alternative hypothesis (πc1 = 0.6, πc2 = 0.3, ψ = 3), while still ensuring that the power is 80%.

By selecting only Des3 in the Library and clicking the icon, East displays the stopping boundary, the type-1 error spent and the boundary crossing probabilities under H0 : πc1 = 0.6, πc2 = 0.3, ψ = 1 and the alternative hypothesis H1 : πc1 = 0.6, πc2 = 0.3, ψ = 3. Close this window before continuing.

Three-Look Design - Unequal Response Rates - Unequal Strata Sizes

Some disorders have different prevalence rates across various strata. Consider the above example, but with the expectation that 30% of the subjects will be males and 70% of the subjects will be females. Create a new design by selecting Des3 in the Library, and clicking the icon. Under the Test Parameters tab, in the Stratum Specific Input box, select the radio button Unequal. You can now edit the Stratum Fraction column for Stratum 1. Change this value from 0.5 to 0.3 as shown below.

Click the Compute button to generate output for Des4. The results of Des4 are shown in the Output Preview window. With Des4 selected in the Output Preview, click the icon. In the Library, select the rows for Des1, Des2, Des3, and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the four designs side-by-side:

Note that, for this example, unequal sample sizes for the two strata result in a smaller total sample size than that required for equal sample sizes for the two strata.

23.4.2 Interim Monitoring

Consider interim monitoring of Des4. Select Des4 in the Library, and click the icon from the Library toolbar. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.

Suppose that the results are to be analyzed after results are available for every 40 subjects. Click on the icon in the upper left to invoke the Test Statistics Calculator. Enter 40 in the box next to Cumulative Sample Size. Suppose that δ̂1 = 0.58 and ŝe1 = 0.23. Enter these values and click on Recalc. You should now see the following:

Click OK and the following table will appear in the top section of the IM Dashboard.

The boundary was not crossed, as the value of the test statistic (2.522) is within the boundaries (−3.777, 3.777), so the trial continues.

Click on the second row in the table in the upper section. Then click the icon. Enter 80 in the box next to Cumulative Sample Size. Suppose that δ̂2 = 0.60 and ŝe2 = 0.21. Enter these values and click Recalc. You should now see the following:

Click the OK button. The test statistic 2.857 exceeds the upper boundary (2.56), so the following dialog box appears.

Click Stop to stop any further analyses.
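The interim values of δ̂ and its standard error are supplied directly in this example. As a sketch of how such inputs could be obtained from stratum-level counts, the code below implements equations (23.21), (23.22) and (23.23) exactly as stated in this section. The counts are hypothetical and are not taken from the manual, and the function name is illustrative only.

```python
from math import log, sqrt


def stratified_log_odds_ratio(strata):
    """strata: list of (resp_t, n_t, resp_c, n_c), one tuple per stratum.

    Returns (delta_hat, se, z) following (23.21), (23.22) and (23.23).
    """
    deltas, variances = [], []
    for resp_t, n_t, resp_c, n_c in strata:
        p_t, p_c = resp_t / n_t, resp_c / n_c
        # Stratum-level log odds ratio and its estimated variance.
        deltas.append(log(p_t / (1 - p_t)) - log(p_c / (1 - p_c)))
        variances.append(1 / (n_t * p_t * (1 - p_t)) + 1 / (n_c * p_c * (1 - p_c)))
    g = len(strata)
    delta_hat = sum(deltas) / g        # (23.21): average of stratum estimates
    se = sqrt(sum(variances) / g)      # (23.22): root of the average variance
    return delta_hat, se, delta_hat / se   # (23.23)


# Hypothetical interim counts (not from the manual): males first, then females.
example = [(9, 12, 6, 12), (5, 8, 2, 8)]
delta_hat, se, z = stratified_log_odds_ratio(example)
print(f"delta_hat={delta_hat:.3f}  se={se:.3f}  Z={z:.3f}")
```

With a single stratum the function reduces to the ordinary log odds ratio and its standard error, which gives a convenient sanity check.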
The Final Inference Table shows that the adjusted point estimate of ln(ψ) is 0.600 (p = 0.004) and the adjusted 95% confidence interval for ln(ψ) is (0.188, 1.011).

23.5 Fisher’s Exact Test (Single Look)

In some experimental situations, the normal approximation to the binomial distribution may not be appropriate, for example when the probabilities of interest are very large or very small. This may lead to incorrect p-values, and thus to incorrect conclusions. For this reason, Fisher’s exact test may be used.

Let πt and πc denote the two response probabilities for the treatment and the control, respectively. Interest lies in testing H0 : πt = πc against the two-sided alternative H1 : πt ≠ πc. Results are presented here only for the situation where there is a single analysis (that is, no interim analysis), for the two-sided test with equal sample sizes for the two treatments.

Let π̂t and π̂c denote the estimates of πt and πc, respectively, based on nt = nc = 0.5N observations from the treatment (t) and the control (c). The parameter of interest is δ = πt − πc, which is estimated by δ̂ = π̂t − π̂c. The estimate of the standard error used in the proposed test statistic is based on the pooled estimate of the common value of πt and πc under H0, and is given by

\[ \hat{se} = \frac{2\{\hat\pi(1-\hat\pi)\}^{1/2}}{\sqrt{N}}, \tag{23.24} \]

where π̂ = 0.5(π̂t + π̂c). Incorporating a continuity correction factor, the test statistic is

\[ Z = \frac{|\hat\delta| - 2/N}{\hat{se}}. \tag{23.25} \]

23.5.1 Trial Design

Consider the example where the probability of a response for the control is 5% and it is hoped that the experimental treatment can increase this rate to 25%. First, in the Discrete group, click Two Samples on the Design tab, and then click Fisher Exact Test.
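The continuity-corrected statistic in (23.24) and (23.25) is straightforward to evaluate numerically. The sketch below assumes that the standard error scales as 1/√N, which is what the pooled binomial variance implies when nt = nc = N/2; the function name and the observed rates in the usage line are hypothetical.

```python
from math import sqrt


def corrected_z(p_t: float, p_c: float, n_total: int) -> float:
    """Continuity-corrected statistic (23.25) for two equal arms of size n_total / 2."""
    delta_hat = p_t - p_c
    pooled = 0.5 * (p_t + p_c)
    # (23.24): pooled standard error under H0 (assumed to scale as 1 / sqrt(N)).
    se = 2.0 * sqrt(pooled * (1.0 - pooled)) / sqrt(n_total)
    return (abs(delta_hat) - 2.0 / n_total) / se


# Hypothetical observed rates of 25% and 5% in a trial with 100 subjects in total.
print(round(corrected_z(0.25, 0.05, 100), 4))
```

The correction term 2/N equals (1/2)(1/nt + 1/nc) for equal arms, so it shrinks as the sample size grows and the corrected statistic approaches the uncorrected one.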
Suppose we want to determine the sample size required to have power of 90% when πc = 0.05 and πt = 0.25 using a two-sided test with a type-1 error rate of 0.05. Enter the appropriate test parameters so that the dialog box appears as shown. Then click Compute.

The design is shown as a row in the Output Preview, located in the lower pane of this window. This single-look design requires a combined total of 136 subjects from both treatments in order to attain 90% power.

You can select this design by clicking anywhere along the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library.

Suppose that this sample size is larger than economically feasible and it is desired to evaluate the power when a total of 100 subjects are enrolled. Create a new design by selecting Des1 in the Library, and clicking the icon. In the input, select the radio button in the box next to Power. The box next to Power will now say Computed, since we wish to compute power. In the box next to Sample Size (n) enter 100.

Click Compute to create design Des2. The results of Des2 are shown in the Output Preview window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for both Des1 and Des2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Des2 yields a power of approximately 75% as shown. Noting that 100 subjects is economically feasible and yields reasonable power, the question arises as to the sample size required to have 80% power, which might still be economically feasible.
This can be accomplished by selecting Des1 in the Library, and clicking the icon. In the input, change the Power from 0.9 to 0.8. Click Compute to generate the output for Des3. The results of Des3 are shown in the Output Preview window. With Des3 selected in the Output Preview, click the icon. In the Library, select the rows for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Entering 0.8 for the power yields a required sample size of 110 subjects.

23.6 Assurance (Probability of Success)

Assurance, or probability of success, is a Bayesian version of power, which corresponds to the (unconditional) probability that the trial will yield a statistically significant result. Specifically, it is the prior expectation of the power, averaged over a prior distribution for the unknown treatment effect (see O’Hagan et al., 2005). For a given design, East allows you to specify a prior distribution, for which the assurance or probability of success will be computed.

First, enter the following values in the Input window: a 3-look design for testing the difference in proportions of two distinct populations with a Lan-DeMets (OF) efficacy-only boundary, Superiority Trial, 1-sided test, 0.025 type-1 error, 80% power, πc = 0.15, and πt = 0.1.

Select the Assurance checkbox in the Input window. The following options will appear as below.

To address our uncertainty about the treatment proportion, we specify a prior distribution for πt. In the Distribution list, click Beta, and in the Input Method list, click Beta Parameters (a and b). Enter the values of a = 11 and b = 91. Recall that the mode of the Beta distribution is (a − 1)/(a + b − 2).
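To make the definition of assurance concrete, the sketch below evaluates the mode of the Beta(11, 91) prior and then averages a power function over that prior by numerical integration. It is a simplified illustration under stated assumptions: it uses a fixed-sample normal approximation with a per-arm sample size chosen to give 80% power at πc = 0.15, πt = 0.1, rather than East's 3-look group sequential calculation, so its result need not reproduce the value that East reports. All function names are hypothetical.

```python
from math import ceil, exp, lgamma, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at 0 < x < 1, computed on the log scale."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def power(pi_c, pi_t, n_per_arm, alpha):
    """Normal-approximation power of a one-sided test that pi_t < pi_c."""
    se = sqrt((pi_c * (1 - pi_c) + pi_t * (1 - pi_t)) / n_per_arm)
    return STD_NORMAL.cdf((pi_c - pi_t) / se - STD_NORMAL.inv_cdf(1 - alpha))


def assurance(pi_c, n_per_arm, alpha, a, b, steps=20000):
    """Prior expectation of power over pi_t ~ Beta(a, b), by the midpoint rule."""
    h = 1.0 / steps
    return sum(
        power(pi_c, (i + 0.5) * h, n_per_arm, alpha) * beta_pdf((i + 0.5) * h, a, b) * h
        for i in range(steps)
    )


PI_C, PI_T, ALPHA, TARGET_POWER, A, B = 0.15, 0.10, 0.025, 0.80, 11, 91

# Assumption: fixed-sample size per arm giving 80% power at the design alternative.
z_sum = STD_NORMAL.inv_cdf(1 - ALPHA) + STD_NORMAL.inv_cdf(TARGET_POWER)
n_arm = ceil(z_sum ** 2 * (PI_C * (1 - PI_C) + PI_T * (1 - PI_T)) / (PI_C - PI_T) ** 2)

print("prior mode:", (A - 1) / (A + B - 2))
print("power at design alternative:", round(power(PI_C, PI_T, n_arm, ALPHA), 3))
print("assurance:", round(assurance(PI_C, n_arm, ALPHA, A, B), 3))
```

The prior mode evaluates to 10/100 = 0.1. Because the prior places appreciable mass on values of πt close to πc, where power is low, the averaged power falls well below the 80% attained at the design alternative.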
Thus, these parameter values generate a Beta distribution that is peaked at 0.1, which matches the assumed treatment proportion. Click Compute.

The computed probability of success (0.597) is shown above. Note that for this prior, assurance is considerably lower than the specified power (0.8); incorporating the uncertainty about πt has yielded a much less optimistic estimate of power.

Save this design in the Library and rename it as Bayes1.

East also allows you to specify an arbitrary prior distribution through a CSV file. In the Distribution list, click User Specified, and then click Browse... to select the CSV file where you have constructed a prior.

If you are specifying a prior for one parameter only (either πc or πt, but not both), then the CSV file should contain two columns, where the first column lists the grid points for the parameter of interest, and the second column lists the prior probability assigned to each grid point. If you are specifying priors for both πc and πt, the CSV file should contain four columns (from left to right): values of πc, probabilities for πc, values of πt, and probabilities for πt. The number of points for πc and the number of points for πt may differ. For example, we consider a 5-point prior for πt only, with probability = 0.2 at each point.

Once the CSV filename and path have been specified, click Compute to calculate the assurance, which will be displayed in the box below:

As stated in O’Hagan et al.
(2005, p. 190): “Assurance is a key input to decision-making during drug development and provides a reality check on other methods of trial design.” Indeed, it is not uncommon for assurance to be much lower than the specified power. The interested reader is encouraged to refer to O’Hagan et al. for further applications and discussions on this important concept.

23.7 Predictive Power and Bayesian Predictive Power

Similar Bayesian ideas can be applied to conditional power for interim monitoring. Rather than calculating conditional power for a single assumed value of the treatment effect, δ, such as at δ̂, we may account for the uncertainty about δ by taking a weighted average of conditional powers, weighted by the posterior distribution for δ. East calculates an average power, called the predictive power (Lan, Hu, & Proschan, 2009), assuming a diffuse prior for the drift parameter, η. In addition, if the user specified a beta prior distribution at the design stage to calculate assurance, then East will also calculate the average power, called Bayesian predictive power, for the corresponding posterior. We will demonstrate these calculations for the design renamed as Bayes1 earlier.

In the Library, right-click Bayes1 and click Interim Monitoring, then click the icon in the toolbar of the IM Dashboard.

In the Show/Hide Columns window, make sure to show the columns for: CP (Conditional Power), Predictive Power, Bayes Predictive Power, Posterior Distribution of πt a, and Posterior Distribution of πt b, and click OK. The following columns will be added to the main grid of the IM Dashboard.

In the toolbar of the IM Dashboard, open the Test Statistic Calculator by clicking the icon. In order to appropriately update the posterior distribution, you will need to use the Test Statistic Calculator to enter the sample size and number of responses for each arm.
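The reason arm-level counts are needed can be seen from the conjugate update of a Beta prior with binomial data. The sketch below assumes that the posterior parameters a and b shown on the dashboard follow this standard Beta-binomial update, which the manual does not state explicitly. It uses the Beta(11, 91) prior of Bayes1 and the treatment-arm counts of this example (23 responses among 231 patients); the function names are hypothetical.

```python
def beta_posterior(a, b, responses, n):
    """Conjugate update of a Beta(a, b) prior after `responses` events in n subjects."""
    return a + responses, b + (n - responses)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


prior = (11, 91)   # prior for pi_t specified at the design stage for Bayes1
post = beta_posterior(*prior, responses=23, n=231)
print("posterior (a, b):", post)
print("posterior mean of pi_t:", round(beta_mean(*post), 4))
```

A test statistic and its standard error alone would not determine these posterior parameters, whereas the number of responses and the number of subjects on the arm do.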
Enter 34 events out of 230 patients in the control arm, and 23 out of 231 patients in the treatment arm, then click OK.

The main grid of the IM Dashboard will be updated as follows. In particular, notice the differing values for CP and the Bayesian measures of power.

24 Binomial Non-Inferiority Two-Sample

In a binomial non-inferiority trial the goal is to establish that the response rate of an experimental treatment is no worse than that of an active control, rather than attempting to establish that it is superior. A therapy that is demonstrated to be non-inferior to the current standard therapy for a particular indication might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a non-inferiority margin. The amount by which the response rate on the experimental arm is worse than the response rate on the control arm must fall within this margin in order for the claim of non-inferiority to be sustained. In this chapter, we shall design and monitor non-inferiority trials in which the non-inferiority margin is expressed as either a difference, a ratio, or an odds ratio of two binomial proportions. The difference is examined in Section 24.1. This is followed by two formulations for the ratio: the Wald formulation in Section 24.2 and the Farrington-Manning formulation in Section 24.3. The odds ratio formulation is presented in Section 24.4.

24.1 Difference of Proportions

Let πc and πt denote the response rates for the control and experimental treatments, respectively. Let δ = πc − πt. The null hypothesis is specified as H0 : δ = δ0 and is tested against one-sided alternative hypotheses.
If the occurrence of a response denotes patient benefit rather than harm, then δ0 > 0 and the alternative hypothesis is H1 : δ < δ0 or, equivalently, H1 : πt > πc − δ0. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then δ0 < 0 and the alternative hypothesis is H1 : δ > δ0 or, equivalently, H1 : πt < πc − δ0.

For any given πc, the sample size is determined by the desired power at a specified value of δ = δ1. A common choice is δ1 = 0 (or equivalently πt = πc) but East permits you to power the study at any value of δ1 which is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. The test statistic at the j-th look is

\[ Z_j = \frac{\hat\delta_j - \delta_0}{se(\hat\delta_j)} \tag{24.1} \]

where

\[ \hat\delta_j = \hat\pi_{cj} - \hat\pi_{tj} \tag{24.2} \]

and

\[ se(\hat\delta_j) = \sqrt{\frac{\hat\pi_{cj}(1-\hat\pi_{cj})}{n_{cj}} + \frac{\hat\pi_{tj}(1-\hat\pi_{tj})}{n_{tj}}}. \tag{24.3} \]

24.1.1 Trial Design

The 24-week disease-free rate with a standard therapy for HIV is 80%. Suppose that the claim of non-inferiority for an experimental therapy can be sustained if its response rate is greater than 75%; i.e., the non-inferiority margin is δ0 = 0.05. For studies of this type, we specify inferiority as the null hypothesis, non-inferiority as the alternative hypothesis, and attempt to reject the null hypothesis using a one-sided test. We will specify to East that, under the null hypothesis H0, πc = 0.8 and πt = 0.75. We will test this hypothesis with a one-sided level 0.025 test. Suppose we require 90% power at the alternative hypothesis, H1, that both response rates are equal to the null response rate of the control arm, i.e., δ1 = 0. Thus, under H1, we have πc = πt = 0.8.
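The sample size implied by these inputs can be approximated with the usual large-sample formula. The sketch below assumes equal allocation, the unpooled variance of (24.3) evaluated under H1, and a one-sided significance level of 0.025, which is the level entered for the single-look design. The function name is hypothetical and East's internal method may differ.

```python
from math import ceil
from statistics import NormalDist


def ni_total_sample_size(pi_c, pi_t, delta0, alpha, power):
    """Approximate total N (1:1 allocation) for testing H0: pi_c - pi_t = delta0."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    # Sum of the two binomial variances under H1, as in (24.3) with n = 1 per arm.
    variance = pi_c * (1 - pi_c) + pi_t * (1 - pi_t)
    delta1 = pi_c - pi_t
    n_per_arm = z_sum ** 2 * variance / (delta0 - delta1) ** 2
    return 2 * ceil(n_per_arm)


print(ni_total_sample_size(pi_c=0.8, pi_t=0.8, delta0=0.05, alpha=0.025, power=0.90))
```

Under these assumptions the approximation gives 2690 patients in total, which agrees with the single-look design reported in this section. The required sample size is driven by the squared distance between δ0 and δ1, so halving the margin roughly quadruples it.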
To begin, click Two Samples on the Design tab in the Discrete group, and then click Difference of Proportions.

Single-Look Design Powered at δ = 0

To begin with, suppose we will design a single-look study for rejection of H0 only, with 90% power at a 0.025 significance level. Enter the relevant parameters into the dialog box as shown below. In the drop-down box next to Trial be sure to select Noninferiority.

Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. The single-look design requires a combined total of 2690 patients on both arms in order to attain 90% power. We can, however, reduce the expected sample size without any loss of power if we use a group sequential design. This is considered next.

Before continuing we will save Design1 to the Library. You can select this design by clicking anywhere along the row in the Output Preview. Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at δ = 0

For the above study, suppose we wish to take up to two interim looks and one final look at the accruing data. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters.
By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979).

Now suppose, in our example, that the three looks are unequally spaced, with the first look being taken after 50% of the committed accrual, and the second look being taken after 75% of the committed accrual. Under Spacing of Looks in the Boundary Info tab, click the Unequal radio button. The column titled Info. Fraction in the Look Details table can be edited to modify the relative spacing of the analyses. The information fraction refers to the proportion of the maximum (yet unknown) sample size. By default, this table displays equal spacing. Enter the new information fraction values as shown below and click Recalc to see the updated values of the stopping boundaries populated in the Look Details table.

On the Boundary Info tab, you may also click the icons to view plots of the error spending functions, or stopping boundaries, respectively.

Click the Compute button to generate output for Design2. With Design2 selected in the Output Preview, click the icon to save Design2 to the Library.
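The Lan-DeMets spending function with the O’Brien-Fleming parameter has a simple closed form, α(t) = 2 − 2Φ(z_{α/2}/√t), where t is the information fraction and α is the one-sided significance level. The sketch below evaluates it at the information fractions 0.5, 0.75 and 1 used for Design2, with α = 0.025. It only covers the cumulative error spent and the first-look boundary; boundaries at later looks require recursive numerical integration, which is not attempted here. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ld_of_spent(t: float, alpha: float) -> float:
    """Cumulative one-sided alpha spent at information fraction t (0 < t <= 1)."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z / sqrt(t)))


for t in (0.5, 0.75, 1.0):
    print(f"t={t:.2f}  alpha spent={ld_of_spent(t, alpha=0.025):.5f}")

# At the first look the nominal p-value equals the alpha spent so far,
# so the first boundary on the Z scale follows directly.
first_look_z = STD_NORMAL.inv_cdf(1 - ld_of_spent(0.5, alpha=0.025))
print(f"first-look boundary on the Z scale: {first_look_z:.3f}")
```

Very little of the type-1 error is spent at the early looks, which is why this family demands strong evidence for early stopping and leaves the final critical value close to that of a single-look design.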
In the Library, select the rows for Design1 and Design2 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Let us examine the design output from Design2. The maximum number of subjects that we must commit to this study in order to achieve 90% power is 2740. That is 50 patients more than are needed for Design1. However, since Design1 is a single-look design, there is no prospect of saving resources if indeed H1 is true and the two treatments have the same response rates. In contrast, Design2 permits the trial to stop early if the test statistic crosses the stopping boundary. For this reason, the expected sample size under H1 is 2094, a saving of 596 patients relative to Design1. If H0 is true, the expected sample size is 2732 and there is no saving of patient resources relative to Design1.

In order to see the stopping probabilities, as well as other characteristics, select Design2 in the Library, and click the icon. The cumulative boundary stopping probabilities are shown in the Stopping Boundaries table.

To display a chart of average sample number (ASN) versus the effect size, πt − πc, select Design2 in the Library and click on the icon and select Average Sample Number (ASN).

To display a chart of power versus treatment effect, select Design2 in the Library and click on the icon and select Power vs. Treatment Effect (δ).

In Design2, we utilized the Lan-DeMets (Lan & DeMets, 1983) spending function, with Parameter OF (O’Brien-Fleming), to generate the stopping boundary for early stopping under H1. One drawback of Design2 is the large expected sample size if H0 is true.
We can guard against this eventuality by introducing a futility boundary which will allow us to stop early if H0 is true. A popular approach to stopping early for futility is to compute the conditional power at each interim monitoring time point and stop the study if this quantity is too low. This approach is somewhat arbitrary since there is no guidance as to what constitutes low conditional power. In East, we compute futility boundaries that protect β, the type-2 error, so that the power of the study will not deteriorate. This is achieved by using a β-spending function to generate the futility boundary. Thereby the type-2 error will not exceed β and the power of the study will be preserved. This approach was published by Pampallona and Tsiatis (1994).

Suppose we now wish to include a futility boundary. To design this trial, select Design2 in the Library and click the icon. In the Boundary Info tab, in the Futility box, set Boundary Family to Spending Function. Change the Spending Function to Gamma Family and change the Parameter (Γ) to −8. This family is parameterized by the single parameter γ, which can take all possible non-zero values. Its functional form is

$$\beta(t) = \beta\,\frac{1 - e^{-\gamma t}}{1 - e^{-\gamma}}. \tag{24.4}$$

Next click Refresh Boundary. Your screen should now look like the following:

On the Boundary Info tab, you may also click the corresponding icons to view plots of the error spending functions, or stopping boundaries, respectively.

Notice how conservative the β-spending function is compared to the α-spending function. Its rate of error spending is almost negligible until about 60% of the information has accrued.
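To see numerically why the β-spending function with γ = −8 spends so little error early on, equation (24.4) can be evaluated directly. The sketch below is illustrative only; the function name `gamma_family_spent` is hypothetical, and the total error β = 0.10 is an assumption that corresponds to the 90% power used in this example.

```python
import math


def gamma_family_spent(t, total_error=0.10, gamma=-8.0):
    """Cumulative error spent at information fraction t under equation (24.4)."""
    if gamma == 0:
        raise ValueError("gamma must be non-zero")
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction must lie in [0, 1]")
    return total_error * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))


# Tabulate the cumulative type-2 error spent as information accrues.
for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    spent = gamma_family_spent(t)
    print(f"t = {t:.1f}  cumulative beta spent = {spent:.5f}")
```

With a strongly negative γ, the exponential term keeps the cumulative spend well below 1% of power until late in the trial; all of β is spent only at t = 1.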
One can view the stopping boundaries on various alternative scales by selecting the appropriate scale from the drop-down list of boundary scales to the right of the chart. It is instructive to view the stopping boundaries on the p-value scale.

By moving the vertical scroll bar from left to right in the above chart, one can observe the p-values required for early stopping at each look. The p-values needed to stop the study and declare non-inferiority at the first, second and third looks are, respectively, 0.0015, 0.0092 and 0.022. The p-values needed to stop the study for futility at the first and second looks are, respectively, 0.7244 and 0.2708.

Other useful scales for displaying the futility boundary are the conditional power scales. They are the cp delta1 Scale and the cp deltahat Scale. Here ‘cp’ refers to conditional power. The suffix ‘delta1’ implies that we will represent the futility boundary in terms of conditional power evaluated at the value of δ = δ1 specified at the design stage under the alternative hypothesis. The suffix ‘deltahat’ implies that we will represent the futility boundary in terms of conditional power evaluated at the value of δ̂ at which the test statistic Z = δ̂/se(δ̂) would just hit the futility boundary.

The screenshot below represents the first two values of the futility boundary on the cp delta1 Scale.

For example, the stopping boundary at the first look is cp delta1=0.1137. This is to be interpreted in the following way: if at the first look the value of the test statistic Z just falls on the futility boundary, then the conditional power, as defined by Section C.3 of Appendix C with δ = δ1 = 0, will be 0.1137. This gives us a way to express the futility boundary in terms of conditional power.

The cp delta1 Scale might not give one an accurate picture of futility.
This is because, on this scale, the conditional power is evaluated at the value of δ = δ1 specified at the design stage. However, if the test statistic has actually fallen on the futility boundary, the data are more suggestive of the null than the alternative hypothesis and it is not very likely that δ = δ1. Thus it might be more reasonable to evaluate conditional power at the observed value δ = δ̂.

The screenshot below represents the futility boundary on the cp deltahat Scale.

For example, the stopping boundary at the second look is cp deltahat=0.0044. This is to be interpreted in the following way: if at the second look, the value of the test statistic Z just falls on the futility boundary, then the conditional power, as defined by Section C.3 of Appendix C with δ = δ̂ = Z × se(δ̂), will be 0.0044.

It is important to realize that the futility boundary has not changed. It is merely being expressed on a different scale. On the whole, it is probably more realistic to express the futility boundary on the cp deltahat scale than on the cp delta1 scale since it is highly unlikely that the true value of δ is equal to δ1 if Z has hit the futility boundary. Close this chart before continuing.

Click the Compute button to generate output for Design3. With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Observe that Design3 will stop with a smaller expected sample size under either H0 or H1 compared to Design2.
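As a rough cross-check on the sample sizes in these designs, the single-look case can be approximated with a textbook normal-approximation formula. This is a sketch under stated assumptions, not East's algorithm: it assumes δ = πt − πc, a one-sided α of 0.025, 90% power, equal allocation, and an unpooled variance evaluated under the alternative. The function name `ni_difference_total_n` is hypothetical.

```python
import math
from statistics import NormalDist


def ni_difference_total_n(pi_c, delta1, delta0, alpha=0.025, power=0.90):
    """Approximate total sample size (both arms, 1:1 allocation) for a
    single-look non-inferiority test of delta = pi_t - pi_c."""
    pi_t = pi_c + delta1
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the per-subject Bernoulli variances on the two arms.
    variance_sum = pi_t * (1.0 - pi_t) + pi_c * (1.0 - pi_c)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance_sum / (delta1 - delta0) ** 2
    return math.ceil(2.0 * n_per_arm)


# Control rate 0.8, margin -0.05, powered at delta1 = 0 (equal response rates).
print(ni_difference_total_n(pi_c=0.8, delta1=0.0, delta0=-0.05))
```

Under these assumptions the formula gives 2690 subjects, in line with the single-look figure implied above for Design1 (2740 − 50). Because the required sample size scales with 1/(δ1 − δ0)², even a modest widening of the gap between the margin and the alternative reduces it sharply.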
Three-Look Design Powered at δ ≠ 0

The previous designs were all powered to detect the alternative hypothesis that the new treatment and the active control have the same response rate (δ1 = 0). As is usually the case with non-inferiority trials, the distance between the non-inferiority margin δ0 = −0.05 and the alternative hypothesis δ1 = 0 is rather small, thereby resulting in a very large sample size commitment to this trial. Sometimes a new treatment is actually believed to have a superior response rate to the active control. However, the anticipated treatment benefit might be too small to make it feasible to run a superiority trial. Suppose, for example, that it is anticipated that the treatment arm could improve upon the 80% response rate of the active control by about 2.5%. A single-look superiority trial designed for 90% power to detect so small a difference would require over 12000 subjects. In this situation, the sponsor might prefer to settle for a non-inferiority claim.

A non-inferiority trial in which the active control has a response probability of πc = 0.8, the non-inferiority margin is δ0 = −0.05, and the alternative hypothesis is δ1 = πt − πc = 0.025 can be designed as follows. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, choose the design parameters as shown below.

Click the Compute button to generate output for Design4. Notice that this design requires only 1161 subjects. This is 1585 fewer subjects than under Design3.

24.1.2 Trial Simulation

You can simulate Design3 by selecting Design3 in the Library, and clicking the icon from the Library toolbar. Alternatively, right-click on Design3 and select Simulate.
A new Simulation worksheet will appear.

Try different choices for the simulation parameters to verify the operating characteristics of the study. For instance, under the Response Generation Info tab, set Prop. Under Control to 0.8 and Prop. Under Treatment to 0.75. You will be simulating under the null hypothesis and should achieve a rejection rate of 2.5%. Now, click on the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We see above that we achieved a rejection rate of 2.5%.

Now suppose that the new treatment is actually slightly superior to the control treatment. For example, πc = 0.8 and πt = 0.81. Since this study is designed for 90% power when πc = πt = 0.8, we would expect the simulations to reveal power in excess of 90%. Select the Sim1 node in the Library, and click the icon from the Library toolbar. Under the Response Generation Info tab change the Prop. Under Treatment to 0.81. Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation2.

Select Simulation2 in the Output Preview. Click the icon to save it to the Library. Double-click on Simulation2 in the Library. The simulation output details will be displayed.
These results show that the power exceeds 97%.

The power of the study will deteriorate if the response rate of the control arm is less than 0.8, even if πc = πt. To see this, let us simulate with πc = πt = 0.7. The results are shown below.

Notice that the power has dropped from 90% to 80% even though the new treatment and the control treatment have the same response rates. This is because the lower response rates for πc and πt induce greater variability into the distribution of the test statistic. In order to preserve power, the sample size must be increased. This can be achieved without compromising the type-1 error within the group sequential framework by designing the study for a maximum amount of (Fisher) information instead of a maximum sample size. We discuss maximum information studies later, in Chapter 59.

24.1.3 Interim Monitoring

Consider interim monitoring of Design3. Select Design3 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design3 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 500 subjects on each treatment arm, with 395 responses on the treatment arm and 400 responses on the control arm. Click on the icon to invoke the Test Statistic Calculator.
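The quantities requested by the Test Statistic Calculator can be derived from the raw counts before they are entered. The sketch below is an illustration, not East's code: it assumes δ̂ = π̂t − π̂c, an unpooled standard error, a non-inferiority margin δ0 = −0.05, and Z = (δ̂ − δ0)/se(δ̂). The function name `wald_difference_inputs` is hypothetical.

```python
import math


def wald_difference_inputs(x_t, n_t, x_c, n_c, delta0=-0.05):
    """Return (estimate of delta, standard error, Z) from responder counts."""
    p_t = x_t / n_t
    p_c = x_c / n_c
    delta_hat = p_t - p_c
    # Unpooled standard error of the difference of two sample proportions.
    se = math.sqrt(p_t * (1.0 - p_t) / n_t + p_c * (1.0 - p_c) / n_c)
    z = (delta_hat - delta0) / se
    return delta_hat, se, z


# First look: 395/500 responders on treatment, 400/500 on control.
delta_hat, se, z = wald_difference_inputs(395, 500, 400, 500)
print(f"estimate = {delta_hat:.3f}, std. error = {se:.5f}, Z = {z:.3f}")
```

For the first-look counts this gives an estimate of −0.010, a standard error of 0.02553 and a test statistic of 1.567, the values used in this walkthrough.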
In the box next to Cumulative Sample Size enter 1000. Enter −0.01 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.02553. Next click Recalc. Note that the test statistic is computed to be 1.567.

Upon clicking the OK button, East will produce the interim monitoring report shown below.

The stopping boundary for declaring non-inferiority is 3.535 whereas the value of the test statistic is only 1.567. Thus the trial should continue.

Suppose that the next interim look occurs after accruing 1250 patients on each arm with 1000 responses on the control arm and 990 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon. The estimate of δ is -0.008 and the standard error is 0.016118. Enter the appropriate values as shown below and click Recalc. Note that the value of the test statistic is now 2.606.

Now click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The lower bound on the 87.5% repeated confidence interval is -0.042, comfortably within the non-inferiority margin of -0.05 specified at the design stage.

East also provides a p-value, confidence interval and median unbiased point estimate for πt − πc using stage-wise ordering of the sample space as described in Jennison and Turnbull (2000, page 179). This is located in the Adjusted Inference Table, located in the lower section of the IM Worksheet. In the present example, the lower confidence bound is -0.040, slightly greater than the corresponding bound from the repeated confidence interval.

24.2 Ratio of Proportions: Wald Formulation

Let πc and πt denote the response rates for the control and the experimental treatments, respectively. Let the difference between the two arms be captured by the ratio

$$\rho = \frac{\pi_t}{\pi_c}.$$

The null hypothesis is specified as H0 : ρ = ρ0 and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is H1 : ρ > ρ0 or, equivalently, H1 : πt > ρ0 πc. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ρ0 > 1 and the alternative hypothesis is H1 : ρ < ρ0 or, equivalently, H1 : πt < ρ0 πc.

For any given πc, the sample size is determined by the desired power at a specified value of ρ = ρ1. A common choice is ρ1 = 1 (or equivalently πt = πc), but East permits you to power the study at any value of ρ1 which is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. It is convenient to express the treatment effect on the logarithm scale as

$$\delta = \ln \rho = \ln \pi_t - \ln \pi_c. \tag{24.5}$$

The test statistic at the j-th look is then defined as

$$Z_j = \frac{\hat{\delta}_j - \delta_0}{\mathrm{se}(\hat{\delta}_j)} \tag{24.6}$$

where

$$\hat{\delta}_j = \ln\!\left(\frac{\hat{\pi}_{tj}}{\hat{\pi}_{cj}}\right), \tag{24.7}$$

$$\delta_0 = \ln(\rho_0) \tag{24.8}$$

and

$$\mathrm{se}(\hat{\delta}_j) = \sqrt{\frac{1 - \hat{\pi}_{cj}}{n_{cj}\,\hat{\pi}_{cj}} + \frac{1 - \hat{\pi}_{tj}}{n_{tj}\,\hat{\pi}_{tj}}}. \tag{24.9}$$

24.2.1 Trial Design

The Coronary Artery Revascularization in Diabetes (CARDia) trial (Kapur et al., 2005) was designed to compare coronary bypass graft surgery (CABG) and percutaneous coronary intervention (PCI) as strategies for revascularization, with the goal of showing that PCI is noninferior to CABG. We use various aspects of that study to exemplify the methodology to test for non-inferiority. The endpoint is the one-year event rate, where an event is defined as the occurrence of death, nonfatal myocardial infarction, or cerebrovascular accident.

Suppose that the event rate for the CABG is πc = 0.125 and that the claim of non-inferiority for PCI can be sustained if one can demonstrate statistically that the ratio ρ = πt/πc is at most 1.3. In other words, PCI is considered to be non-inferior to CABG as long as πt < 0.1625. Thus the null hypothesis H0 : ρ = 1.3 is tested against the one-sided alternative hypothesis H1 : ρ < 1.3. We want to determine the sample size required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error rate of 0.05.

Single Look Design Powered at ρ = 1

First we consider a study with only one look and equal sample sizes in the two groups. To begin click Two Proportions on the Design tab under Discrete, and then click Ratio of Proportions.

In the ensuing dialog box, next to Trial, select Noninferiority from the drop down menu. Choose the remaining design parameters as shown below. Make sure to select the radio button for Wald in the Test Statistic box. We will discuss the Score (Farrington Manning) test statistic in the next section.

Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. This single-look design requires a combined total of 2515 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere along the row in the Output Preview.
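The single-look sample size for the Wald formulation can be approximated by hand from equations (24.5) through (24.9). The sketch below is an illustration under stated assumptions, not East's algorithm: it assumes equal allocation, a normal approximation, and a variance of ln(π̂t/π̂c) evaluated at the design values of πc and πt. The function name `ni_log_ratio_total_n` is hypothetical.

```python
import math
from statistics import NormalDist


def ni_log_ratio_total_n(pi_c, rho1, rho0, alpha=0.05, power=0.80):
    """Approximate total sample size (both arms, 1:1 allocation) for a
    single-look non-inferiority test on delta = ln(pi_t / pi_c)."""
    pi_t = rho1 * pi_c
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    # Per-subject variance of the log ratio, as in equation (24.9)
    # with both arm sizes equal to n.
    variance_sum = (1.0 - pi_c) / pi_c + (1.0 - pi_t) / pi_t
    effect = math.log(rho1) - math.log(rho0)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance_sum / effect ** 2
    return math.ceil(2.0 * n_per_arm)


# CARDia-style inputs: pi_c = 0.125, margin rho0 = 1.3, powered at rho1 = 1.
print(ni_log_ratio_total_n(pi_c=0.125, rho1=1.0, rho0=1.3))
```

With these inputs the approximation returns 2515, which agrees with the single-look sample size reported by East for this design. Agreement for other inputs is not guaranteed, since East's exact computation is not described in this section.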
Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ρ = 1

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Compute button to generate output for Design2.
With Design2 selected in the Output Preview, click the icon to save Design2 to the Library. In the Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 2566 subjects, a slight inflation over the single-look design which required 2515 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 2134 subjects under the alternative hypothesis (πc = 0.125, ρ = 1), and still ensures that the power is 80%.

By selecting Design2 in the Library and clicking the icon, East displays the cumulative accrual, the stopping boundary, the type-1 error spent and the boundary crossing probabilities under the null hypothesis H0 : ρ = 1.3, and the alternative hypothesis H1 : ρ = 1.

Single-Look Design Powered at ρ ≠ 1

Sample sizes for non-inferiority trials powered at ρ = 1 are generally rather large, because regulatory requirements usually impose small non-inferiority margins (see, for example, Wang et al., 2001). Observe that both Design1 and Design2 were powered at ρ = 1 and required sample sizes in excess of 2500 subjects. However, based on Kapur et al. (2005), it is reasonable to expect πt < πc. We now consider the same design as in Design1, but we will power at the alternative hypothesis ρ1 = 0.72. That is, we will design this study to have 80% power to claim non-inferiority if πc = 0.125 and πt = 0.72 × 0.125 = 0.09.

Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar.
In the ensuing dialog box, change the design parameters as shown below.

Click the Compute button to generate output for Design3. With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

This single-look design requires a combined total of 607 subjects from both treatments in order to attain 80% power. This is a considerable decrease from the 2515 subjects required to attain 80% power using Design1 with ρ1 = 1.

Three-Look Design Powered at ρ ≠ 1

We now consider the impact of multiple looks on Design3. Suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3. Click the Compute button to generate output for Design4.

Using three planned looks inflates the maximum sample size slightly, from 607 to 619 subjects. However, it results in a smaller expected sample size under H1. Observe that the expected sample size is only 515 subjects under the alternative hypothesis (πc = 0.125, ρ = 0.72), and still ensures the power is 80%.

24.2.2 Trial Simulation

You can simulate Design4 by selecting it from the Library and clicking on the icon.
Try different choices for the simulation parameters to verify the operating characteristics of the study. For instance, under the Response Generation Info tab set Prop. Under Control to 0.125 and Prop. Under Treatment to 0.72 × 0.125 = 0.09. Click the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We simulated the data under the alternative hypothesis and should achieve a rejection rate of 80%. This is confirmed above (up to Monte Carlo accuracy).

Next, to simulate under the null hypothesis, under the Response Generation Info tab set Prop. Under Treatment to 1.3 × 0.125 = 0.1625. Click the Simulate button.

This time the rejection rate is only 5% (up to Monte Carlo accuracy), as we would expect under the null hypothesis. You may experiment in this manner with different values of πc and πt and observe the rejection rates look by look as well as averaged over all looks.

24.2.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design4 and select Create IM Dashboard. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right.
These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 125 subjects on each treatment arm, with 15 responses on the control arm and 13 responses on the treatment arm. Click on the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 250. Enter −0.143101 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.357197. Next click Recalc. Notice that the test statistic is computed to be -1.135. This value for the test statistic was obtained by substituting the observed sample sizes and responses into equations (24.6) through (24.9). Upon clicking the OK button, East will produce the interim monitoring report shown below.

Note: Click on the icon to hide or unhide the columns of your interest.

The stopping boundary for declaring non-inferiority is -2.872 whereas the value of the test statistic is only -1.135. Thus the trial should continue. This conclusion is supported by the value of the 97.5% upper confidence bound of the repeated confidence interval for δ = ln(ρ). The non-inferiority claim could be sustained only if this bound were less than ln(1.3) = 0.262. At the current interim look, however, the upper bound on δ is 0.883, indicating that the non-inferiority claim is not supported by the data.

Suppose that the next interim look occurs after accruing 250 patients on each arm with 31 responses on the control arm and 22 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon.
In the box next to Cumulative Sample Size enter 500. Enter −0.342945 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.264031. Next click Recalc. Notice that the test statistic is computed to be -2.293. Click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The upper bound on the 95.0% repeated confidence interval for δ is 0.159. Thus the upper confidence bound on ρ is exp(0.159) = 1.172, comfortably within the non-inferiority margin ρ0 = 1.3 specified at the design stage.

In the Final Inference Table in the bottom portion of the IM worksheet, East also provides a p-value, confidence interval and median unbiased point estimate for δ using stage-wise ordering of the sample space as described in Jennison and Turnbull (2000). This approach often yields narrower confidence intervals than the repeated confidence intervals approach although both approaches have the desired 95.0% coverage. In the present example, the upper confidence bound is 0.098, slightly less than the corresponding bound from the repeated confidence interval.

24.3 Ratio of Proportions: Farrington-Manning Formulation

An alternative approach to establishing non-inferiority of an experimental treatment to the control treatment with respect to the ratio of probabilities was proposed by Farrington and Manning (1990). Let πc and πt denote the response rates for the control and the experimental treatments, respectively.
Let the difference between the two arms be expressed by the ratio

$$\rho = \frac{\pi_t}{\pi_c}.$$

The null hypothesis is specified as H0 : ρ = ρ0, or equivalently H0 : πt = ρ0 πc, which is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is H1 : ρ > ρ0 or, equivalently, H1 : πt > ρ0 πc. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ρ0 > 1 and the alternative hypothesis is H1 : ρ < ρ0 or, equivalently, H1 : πt < ρ0 πc.

For any given πc, the sample size is determined by the desired power at a specified value of ρ = ρ1. A common choice is ρ1 = 1 (or equivalently πt = πc), but East permits you to power the study at any value of ρ1 which is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. The test statistic at the j-th look is defined as

$$Z_j = \frac{\hat{\pi}_{tj} - \rho_0 \hat{\pi}_{cj}}{\sqrt{\dfrac{\hat{\pi}_{tj}(1 - \hat{\pi}_{tj})}{n_{tj}} + \rho_0^2\,\dfrac{\hat{\pi}_{cj}(1 - \hat{\pi}_{cj})}{n_{cj}}}}. \tag{24.10}$$

The choice of test statistic is the primary distinguishing feature between the above Farrington-Manning formulation and the Wald formulation of the non-inferiority test discussed in Section 24.2. The Wald statistic (24.6) measures the standardized difference between the observed ratio of proportions and the non-inferiority margin on the natural logarithm scale. The corresponding repeated one-sided confidence bounds displayed in the interim monitoring worksheet estimate ln(πt/πc) and may be converted to estimates of the ratio of proportions by exponentiation.
On the other hand, the Farrington-Manning formulation focuses on the expression of the null hypothesis as H0 : πt − ρ0 πc = 0. Thus, we consider

$$\delta = \pi_t - \rho_0 \pi_c \tag{24.11}$$

as the parameter of interest. The test statistic (24.10) is the standardized estimate of this difference obtained at the j-th look. A large difference in the direction of the alternative hypothesis is indicative of non-inferiority. The corresponding repeated one-sided confidence bounds displayed in the interim monitoring worksheet provide estimates of δ rather than directly estimating ρ or ln(ρ). The Farrington-Manning and Wald procedures are equally applicable for hypothesis testing since the null hypothesis δ = 0 is rejected if and only if the corresponding null hypothesis ρ = ρ0 is rejected.

24.3.1 Trial Design

We again consider the Coronary Artery Revascularization in Diabetes (CARDia) trial (Kapur et al., 2005), presented in Section 24.2, which compared coronary bypass graft surgery (CABG) and percutaneous coronary intervention (PCI) as strategies for revascularization, with the goal of showing that PCI is noninferior to CABG. We use various aspects of that study to exemplify the use of the methodology to test for non-inferiority with respect to the one-year event rate, where an “event” is the occurrence of death, nonfatal myocardial infarction, or cerebrovascular accident, using the Farrington-Manning formulation.

Suppose that the event rate for the CABG is πc = 0.125 and that the claim of non-inferiority for PCI can be sustained if the ratio ρ is at most 1.3; that is, the event rate for the PCI (πt) is at most 0.1625. The null hypothesis H0 : ρ = 1.3 is tested against the alternative hypothesis H1 : ρ < 1.3. We want to determine the sample size required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error rate of 0.05.
Single-Look Design Powered at ρ = 1

First we consider a study with only one look and equal sample sizes in the two groups. To begin click Two Proportions on the Design tab, and then click Ratio of Proportions. In the ensuing dialog box, next to Trial, select Noninferiority from the drop down menu. Choose the remaining design parameters as shown below. Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. This single-look design requires a combined total of 2588 subjects from both treatments in order to attain 80% power.

You can select this design by clicking anywhere along the row in the Output Preview. Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ρ = 1

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks.
The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979).

Click the Compute button to generate output for Design2. With Design2 selected in the Output Preview, click the icon to save Design2 to the Library. In the Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 2640 subjects, a slight inflation over the single-look design which required only 2588 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 2195 subjects under the alternative hypothesis (πc = 0.125, ρ = 1), and still ensures that the power is 80%.

By selecting Design2 in the Library and clicking the icon, East displays the cumulative accrual, the stopping boundary, the type-1 error spent and the boundary crossing probabilities under the null hypothesis H0 : ρ = 1.3, and the alternative hypothesis H1 : ρ = 1.

Single-Look Design Powered at ρ ≠ 1

Sample sizes for non-inferiority trials powered at ρ = 1 are generally rather large because regulatory requirements usually impose small non-inferiority margins. Observe that both Design1 and Design2 were powered at ρ = 1 and required sample sizes in excess of 2500 subjects. However, based on Kapur et al (2005), it is reasonable to expect πt < πc . We now consider the same design as in Design1, but we will power at the alternative hypothesis ρ1 = 0.72. That is, we will design this study to have 80% power to claim non-inferiority if πc = 0.125 and πt = 0.72 × 0.125 = 0.09.

Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the design parameters as shown below. Click the Compute button to generate output for Design3. With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

This single-look design requires a combined total of 628 subjects from both treatments in order to attain 80% power. This is a considerable decrease from the 2588 subjects required to attain 80% power using Design1, i.e. with ρ1 = 1.

Three-Look Design Powered at ρ ≠ 1

We now consider the impact of multiple looks on Design3. Suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3.
Click the Compute button to generate output for Design4. Using three planned looks inflates the maximum sample size slightly, from 628 to 641 subjects. However, it results in a smaller expected sample size under H1. Observe that the expected sample size is only 533 subjects under the alternative hypothesis (πc = 0.125, ρ = 0.72), and the design still ensures that the power is 80%.

24.3.2 Trial Simulation

You can simulate Design4 by selecting Design4 in the Library and clicking on the icon. Try different choices for the simulation parameters to verify the operating characteristics of the study. For instance, under the Response Generation Info tab set Prop. Under Control to 0.125 and Prop. Under Treatment to 0.72 × 0.125 = 0.09. Click the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We simulated the data under the alternative hypothesis and should achieve a rejection rate of 80%. This is confirmed above (up to Monte Carlo accuracy).

Next, to simulate under the null hypothesis, edit the Sim1 node by clicking the icon and, under the Response Generation Info tab, set Prop. Under Treatment to 1.3 × 0.125 = 0.1625. Click the Simulate button.
This time the rejection rate is only 5% (up to Monte Carlo accuracy), as we would expect under the null hypothesis. You may experiment in this manner with different values of πc and πt and observe the rejection rates look by look as well as averaged over all looks.

24.3.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design4 and select Interim Monitoring.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 125 subjects on each treatment arm, with 15 responses on the control arm and 13 responses on the treatment arm. Click on the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 250. Enter −0.052 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.046617. Next click Recalc.

The test statistic is computed to be -1.115. This value for the test statistic was obtained by substituting the observed sample sizes and responses into equation (24.10). Upon clicking the OK button, East will produce the interim monitoring report shown below.

The stopping boundary for declaring non-inferiority is -2.929 whereas the value of the test statistic is only -1.115. Thus the trial should continue.
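The calculator inputs for this first look follow directly from equation (24.10). As a cross-check, the short Python sketch below recomputes them from the observed counts; `fm_statistic` is a hypothetical helper written for this illustration and is not part of East.

```python
import math


def fm_statistic(x_t, n_t, x_c, n_c, rho0):
    """Return (estimate of delta, its standard error, Z) per equation (24.10)."""
    p_t = x_t / n_t
    p_c = x_c / n_c
    delta_hat = p_t - rho0 * p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + rho0**2 * p_c * (1 - p_c) / n_c)
    return delta_hat, se, delta_hat / se


# First look: 13/125 responses on treatment, 15/125 on control, margin rho0 = 1.3
delta_hat, se, z = fm_statistic(13, 125, 15, 125, rho0=1.3)
print(round(delta_hat, 3), round(se, 6), round(z, 3))
```

The rounded results reproduce the values entered in the calculator (−0.052 and 0.046617) and the reported test statistic of -1.115, which has not yet reached the boundary of -2.929.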
This conclusion is also supported by the upper confidence bound on δ = πt − ρ0 πc which at present equals 0.085. A necessary and sufficient condition for the stopping boundary to be crossed, and non-inferiority demonstrated thereby, is for this upper confidence bound to be less than zero.

Suppose that the next interim look occurs after accruing 250 patients on each arm with 31 responses on the control arm and 22 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon. In the box next to Cumulative Sample Size enter 500. Enter −0.0732 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.032486. Next click Recalc. Notice that the test statistic is computed to be -2.253. Click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

Notice that the upper confidence bound of the repeated confidence interval for δ excludes zero. In the Final Inference Table in the bottom portion of the IM worksheet, East also provides a p-value, confidence interval and median unbiased point estimate for δ using stage-wise ordering of the sample space as described in Jennison and Turnbull (2000, page 179). The upper confidence bound for δ based on the stage-wise method likewise excludes zero.

24.4 Odds Ratio Test

Let πt and πc denote the two binomial probabilities associated with the treatment (t) and the control (c).
Let the difference between the two treatment arms be captured by the odds ratio

$$\psi = \frac{\pi_t/(1-\pi_t)}{\pi_c/(1-\pi_c)} = \frac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)}.$$

The null hypothesis is specified as H0 : ψ = ψ0 and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ψ0 < 1 and the alternative hypothesis is H1 : ψ > ψ0. Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ψ0 > 1 and the alternative hypothesis is H1 : ψ < ψ0.

For any given πc , the sample size is determined by the desired power at a specified value ψ = ψ1. A common choice is ψ1 = 1 (or equivalently πt = πc ), but East permits you to power the study at any value of ψ1 which is consistent with the choice of H1.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be made. It is convenient to express the treatment effect on the logarithmic scale as

$$\delta = \ln \psi. \tag{24.12}$$

The test statistic at the j-th look is then defined as

$$Z_j = \frac{\hat\delta_j - \delta_0}{\mathrm{se}(\hat\delta_j)} = \frac{\ln(\hat\psi_j) - \ln(\psi_0)}{\sqrt{\dfrac{1}{n_{tj}\hat\pi_{tj}(1-\hat\pi_{tj})} + \dfrac{1}{n_{cj}\hat\pi_{cj}(1-\hat\pi_{cj})}}}. \tag{24.13}$$

24.4.1 Trial Design

Suppose that the response rate for the control treatment is 90%, where higher response rates imply patient benefit. Assume that a claim of non-inferiority can be sustained if we can demonstrate statistically that the experimental treatment has a response rate of at least 80%. In other words the non-inferiority margin is

$$\psi_0 = \frac{0.8(1-0.9)}{0.9(1-0.8)} = 0.444.$$

The null hypothesis H0 : ψ = 0.444 is to be tested against the one-sided alternative H1 : ψ > 0.444. Suppose that we want to determine the sample size required to have power of 90% when πc = 0.9 and ψ1 = 1, i.e.
πc = πt , using a test with a type-1 error rate of 0.05.

Single-Look Design Powered at ψ = 1

First we consider a study with only one look and equal sample sizes in the two groups. To begin click Two Proportions on the Design tab, and then click Odds Ratio of Proportions. In the ensuing dialog box, next to Trial, select Noninferiority from the drop down menu. Choose the remaining design parameters as shown below. Now click Compute. The design is shown as a row in the Output Preview located in the lower pane of this window. This single-look design requires a combined total of 579 subjects from both treatments in order to attain 90% power.

You can select this design by clicking anywhere along the row in the Output Preview. Some of the design details will be displayed in the upper pane, labeled Compare Designs. In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Design1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

Three-Look Design Powered at ψ = 1

For the above study, suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the default Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab Boundary Info will appear. Clicking on this tab will reveal the stopping boundary parameters. By default, the Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary.
By default, there is an efficacy boundary (to reject H0) selected, but no futility boundary (to reject H1). The Boundary Family specified is of the Spending Functions type. The default Spending function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming), which generates boundaries that are very similar, though not identical, to the classical stopping boundaries of O’Brien and Fleming (1979). Technical details of these stopping boundaries are available in Appendix F.

Click the Compute button to generate output for Design2. With Design2 selected in the Output Preview, click the icon to save Design2 to the Library. In the Library, select the rows for Design1 and Design2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Using three planned looks requires an up-front commitment of 590 subjects, a slight inflation over the single-look design which required 579 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 457 subjects under the alternative hypothesis (πc = 0.9, ψ = 1), and still ensures that the power is 90%.

Single-Look Design Powered at ψ ≠ 1

Suppose that it is expected that the new treatment is a bit better than the control, but it is unnecessary and unrealistic to perform a superiority test. The required sample size for ψ1 = 1.333, i.e. πt = 0.92308, is determined. Create a new design by selecting Design1 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the design parameters as shown below. Click the Compute button to generate output for Design3.
With Design3 selected in the Output Preview, click the icon. In the Library, select the rows for Design1, Design2, and Design3, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

We observe that a single-look design powered at ψ1 = 1.333 reduces the sample size considerably relative to the single-look design powered at ψ1 = 1. The reduction in maximum sample size for the single-look design is approximately 38% (=(579-358)/579). However, Design3 should be implemented after careful consideration, since its favorable operating characteristics are only applicable to the optimistic situation where ψ1 = 1.333. If the true odds ratio is less than 1.333, the power under Design3 decreases and may be too small to establish noninferiority, even if the true value exceeds 1.

Three-Look Design Powered at ψ ≠ 1

For the above study (Design3), suppose we wish to take up to two equally spaced interim looks and one final look at the accruing data, using the default Lan-DeMets (O’Brien-Fleming) stopping boundary. Create a new design by selecting Design3 in the Library, and clicking the icon on the Library toolbar. In the ensuing dialog box, change the Number of Looks to 3. Click the Compute button to generate output for Design4.

Using three planned looks requires an up-front commitment of 365 subjects, a small inflation over the single-look design which required 358 subjects. However, the three-look design may result in a smaller sample size than that required for the single-look design, with an expected sample size of 283 subjects under the alternative hypothesis (πc = 0.9, ψ = 1.333), and still ensures that the power is 90%.
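The arithmetic behind the odds-ratio designs above (the margin ψ0, the treatment rate implied by ψ1, and the single-look sample sizes) can be sketched in a few lines of Python. The sketch assumes a fixed-sample normal approximation on the log odds ratio scale with the variance evaluated at the alternative, equal allocation, and rounding of the total up to the next integer; these are assumptions of the illustration, not a statement of East's algorithm, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio(p_t, p_c):
    """Odds ratio psi = [p_t / (1 - p_t)] / [p_c / (1 - p_c)]."""
    return (p_t / (1 - p_t)) / (p_c / (1 - p_c))


def pi_t_from_psi(psi, p_c):
    """Treatment response rate implied by odds ratio psi and control rate p_c."""
    odds_t = psi * p_c / (1 - p_c)
    return odds_t / (1 + odds_t)


def or_total_sample_size(p_c, psi0, psi1, alpha, power):
    """Approximate total N (two equal arms) for H0: psi = psi0 on the log scale.

    Assumption: variance of ln(psi-hat) evaluated at the alternative.
    A sketch, not East's algorithm.
    """
    p_t = pi_t_from_psi(psi1, p_c)
    z_sum = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    unit_var = 1 / (p_t * (1 - p_t)) + 1 / (p_c * (1 - p_c))
    n_per_arm = z_sum**2 * unit_var / (math.log(psi1) - math.log(psi0)) ** 2
    return math.ceil(2 * n_per_arm)


psi0 = odds_ratio(0.8, 0.9)  # non-inferiority margin from 80% versus 90%
n_design1 = or_total_sample_size(0.9, psi0, 1.0, alpha=0.05, power=0.90)
n_design3 = or_total_sample_size(0.9, psi0, 4 / 3, alpha=0.05, power=0.90)
print(round(psi0, 3), n_design1, n_design3)
```

With these inputs the margin evaluates to 0.444, ψ1 = 4/3 corresponds to πt of about 0.92308, and the approximate totals are 579 and 358, matching the single-look designs (Design1 and Design3) described above.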
24.4.2 Trial Simulation

You can simulate Design4 by selecting Design4 in the Library and clicking on the icon. Try different choices for the simulation parameters to verify the operating characteristics of the study. First, we verify the results under the alternative hypothesis at which the power is to be controlled, namely πc = 0.9 and πt = 0.92308. Under the Response Generation Info tab set Prop. Under Control to 0.9 and Prop. Under Treatment to 0.92308. Click the Simulate button. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Simulation1.

Select Simulation1 in the Output Preview. Note that some of the design details will be displayed in the upper pane, labeled Compare Designs. Click the icon to save it to the Library. Double-click on Simulation1 in the Library. The simulation output details will be displayed.

We see here that the power is approximately 90%.

Now let’s consider the impact if the sample size was determined assuming πc = 0.9, ψ1 = 1.333 when the true values are πc = 0.9 and ψ = 1. Under the Response Generation Info tab set Prop. Under Treatment to 0.9. Click the Simulate button. This results in a power of approximately 74%.

From this we see that if the optimistic choice is incorrect, the power to establish noninferiority decreases to a possibly unacceptable value of 74%.

24.4.3 Interim Monitoring

Select Design4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Design4 and select Interim Monitoring.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs.
The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

Suppose that the trial is first monitored after accruing 60 subjects on each treatment arm, with 50 responses on the control arm and 52 responses on the treatment arm. Click on the icon to invoke the Test Statistic Calculator. In the box next to Cumulative Sample Size enter 120. Enter 0.264231 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.514034. Next click Recalc. Notice that the test statistic is computed to be 2.092. This value for the test statistic was obtained by substituting the observed sample sizes and responses into equation (24.13). Upon clicking the OK button, East will produce the interim monitoring report shown below.

Note: Click on the icon to hide or unhide the columns of your interest.

The critical value is 3.22, and since the observed value of the test statistic (24.13) is less than this value, the null hypothesis cannot be rejected. Therefore, noninferiority cannot as yet be concluded.

Suppose that the second look is made after accruing 120 subjects on each treatment arm, with 112 responses on the control arm and 115 responses on the treatment arm. Click on the second row in the table in the upper section. Then click the icon. In the box next to Cumulative Sample Size enter 240. Enter 1.43848 in the box next to Estimate of δ. In the box next to Std. Error of δ enter 0.801501. Next click Recalc. Notice that the test statistic is computed to be 2.808.
This value for the test statistic was obtained by substituting the observed sample sizes and responses into equation (24.13). Click the OK button. This time the stopping boundary for declaring non-inferiority is crossed. The following message box appears.

Click the Stop button to stop the study. The analysis results are shown below.

The null hypothesis is rejected and we conclude that the treatment is noninferior to the control. In the Final Inference Table in the bottom portion of the IM worksheet, East also provides a stage-wise adjusted p-value, median unbiased point estimate and confidence interval for ψ as described in Jennison and Turnbull (2000) and in Appendix C of the East user manual. In the present example the adjusted p-value is 0.003, the point estimate for ψ is exp(1.427) = 4.166 and the lower 95% confidence bound for ψ is exp(0.098) = 1.103.

25 Binomial Equivalence Two-Sample

25.1 Equivalence Test

In some experimental situations, it is desired to show that the response rates for the control and the experimental treatments are "close", where "close" is defined prior to the collection of any data. Examples of this include showing that an aggressive therapy yields a similar rate of a specified adverse event to the established control, such as the bleeding rates associated with thrombolytic therapy or cardiac outcomes with a new stent.

Let πc and πt denote the response rates for the control and the experimental treatments, respectively, and let π̂t and π̂c denote the estimates of πt and πc based on nt and nc observations from the experimental and control treatments. Furthermore, let

$$\delta = \pi_t - \pi_c, \tag{25.1}$$

which is estimated by

$$\hat\delta = \hat\pi_t - \hat\pi_c. \tag{25.2}$$

Finally, let the variance of δ̂ be

$$\sigma^2 = \frac{\pi_c(1-\pi_c)}{n_c} + \frac{\pi_t(1-\pi_t)}{n_t}, \tag{25.3}$$

which is estimated by

$$\hat\sigma^2 = \frac{\hat\pi_c(1-\hat\pi_c)}{n_c} + \frac{\hat\pi_t(1-\hat\pi_t)}{n_t}. \tag{25.4}$$

The null hypothesis H0 : |πt − πc | = δ0 is tested against the two-sided alternative hypothesis H1 : |πt − πc | < δ0 , where δ0 (> 0) is specified to define equivalence. Following Machin and Campbell (1987), we present the solution to this problem as a one-sided α-level test. The decision rule is to declare equivalence if

$$-\delta_0 + z_\alpha \hat\sigma \le \hat\pi_t - \hat\pi_c \le \delta_0 - z_\alpha \hat\sigma. \tag{25.5}$$

We see that decision rule (25.5) is the same as declaring equivalence if the (1 − 2α)100% confidence interval for πt − πc is entirely contained within the interval (−δ0 , δ0 ).

The power or sample size are determined for a single-look study only. The extension to multiple looks is given in the next section. The sample size, or power, is determined at a specified difference πt − πc , denoted δ1 , where −δ0 < δ1 < δ0 . The probability of declaring equivalence depends on the true values of πc and πt . Based on the results of Machin and Campbell (1987), the required total sample size (N) is, for nt = rN and nc = (1 − r)N ,

$$N = \frac{(z_\alpha + z_\beta)^2}{(\delta_0 - \delta_1)^2} \left[ \frac{\pi_c(1-\pi_c)}{1-r} + \frac{(\pi_c+\delta_1)\bigl(1-(\pi_c+\delta_1)\bigr)}{r} \right]. \tag{25.6}$$

25.1.1 Trial Design

Consider the development of a new stent which is to be compared to the standard stent with respect to target vessel failure (acute failure, target vessel revascularization, myocardial infarction, or death) after one year. The standard stent has an assumed target vessel failure rate of 20%. Equivalence is defined as δ0 = 0.075. The sample size is to be determined with α = 0.025 (one-sided) and power, i.e. probability of declaring equivalence, of 1 − β = 0.80.

To begin click Two Samples on the Design tab, and then click Difference of Proportions.

Suppose that we want to determine the sample size required to have power of 80% when δ1 = 0. Enter the relevant parameters into the dialog box as shown below.
In the drop down box next to Trial Type be sure to select Equivalence.

Click on the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. The sample size required in order to achieve the desired 80% power is 1203 subjects.

You can select this design by clicking anywhere along the row in the Output Preview. If you double click anywhere along the row in the Output Preview some of the design details will be displayed in the upper pane, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design to Workbook1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

If the assumed difference δ1 is not zero, it is more difficult to establish equivalence, in the sense that the power is lower and thus the required sample size is larger. Consider δ1 = 0.025, so that the new stent increases the rate to 22.5%. Create a new design Des2 by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Change the value of Expected Diff. from 0 to 0.025 as shown below.

Click on the Compute button. The design is shown as a row in the Output Preview located in the lower pane of this window. With Des2 selected in the Output Preview, click the icon. In the Library, select the rows for Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

This single-look design requires a combined total of 2120 subjects from both treatments in order to attain 80% power.
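Once the data from such a single-look study are in hand, decision rule (25.5) is straightforward to apply. The Python sketch below shows one way to code it; the function name `declare_equivalence` and the event counts in the example are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def declare_equivalence(x_t, n_t, x_c, n_c, delta0, alpha):
    """Apply decision rule (25.5).

    Equivalence is declared when the (1 - 2*alpha)100% confidence interval
    for pi_t - pi_c lies entirely within (-delta0, delta0).
    Returns (decision, lower limit, upper limit).
    """
    p_t, p_c = x_t / n_t, x_c / n_c
    delta_hat = p_t - p_c                                    # equation (25.2)
    sigma_hat = math.sqrt(p_c * (1 - p_c) / n_c
                          + p_t * (1 - p_t) / n_t)           # equation (25.4)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    lower = delta_hat - z_alpha * sigma_hat
    upper = delta_hat + z_alpha * sigma_hat
    return (lower >= -delta0 and upper <= delta0), lower, upper


# Hypothetical data: 126/600 failures on the new stent, 120/600 on the standard
decision_large, lo_large, hi_large = declare_equivalence(
    126, 600, 120, 600, delta0=0.075, alpha=0.025)

# The same observed rates with only 100 subjects per arm
decision_small, lo_small, hi_small = declare_equivalence(
    21, 100, 20, 100, delta0=0.075, alpha=0.025)

print(decision_large, decision_small)
```

With the same observed rates, the larger hypothetical study satisfies the rule while the smaller one does not, because the confidence interval narrows as the sample size grows.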
Consider δ1 = −0.025, so that the new stent decreases the rate to 17.5%. Create a new design, as above, and change the value of Expected Diff. to −0.025. Click the Compute button to generate the output for Des3. With Des3 selected in the Output Preview, click the icon. In the Library, select the nodes for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side:

Des3 yields a required total sample size of 1940 subjects. This asymmetry is due to the fact that the variance is smaller for values of πc + δ1 further from 0.5.

25.1.2 Extension to Multiple Looks

Although the details presented in the previous section are related to a single-look design only, these results can be used to extend the solution to allow for multiple equally-spaced looks. We can use the General Design Module to generalize the solution to this problem to the study design with multiple looks. Details are given in Chapters 60 and 59.

Let π̂tj and π̂cj denote the estimates of πt and πc based on ntj and ncj observations from the experimental and control treatments, respectively, up to and including the j-th look, j = 1, . . . , K, where a maximum of K looks are to be used. Let nj = ncj + ntj and let

$$\hat\delta_j = \hat\pi_{tj} - \hat\pi_{cj} \tag{25.7}$$

denote the estimate of δ, given by (25.1), and let

$$\hat\sigma_j^2 = \frac{\hat\pi_{cj}(1-\hat\pi_{cj})}{n_{cj}} + \frac{\hat\pi_{tj}(1-\hat\pi_{tj})}{n_{tj}} \tag{25.8}$$

denote the estimate of σ², given by (25.3), using the data available at the j-th look. At the j-th look, the inference is based on

$$Z_j = \frac{\hat\delta_j}{\hat\sigma_j}. \tag{25.9}$$

Let

$$\eta = \delta \sqrt{I_{\max}},$$

where Imax is described in Chapter 59. Let tj = nj /nmax , j = 1, . . . , K. Then, using the multivariate normal approximation to the distribution of Z1 , . . . , ZK , with the expected value of Zj equal to $t_j^{1/2}\eta$ and the variance of Zj equal to 1, the (1 − α)100% repeated confidence intervals for η are

$$\left( \frac{Z_j + C_{Lj}}{t_j^{1/2}},\; \frac{Z_j + C_{Uj}}{t_j^{1/2}} \right), \tag{25.10}$$

where CLj and CUj are the values specified by the stopping boundary. The corresponding (1 − α)100% repeated confidence intervals for δ are

$$(\delta_j + C_{Lj},\; \delta_j + C_{Uj}). \tag{25.11}$$

Using the General Design Module, East provides these repeated confidence intervals for η. By considering the decision rule (25.5) as declaring equivalence if the (1 − 2α)100% confidence interval for πt − πc is entirely contained within the interval (−δ0 , δ0 ), we generalize the decision rule to a multiple-look design by concluding equivalence and stopping the study the first time one of the repeated (1 − 2α)100% confidence intervals for η is entirely contained within the interval (−η0j , η0j ), where

$$\eta_{0j} = \frac{\delta_0}{t_j^{1/2}\,\hat\sigma_j}.$$

Consider Des1 (i.e. πc = 0.20, δ0 = 0.075, and δ1 = 0). As we saw above, a total of 1203 subjects are required for decision rule (25.5) to have power of 80% of declaring equivalence, using a 95% confidence interval.

To begin click on the Other Designs on the Design tab and then click Sample Size-Based. Enter the parameters as shown below. For the Sample Size for Fixed-Sample Study enter 1203, the value obtained from Des1. Also, be sure to set the Number of Looks to 5. Recall that the choice here is twice the (one-sided) value specified for the single-look design.

The General Design Module is designed for testing the null hypothesis H0′ : η = 0. Thus, the specified power of the test pertains to testing H0′ and is not directly related to the procedure using the confidence interval.
The expected sample sizes under H0 and H1 depend on the specified value of the power and pertain to the null hypothesis H0′ and the corresponding alternative hypothesis H1′: η ≠ 0 or a corresponding one-sided alternative. These expected sample sizes are not directly applicable to the equivalence problem of testing H0 against H1.

Next click on the Boundary Info tab. The repeated confidence intervals for η depend on the choice of spending function boundaries. The sample size for this group sequential study also depends on the choice of the spending function, as well as the choice of the power. Although the boundaries themselves are not used in the decision rule, the width of the repeated confidence intervals for η is determined by the choice of the spending function. Here we will use the Lan-DeMets (O'Brien-Fleming) stopping boundary, with the looks spaced equally apart, as shown below.

Click Compute. With Des4 selected in the Output Preview, click the icon. In the Library, select the rows for Des1 and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the summary details of the two designs side-by-side:

We see that the extension of Des1 to a five-look design requires a commitment of 1233 subjects, a small inflation over the sample size of 1203 subjects required for Des1.

Select Des4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des4 and select Create IM Dashboard. This will invoke the interim monitoring worksheet, from which the repeated 95% confidence intervals will be provided.

The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections.
The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee.

We want to perform up to five looks, as data become available for every 200 subjects. Suppose that, after 200 subjects, π̂cj = 18/100 = 0.18 and π̂tj = 20/100 = 0.2. Then, from (25.2) and (25.4), the estimates of δ and the standard error of δ̂ are 0.02 and 0.0555. Click on the icon to invoke the Test Statistic Calculator. Enter the appropriate values as shown below and click Recalc. Notice that the test statistic is computed to be 0.357. Next click OK. The following screen is shown.

The first repeated 95% confidence interval for η is (-12.628, 14.402). Since this confidence interval is not contained in the interval (-3.357, 3.357), where

$$\eta_{01} = \frac{\delta_0}{t_1^{1/2}\hat{\sigma}_1} = \frac{0.075}{(0.162)^{1/2}(0.0555)} = 3.357,$$

we take a second look after 400 subjects. Click on the second row in the table in the upper section. Then click the icon to invoke the Test Statistic Calculator. Suppose that π̂cj = 36/200 = 0.18 and π̂tj = 38/200 = 0.19. Then, from (25.2) and (25.4), the estimates of δ and the standard error of δ̂ are 0.01 and 0.0388. Enter these values as shown below and click on the Recalc button. Click on the OK button and the following values are presented in the interim monitoring worksheet.
The second repeated 95% confidence interval for η, (-6.159, 7.064), is not contained in the interval (-3.396, 3.396), where

$$\eta_{02} = \frac{\delta_0}{t_2^{1/2}\hat{\sigma}_2} = \frac{0.075}{(0.324)^{1/2}(0.0388)} = 3.396,$$

so we cannot conclude equivalence. We continue the study and take a third look after 600 subjects. Click on the third row in the table in the upper section. Then click the icon to invoke the Test Statistic Calculator. Suppose that π̂cj = 51/300 = 0.17 and π̂tj = 60/300 = 0.2. Then, from (25.2) and (25.4), the estimates of δ and the standard error of δ̂ are 0.03 and 0.0317. Enter these values as shown below and click on the Recalc button. The following screen is shown. Click on the OK button and the following values are presented in the interim monitoring worksheet.

The third repeated 95% confidence interval for η, (-2.965, 5.679), is not contained in the interval (-3.390, 3.390), where

$$\eta_{03} = \frac{\delta_0}{t_3^{1/2}\hat{\sigma}_3} = \frac{0.075}{(0.487)^{1/2}(0.0317)} = 3.390,$$

so we cannot conclude equivalence. We continue the study and take a fourth look after 850 subjects. Click on the fourth row in the table in the upper section. Then click the icon to invoke the Test Statistic Calculator. Suppose that π̂cj = 91/450 = 0.2022 and π̂tj = 88/450 = 0.1956. Then, from (25.2) and (25.4), the estimates of δ and the standard error of δ̂ are -0.007 and 0.027. Enter these values as shown below and click on the Recalc button. The following screen is shown. Click on the OK button and the following values are presented in the interim monitoring worksheet.

The fourth repeated confidence interval, (-3.302, 2.678), is entirely contained in the interval (-3.346, 3.346), where

$$\eta_{04} = \frac{\delta_0}{t_4^{1/2}\hat{\sigma}_4} = \frac{0.075}{(0.689)^{1/2}(0.027)} = 3.346,$$

and thus we conclude that the two treatments are equivalent.
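The containment check applied at each look can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not East's implementation: the information fractions, standard errors, and repeated confidence intervals are copied from the four looks above, and the function names `eta_margin` and `first_equivalence_look` are hypothetical.

```python
import math

DELTA0 = 0.075  # equivalence margin delta_0 used for Des1

# (information fraction t_j, standard error of delta-hat, repeated CI for eta)
# at looks 1 to 4, copied from the worked example in the text
looks = [
    (0.162, 0.0555, (-12.628, 14.402)),
    (0.324, 0.0388, (-6.159, 7.064)),
    (0.487, 0.0317, (-2.965, 5.679)),
    (0.689, 0.0270, (-3.302, 2.678)),
]

def eta_margin(t, se, delta0=DELTA0):
    """eta_0j = delta_0 / (t_j^(1/2) * sigma-hat_j)."""
    return delta0 / (math.sqrt(t) * se)

def first_equivalence_look(looks):
    """Return the first look whose repeated CI lies inside (-eta_0j, eta_0j)."""
    for j, (t, se, (lower, upper)) in enumerate(looks, start=1):
        margin = eta_margin(t, se)
        if -margin < lower and upper < margin:
            return j
    return None  # equivalence not concluded at any look

margins = [eta_margin(t, se) for t, se, _ in looks]
stop_look = first_equivalence_look(looks)
print([round(m, 3) for m in margins], stop_look)
```

Under these inputs the margins agree with the values of η01 through η04 quoted in the text, and the interval is first contained at the fourth look.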
To express the results in terms of δ, the final confidence interval for η can be transformed to a confidence interval for δ by multiplying the confidence limits by

$$t_4^{1/2}\hat{\sigma}_4 = (0.689)^{1/2}(0.027) = 0.0224,$$

resulting in a confidence interval for δ of (-0.074, 0.060), which is entirely contained within the interval (-0.075, 0.075).

26 Binomial Superiority n-Sample

26.1 Chi-Square for Specified Proportions in C Categories

Let π0i and π1i, for i = 1, 2, ..., C, denote the response proportions under the null and alternative hypotheses, respectively, where C denotes the number of categories. The null hypothesis states that the observed frequencies follow a multinomial distribution with the null proportions as probabilities. The test is performed only for a two-sided alternative. The sample size, or power, is determined for a specified value of the proportions which is consistent with the alternative hypothesis, denoted by π1i.

Table 26.1: Contingency Table

Categories \ Response   Cured   Not Cured
Age Group A             n11     n21
Age Group B             n12     n22
Age Group C             n13     n23
Marginal                n1.     n2.

The null hypothesis is H0 : πi = π0i, i = 1, 2, 3, ..., C, and is tested against a two-sided alternative. The test statistic is given as

$$\chi^2 = \sum_{i} \frac{(n_{1i} - \mu_i)^2}{\mu_i}, \tag{26.1}$$

where $\mu_i = n_{1\cdot}\,\pi_{0i}$. Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with d.f. C − 1. The p-value is approximated by $P(\chi^2_{C-1} \ge \chi^2_0)$, where $\chi^2_{C-1}$ denotes a Chi-squared random variable with d.f. = C − 1.

26.1.1 Trial Design

Consider the design of a single-arm trial with binary response: Cured and Not Cured. The responses for the Cured population in three categories are of interest: Age group A, Age group B and Age group C.
We wish to determine whether the proportions of cured in the three age groups are 0.25, 0.25, and 0.50, respectively. Thus it is desired to test H0 : πA = 0.25, πB = 0.25, πC = 0.50. We wish to design the trial with a two-sided test that achieves 90% power at H1 : πA = 0.3, πB = 0.4, πC = 0.3 at level of significance 0.05.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Chi-Square Test of Specified Proportions in C Categories.

In the upper pane of this window is the Input dialog box, which displays default input values. Enter the Number of Categories (C) as 3. Under Table of Proportion of Response, enter the values of proportions under Null Hypothesis and Alternative Hypothesis for each category except the last one, such that the sum of values in a row equals 1. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 71 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Chi-Square Test of Specified Proportions in C Categories study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size.
The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

26.2 Two-Group Chi-square for Proportions in C Categories

Let π1j and π2j denote the response proportions of group 1 and group 2, respectively, for the j-th category, where j = 1, 2, ..., C. The null hypothesis H0 : π1j = π2j ∀ j = 1, 2, ..., C is tested against the alternative hypothesis that, for at least one j, π1j differs from π2j.

Table 26.2: Contingency Table

Categories \ Groups   Group 1   Group 2   Marginal
A                     n11       n21       n01
B                     n12       n22       n02
C                     n13       n23       n03
Marginal              n10       n20       n

The test statistic is given as

$$\chi^2 = \sum_{i}\sum_{j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}, \tag{26.2}$$

where $\mu_{ij} = n_{0j}\,n_{i0}/n$, j = 1, 2, ..., C and i = 1, 2. Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with d.f. C − 1. The p-value is approximated by $P(\chi^2_{C-1} \ge \chi^2_0)$, where $\chi^2_{C-1}$ denotes a Chi-squared random variable with d.f. = C − 1.

26.2.1 Trial Design

Suppose researchers want to investigate the relationship between different dose levels (level 1, level 2 and level 3) of a drug and the type of adverse event (serious or not serious). The proportions who were treated with different dose levels will be compared using a Chi-square test. Suppose the expected proportions of patients for the three dose levels are 0.30, 0.35 and 0.35 where patients had no serious adverse events, and the expected proportions are 0.20, 0.30 and 0.50 where patients had serious adverse events. We wish to design the trial with a two-sided test that achieves 90% power at level of significance 0.05.

Start East.
Click Design tab, then click Many Samples in the Discrete group, and then click Two-Group Chi-square for Proportions in C Categories.

The Input dialog box, with default input values, will appear in the upper pane. Enter the Number of Categories (C) as 3. Under Table of Proportion of Response, enter the values of proportions under Control and Treatment for each category except the last one, such that the sum of values in a row equals 1. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 503 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Two-Group Chi-square for Proportions in C Categories study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.
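As a rough cross-check of the sample size reported for this design, the power of the Chi-square test can be approximated with a noncentral Chi-squared distribution. The sketch below is an illustration under stated assumptions, not a description of East's algorithm: it assumes equal allocation to the two groups, takes the noncentrality parameter to be N times the sum over groups and categories of w_i(π_ij − π̄_j)²/π̄_j, and uses hypothetical helper names (`chi2_cdf`, `chi2_quantile`, `ncx2_power`, `effect_per_subject`) built only on the Python standard library.

```python
import math

def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, k = term, 1
    while term > 1e-17 * total and k < 10_000:
        term *= z / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, df + 200.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def ncx2_power(nc, df, alpha=0.05, terms=400):
    """P(noncentral chi-square > critical value), as a Poisson mixture."""
    crit = chi2_quantile(1.0 - alpha, df)
    weight = math.exp(-nc / 2.0)  # Poisson(nc/2) probability of 0
    power = 0.0
    for i in range(terms):
        power += weight * (1.0 - chi2_cdf(crit, df + 2 * i))
        weight *= (nc / 2.0) / (i + 1)
    return power

def effect_per_subject(p1, p2, w1=0.5):
    """Noncentrality contributed by one subject (assumed formula, see lead-in)."""
    w2 = 1.0 - w1
    total = 0.0
    for a, b in zip(p1, p2):
        pooled = w1 * a + w2 * b
        total += (w1 * (a - pooled) ** 2 + w2 * (b - pooled) ** 2) / pooled
    return total

# Proportions from the trial design example in Section 26.2.1
group1 = [0.30, 0.35, 0.35]
group2 = [0.20, 0.30, 0.50]
effect = effect_per_subject(group1, group2)
power_503 = ncx2_power(503 * effect, df=len(group1) - 1)
print(round(effect, 5), round(power_503, 3))
```

With these inputs the approximate power at N = 503 comes out close to the 90% target, which is consistent with the sample size shown in the Output Preview.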
26.3 Nonparametric: Wilcoxon Rank Sum for Ordered Categorical Data

When we compare two treatments with respect to signs and symptoms associated with a disease, we may base the comparison on a variable that assesses degree of response or degree of severity, using an ordinal categorical variable. For example, investigators may report the severity of an adverse event, or other abnormality, using a specified grading system or using a simple scale, such as "none", "mild", "moderate", or "severe". The latter rating scale might be used in an analgesia study to report the severity of pain. Although this four-point scale is often used and intuitively appealing, additional categories, such as "very mild" and "very severe", may be added. In other situations, the efficacy of the treatment is best assessed by the subject reporting response to therapy using a similar scale.

The Wilcoxon test for ordered categories is a nonparametric test for use in such situations. East provides the power for a specified sample size for a single-look design using the constant proportional odds ratio model. Let πcj and πtj denote the probabilities for category j, j = 1, 2, ..., J, for the control c and the treatment t, respectively. Let $\gamma_{ci} = \sum_{j=1}^{i} \pi_{cj}$ and $\gamma_{ti} = \sum_{j=1}^{i} \pi_{tj}$. We assume that

$$\frac{\gamma_{ci}}{1 - \gamma_{ci}} = e^{\psi}\,\frac{\gamma_{ti}}{1 - \gamma_{ti}}, \quad i = 1, 2, \ldots, J - 1,$$

or, equivalently,

$$\psi = \ln(\gamma_{ci}) - \ln(1 - \gamma_{ci}) - \left(\ln(\gamma_{ti}) - \ln(1 - \gamma_{ti})\right). \tag{26.3}$$

We compare the two distributions by focusing on the parameter ψ. Thus we test the null hypothesis H0 : ψ = 0 against the two-sided alternative H1 : ψ ≠ 0 or a one-sided alternative hypothesis H1 : ψ > 0. East requires the specified value of ψ to be positive. Technical details can be found in Rabbee et al., 2003.
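To make the proportional odds relationship (26.3) concrete, the sketch below derives one set of category probabilities from the other by shifting the cumulative logits by ψ. It is only an illustration: the function name `shifted_probabilities` is hypothetical, the input values (0.55, 0.3, 0.1, 0.05 and ψ = 1.5) are those used in the trial design example of this section, and treating the user-specified probabilities as the control group c is an assumption about how the inputs map onto the model.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def shifted_probabilities(control_probs, psi):
    """Category probabilities for group t implied by (26.3).

    Uses logit(gamma_ti) = logit(gamma_ci) - psi for i = 1, ..., J - 1.
    """
    cumulative = 0.0
    gammas_t = []
    for p in control_probs[:-1]:
        cumulative += p
        gammas_t.append(expit(logit(cumulative) - psi))
    gammas_t.append(1.0)  # the last cumulative probability is always 1
    # Difference the cumulative probabilities to recover category probabilities
    return [gammas_t[0]] + [b - a for a, b in zip(gammas_t, gammas_t[1:])]

control = [0.55, 0.30, 0.10, 0.05]
treatment = shifted_probabilities(control, psi=1.5)
print([round(p, 3) for p in treatment])
```

A positive ψ lowers every cumulative probability in the second group, so probability mass moves from the first category toward the later ones.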
26.3.1 Trial Design

We consider here a placebo-controlled parallel-group study where subjects report the response to treatment as "none", "slight", "considerable", or "total". We expect that most of the subjects in the placebo group will report no response.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Non Parametric: Wilcoxon Rank Sum for Ordered Categorical Data.

The Input dialog box, with default input values, will appear in the upper pane. We want to determine the power, using a two-sided test with a type-1 error rate of 0.05, with a total of 100 subjects, and equal sample sizes for the two groups. Enter Number of Categories as 4. We will use User Specified for Specify Pop 1 Probabilities and Proportional Odds Model for Pop 2 Probabilities here. Click the Proportional Odds Model radio button. A new field for Shift will appear. Enter 1.5 in this field. Based on the results of a pilot study, the values of 0.55, 0.3, 0.1, and 0.05 are used as Pop 1 probabilities. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed power highlighted in yellow. This design results in a power of approximately 98% for a total sample size of 100 subjects.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size.
The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

With such high power, a total sample size of 100 subjects may be an inefficient use of resources. We are willing to use a smaller sample size to achieve a lower power. Change the maximum sample size to 50 in the previous design. Leave all other values as defaults, and click Compute.

This design results in approximately 80% power using a total sample size of 50 subjects.

26.4 Trend in R Ordered Binomial Proportions

In some experimental situations, there are several binomial distributions indexed by an ordinal variable and we want to examine changes in the probabilities of success as the levels of the indexing variable change. Examples of this include the examination of a dose-related presence of a response or a particular side effect, dose-related tumorigenicity, or presence of fetal malformations relative to levels of maternal exposure to a particular toxin, such as alcohol, tobacco, or environmental factors.

The test for trend in R ordered proportions is based on the Cochran-Armitage trend test. Let πj denote the probability of interest for the j-th category of the ordinal variable, j = 1, 2, ..., R, and let scores be denoted by ω1, ω2, ..., ωR.
It is assumed that the odds ratio relating the j-th category to the (j − 1)-th category satisfies

$$\frac{\pi_j}{1 - \pi_j} = \psi^{\,\omega_j - \omega_{j-1}}\,\frac{\pi_{j-1}}{1 - \pi_{j-1}} \tag{26.4}$$

or equivalently,

$$\ln\!\left(\frac{\pi_j}{1 - \pi_j}\right) = (\omega_j - \omega_{j-1})\ln(\psi) + \ln\!\left(\frac{\pi_{j-1}}{1 - \pi_{j-1}}\right). \tag{26.5}$$

This assumption can also be equivalently expressed as a relationship between the odds for the j-th category and that of the first category; namely,

$$\frac{\pi_j}{1 - \pi_j} = \psi^{\,\omega_j - \omega_1}\,\frac{\pi_1}{1 - \pi_1} \tag{26.6}$$

or equivalently,

$$\ln\!\left(\frac{\pi_j}{1 - \pi_j}\right) = (\omega_j - \omega_1)\ln(\psi) + \ln\!\left(\frac{\pi_1}{1 - \pi_1}\right). \tag{26.7}$$

It is assumed that π1 < ... < πR with ψ > 1, or π1 > ... > πR with ψ < 1.

We want to test the null hypothesis H0 : ψ = 1 against the two-sided alternative H1 : ψ ≠ 1 or against a one-sided alternative H1 : ψ > 1 or H1 : ψ < 1.

The sample size required to achieve a specified power, or the power for a specified sample size, is determined for a single-look design with the specified parameters. The sample size calculation is conducted using the methodology presented below, which is similar to that described in Nam, 1987.

Let nj = rj N denote the sample size for the j-th category, where rj is the j-th sample fraction and N is the total sample size.
The determination of the sample size required to control the power of the test of H0 is based on

$$W = \sum_{j=1}^{R} r_j(\omega_j - \bar{\omega})\hat{\pi}_j \tag{26.8}$$

with $\bar{\omega} = \sum_{j=1}^{R} r_j\omega_j$. The expected value of W is

$$E(W) = \sum_{j=1}^{R} r_j(\omega_j - \bar{\omega})\pi_j \tag{26.9}$$

and the variance of W is

$$V(W) = \sum_{j=1}^{R} r_j(\omega_j - \bar{\omega})^2\,\pi_j(1 - \pi_j). \tag{26.10}$$

The expected value of W under H0 is

$$E_0(W) = \pi\sum_{j=1}^{R} r_j(\omega_j - \bar{\omega}) \tag{26.11}$$

and the variance of W under H0 is

$$V_0(W) = \pi(1 - \pi)\sum_{j=1}^{R} r_j(\omega_j - \bar{\omega})^2, \tag{26.12}$$

where

$$\pi = \sum_{j=1}^{R} r_j\pi_j. \tag{26.13}$$

The test statistic used to determine the sample size is

$$Z = \frac{W - E_0(W)}{V_0(W)^{1/2}}. \tag{26.14}$$

The total sample size required for a two-sided test with type-1 error rate of α to have power 1 − β when ψ = ψ1 is

$$N = \frac{\left[z_{\alpha/2}\,V_0(W)^{1/2} + z_{\beta}\,V(W)^{1/2}\right]^2}{E(W)^2}. \tag{26.15}$$

The total sample size required for a one-sided test with type-1 error rate of α to have power 1 − β when ψ = ψ1 is determined from (26.15) with α/2 replaced by α.

26.4.1 Trial Design

Consider the problem of comparing three durations of therapy for a specific disorder. We want to have sufficiently large power when 10% of subjects with shorter duration, 25% of subjects with intermediate duration and 50% of subjects with extensive duration will respond by the end of therapy. These parameters result in an odds ratio of ψ = 3 or, equivalently, ln(ψ) = 1.1. We would like to determine the sample size to achieve 90% power when ln(ψ) = 1.1 based on a two-sided test at significance level 0.05.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Trend in R Ordered Binomial Proportions.

The Input dialog box, with default input values, will appear in the upper pane. Response probabilities can be specified in one of two ways, selected from Response Probabilities: (1) User Specified Probabilities or (2) Model Based Probabilities.
The user can specify probabilities for each population by choosing User Specified Probabilities, whereas Model Based Probabilities are based on the logit transformation.

We will use Model Based Probabilities here. Under Response Probabilities, click the Model Based Probabilities radio button. A new field for log of Common odds Ratio will appear. Enter 1.1 in this field. Enter 0.1 in the Prop. of Response field. One can also specify the Scores (W(i)) in monotonically increasing order. We will use Equally Spaced here. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 75 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click on the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.
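The sample size formula (26.15) is simple enough to evaluate by hand or in a short script. The following sketch applies (26.6) and (26.8) through (26.15) to the inputs of this example (π1 = 0.1, ln ψ = 1.1, three equally spaced scores, equal sample fractions). It is an illustration only: the function names `model_probs` and `trend_sample_size` are hypothetical, equal sample fractions are assumed when none are given, and rounding up to the next integer is an assumption about how the final sample size is reported.

```python
import math
from statistics import NormalDist

def model_probs(pi1, log_psi, scores):
    """Response probabilities implied by (26.6) on the logit scale."""
    base = math.log(pi1 / (1.0 - pi1))
    return [1.0 / (1.0 + math.exp(-(base + (w - scores[0]) * log_psi)))
            for w in scores]

def trend_sample_size(probs, scores, fractions=None,
                      alpha=0.05, power=0.90, two_sided=True):
    """Total sample size N from (26.15), rounded up (rounding is an assumption)."""
    R = len(probs)
    r = fractions if fractions is not None else [1.0 / R] * R
    w_bar = sum(rj * wj for rj, wj in zip(r, scores))
    p_bar = sum(rj * pj for rj, pj in zip(r, probs))                  # (26.13)
    e_w = sum(rj * (wj - w_bar) * pj
              for rj, wj, pj in zip(r, scores, probs))                # (26.9)
    v_w = sum(rj * (wj - w_bar) ** 2 * pj * (1.0 - pj)
              for rj, wj, pj in zip(r, scores, probs))                # (26.10)
    v0_w = p_bar * (1.0 - p_bar) * sum(rj * (wj - w_bar) ** 2
                                       for rj, wj in zip(r, scores))  # (26.12)
    tail = alpha / 2.0 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1.0 - tail)
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha * math.sqrt(v0_w) + z_beta * math.sqrt(v_w)) ** 2 / e_w ** 2
    return math.ceil(n)

scores = [1, 2, 3]
probs = model_probs(pi1=0.1, log_psi=1.1, scores=scores)
n_total = trend_sample_size(probs, scores)
print([round(p, 3) for p in probs], n_total)
```

Under these assumptions the model-based probabilities are approximately 0.10, 0.25, and 0.50, and the computed total agrees with the 75 subjects shown in the Output Preview.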
The default specification of equally spaced scores is useful when the categories are ordinal, but not numerical. If the categories are numerical, such as doses of a therapy, then the numerical value will be more appropriate. Consider three doses of 10, 20, and 30. One must exercise care in the specification of log(ψ) when the differences between scores for adjacent categories are equal, but this common difference is not equal to one. Although the differences are equal, user defined scores must be used. If the common difference is equal to a positive value A, then setting log(ψ) to 1/A of that for the default of equally spaced scores, with a common difference of one, will provide identical results. With three doses (Scores W(i)) of 10, 20, and 30 and log of Common odds Ratio = 0.11, the results are the same as those shown above. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 75 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.11 and π1 = 0.1.

Similarly, if the differences between scores for adjacent categories are not equal, user defined scores must be used. Consider three doses of 10, 20, and 50, with log of Common odds Ratio = 0.11. Change the scores (Scores W(i)) to 10, 20, and 50 in the previous design. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 16 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.11 and π1 = 0.1.

Although a small sample size is usually desirable, here it may be due to a value of π3 (= 0.90) which may be too large to be meaningful.
Then the power should be controlled at a smaller value of log(ψ). Consider log(ψ) = 0.07. Change the log of Common odds Ratio value to 0.07. This is shown in the following screenshot.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 37 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.07 and π1 = 0.1.

The trend test is particularly useful in situations where there are several categories. Consider now an example of a dose-ranging study to examine the safety of a therapy, with respect to the occurrence of a specified adverse event (AE), such as a dose-limiting toxicity (DLT). Six doses (1, 2, 4, 8, 12, 16) have been selected. It is expected that approximately 5% on the lowest dose will experience the AE. The study is to be designed to have power of 90% if approximately 20% on the highest dose experience the AE. This suggests that the study should be designed with log(ψ) approximately (log(0.20) − log(0.05))/15 = 0.092.

Enter log of Common odds Ratio as 0.1, Prop. of Response as 0.05 and Number of Populations as 6. Enter the Scores W(i) as 1, 2, 4, 8, 12, and 16. Leave all other values as defaults, and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 405 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis when log(ψ) = 0.1 and π1 = 0.05. This sample size may not be economically feasible, so we instead select the sample size to achieve a power of 80%. Selecting Power(1-β) as 0.8 yields the result shown in the following screenshot.
This design requires a combined total of 298 subjects from all groups to attain 80% power when log(ψ) = 0.1 and π1 = 0.05.

26.5 Chi-Square for R Unordered Binomial Proportions

Let πij denote the proportion of response in the i-th group and j-th category, with i = 1, 2, ..., R and j = 1, 2, where R denotes the number of groups. The null hypothesis of equality of proportions in all groups for every category is tested against the alternative that the proportion in at least one group differs. The null hypothesis is defined as

H0 : πi1 = π0 ∀ i

The alternative is defined as

H1 : πi1 ≠ π0 for any i = 1, 2, ..., R

Table 26.3: R × 2 Contingency Table

Rows        Col 1   Col 2   Row Total
Row 1       n11     n12     m1
Row 2       n21     n22     m2
·           ·       ·       ·
·           ·       ·       ·
Row R       nR1     nR2     mR
Col Total   n1      n2      N

The test statistic is given as

$$\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{2} \frac{\left(n_{ij} - \frac{m_i n_j}{N}\right)^2}{\frac{m_i n_j}{N}}. \tag{26.16}$$

Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with d.f. R − 1. The p-value is approximated by $P(\chi^2_{R-1} \ge \chi^2_0)$, where $\chi^2_{R-1}$ denotes a Chi-squared random variable with d.f. = R − 1.

26.5.1 Trial Design

Consider a 3-arm trial with treatments A, B and C. The response is the reduction in blood pressure (BP). From historical data it is known that the response rates of treatments A, B and C are 37.5%, 59% and 40%, respectively. That is, out of 40 individuals under treatment A, 15 had a reduction in BP; out of 68 individuals under treatment B, 40 had a reduction in BP; and out of 30 individuals under treatment C, 12 had a reduction in BP. Based on these data we can fill the entries in the table of proportions.
Table 26.4: Proportion of Response

Groups \ Categories   Reduction in BP   No Reduction   Marginal
Treatment A           0.375             0.625          1
Treatment B           0.59              0.41           1
Treatment C           0.4               0.6            1

This can be posed as a two-sided testing problem for testing H0 : πA = πB = πC (= π0, say) against H1 : πi ≠ π0 (for at least one i = A, B, C) at the 0.05 level. We wish to determine the sample size to have 90% power for the values displayed in the above table.

Start East. Click Design tab, then click Many Samples in the Discrete group, and then click Chi-Square Test for Unordered Binomial Proportions.

The Input dialog box, with default input values, will appear in the upper pane. Enter the values of Response Proportion in each group and Alloc.Ratio ri = ni/n1, where the ri are the weights of the groups relative to the first group. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 301 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Chi-Square test for R × 2 Table study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs Sample Size.
The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

26.6 Chi-Square for R Unordered Multinomial Proportions

Let πij denote the response proportion in the i-th group and j-th category. The null hypothesis H0 : π1j = π2j = ... = πRj ∀ j = 1, 2, ..., C is tested against the alternative hypothesis that, for at least one category, the response proportions are not the same in all groups. The test statistic is given as

$$\chi^2 = \sum_{i=1}^{R}\sum_{j=1}^{C} \frac{\left(n_{ij} - \frac{m_i n_j}{N}\right)^2}{\frac{m_i n_j}{N}}. \tag{26.17}$$

Table 26.5: Contingency Table

Rows        Col 1   Col 2   ·   ·   Col C   Row Total
Row 1       n11     n12     ·   ·   n1C     m1
Row 2       n21     n22     ·   ·   n2C     m2
·           ·       ·       ·   ·   ·       ·
·           ·       ·       ·   ·   ·       ·
Row R       nR1     nR2     ·   ·   nRC     mR
Col Total   n1      n2      ·   ·   nC      N

Let $\chi^2_0$ be the observed value of $\chi^2$. For large samples, $\chi^2$ has approximately a Chi-squared distribution with d.f. (R − 1)(C − 1). The p-value is approximated by $P(\chi^2_{(R-1)(C-1)} \ge \chi^2_0)$, where $\chi^2_{(R-1)(C-1)}$ denotes a Chi-squared random variable with d.f. = (R − 1)(C − 1).

26.6.1 Trial Design

Consider a 3-arm oncology trial with treatments A, B and C. The responses in 4 categories, CR (complete response), PR (partial response), SD (stable disease) and PD (disease progression), are of interest. We wish to determine whether the response proportion in each of the 4 categories is the same for the three treatments. From historical data we get the following proportions for each category for the three treatments. Out of 100 patients, 30 were treated with treatment A, 35 were treated with treatment B and 35 were treated with treatment C. The response proportion information for each treatment is given below. Assuming equal allocation in each treatment arm, we wish to design a two-sided test which achieves 90% power at significance level 0.05.
Table 26.6: Contingency Table

Treatment \ Categories   CR      PR      SD      PD      Marginal
Treatment A              0.019   0.001   0.328   0.652   1
Treatment B              0.158   0.145   0.154   0.543   1
Treatment C              0.128   0.006   0.003   0.863   1

Start East. Click the Design tab, then click Many Samples in the Discrete group, and then click Chi-Square R Unordered Multinomial Proportions.

The Input dialog box, with default input values, will appear in the upper pane of this window. Enter Number of Categories (C) as 4. Enter the values of Proportion of Response and ri = ni/n1, which is the allocation weight of each group relative to the first group. Enter the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 69 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this Chi-Square Test of Comparing Proportions in R by C Table study design.

You can select this design by clicking anywhere on its row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As....
For now, you may close the chart before continuing.

27 Multiple Comparison Procedures for Discrete Data

Sometimes multiple treatment arms are compared with a placebo or control arm in a single trial on the basis of a binary primary endpoint. These objectives are formulated as a family of hypotheses. Formal statistical hypothesis tests can be performed to see if there is strong evidence to support clinical claims. The type I error is inflated when one considers the inferences together as a family, and failure to compensate for multiplicity can have adverse consequences. For example, a drug could be approved when it is actually no better than placebo. Multiple comparison (MC) procedures guard against inflation of the type I error due to multiple testing. The probability of making at least one type I error is known as the family-wise error rate (FWER). East supports the following MC procedures for a binary endpoint.

Procedure             Reference
Bonferroni            Bonferroni CE (1935, 1936)
Sidak                 Sidak Z (1967)
Weighted Bonferroni   Benjamini Y and Hochberg Y (1997)
Holm's Step Down      Holm S (1979)
Hochberg's Step Up    Hochberg Y (1988)
Hommel's Step Up      Hommel G (1988)
Fixed Sequence        Westfall PH and Krishen A (2001)
Fallback              Wiens B, Dmitrienko A (2005)

In this chapter we explain how to design a study using an MC procedure. In East, one can calculate the power from simulated data under different MC procedures. With this information on power, one can choose the MC procedure that provides the maximum power while still strongly controlling the FWER.

The MC procedures included in East strongly control the FWER. Strong control of the FWER means that the probability of incorrectly rejecting at least one true null hypothesis is kept at or below the nominal level, whichever of the null hypotheses happen to be true.
In contrast, weak control of the FWER controls the FWER only under the assumption that all null hypotheses are true.

27.1 Bonferroni Procedure

The Bonferroni procedure is described below with an example. Assume that there are k arms, including the control, and that the treatment arms will be compared with placebo on the basis of a binary response variable X. Let ni be the number of subjects in the i-th arm (i = 0, 1, ..., k − 1), where arm 0 refers to the control, and let N = n0 + n1 + ... + n(k−1) be the total sample size. Also, let πi be the response probability in the i-th arm. We are interested in the following hypotheses:

For the right tailed test: Hi: πi − π0 ≤ 0 vs Ki: πi − π0 > 0
For the left tailed test: Hi: πi − π0 ≥ 0 vs Ki: πi − π0 < 0

The global null hypothesis is rejected if at least one of the Hi is rejected in favor of Ki after controlling for FWER. Here Hi and Ki refer to the null and alternative hypotheses, respectively, for the comparison of the i-th arm with the control arm.

Let π̂i be the sample proportion for treatment arm i and π̂0 be the sample proportion for the control arm. For the unpooled variance case, the test statistic to compare the i-th arm with control (i.e., Hi vs Ki) is defined as

$$T_i = \frac{\hat{\pi}_i - \hat{\pi}_0}{\sqrt{\frac{1}{n_i}\hat{\pi}_i(1 - \hat{\pi}_i) + \frac{1}{n_0}\hat{\pi}_0(1 - \hat{\pi}_0)}}, \qquad i = 1, 2, \cdots, k - 1 \qquad (27.1)$$

For the pooled variance case, one needs to replace π̂i and π̂0 in the denominator by the pooled sample proportion π̂, defined as

$$\hat{\pi} = \frac{n_i \hat{\pi}_i + n_0 \hat{\pi}_0}{n_i + n_0}, \qquad i = 1, 2, \cdots, k - 1 \qquad (27.2)$$

Let ti be the observed value of Ti. These observed values for the k − 1 treatment arms can be ordered as t(1) ≥ t(2) ≥ · · · ≥ t(k−1).
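The statistics (27.1) and (27.2) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `z_statistic` and the responder counts are hypothetical, and the variance estimate is assumed to be positive (it would be zero if, for example, both arms had no responders).

```python
import math


def z_statistic(x_i, n_i, x_0, n_0, pooled=False):
    """Statistic for comparing treatment arm i with control (arm 0).

    x_i, x_0 are responder counts and n_i, n_0 are arm sizes.
    Assumes the estimated variance is strictly positive.
    """
    p_i = x_i / n_i
    p_0 = x_0 / n_0
    if pooled:
        # Pooled sample proportion, as in (27.2), replaces both proportions
        p = (n_i * p_i + n_0 * p_0) / (n_i + n_0)
        variance = p * (1 - p) / n_i + p * (1 - p) / n_0
    else:
        # Unpooled variance, as in (27.1)
        variance = p_i * (1 - p_i) / n_i + p_0 * (1 - p_0) / n_0
    return (p_i - p_0) / math.sqrt(variance)


# Hypothetical data: 69 of 125 responders on treatment, 44 of 125 on control
t_unpooled = z_statistic(69, 125, 44, 125)
t_pooled = z_statistic(69, 125, 44, 125, pooled=True)
```

With equal arm sizes the two versions differ only through the variance estimate, so they are usually close to each other.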
For the right tailed test, the marginal p-value for comparing the i-th arm with placebo is calculated as pi = P(Z > ti) = Φ(−ti), and for the left tailed test as pi = P(Z < ti) = Φ(ti), where Z has a standard normal distribution and Φ(·) is the cumulative distribution function of a standard normal variable. Let p(1) ≤ p(2) ≤ · · · ≤ p(k−1) be the ordered p-values.

East supports three single step MC procedures for comparing proportions: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure. For the Bonferroni procedure, Hi is rejected if pi < α/(k − 1), and the adjusted p-value is given as min(1, (k − 1)pi).

27.1.1 Example: HIV Study

This is a randomized, double-blind, parallel-group, placebo-controlled, multi-center study to assess the efficacy and safety of 125 mg, 250 mg, and 500 mg orally twice daily of a new drug for the treatment of HIV-associated diarrhea. The primary efficacy endpoint is clinical response, defined as two or fewer watery bowel movements per week during at least two of the four weeks of the 4-week efficacy assessment period. Efficacy will be evaluated by comparing the proportion of responders in the placebo group to the proportion of responders in the three treatment groups at a one-sided alpha of 0.025. The estimated response rate in the placebo group is 35%. The response rates in the treatment groups are expected to be 40% for 125 mg, 45% for 250 mg and 55% for 500 mg.

Dose (mg)              Placebo   125    250    500
Estimated proportion   0.35      0.40   0.45   0.55

With the above underlying scenario, we would like to calculate the power for a total sample size of 500. This will be a balanced study with a one-sided 0.025 significance level to detect at least one dose with a significant difference from placebo. We will show how to simulate the power of such a study using the multiple comparison procedures listed above.
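To make the idea of simulated power concrete, here is a minimal sketch of a Bonferroni power simulation for the HIV example. It assumes a balanced design of 125 subjects per arm, right tailed tests with the unpooled statistic, and Python's own random number generator; the function name, the seed and the number of simulation runs are arbitrary choices for illustration, and the sketch is not East's implementation.

```python
import math
import random
from statistics import NormalDist


def simulate_bonferroni_power(rates, n_per_arm, alpha, n_sims, seed):
    """Estimate power by simulation; rates[0] is the control response rate."""
    rng = random.Random(seed)
    n_tests = len(rates) - 1
    threshold = alpha / n_tests            # Bonferroni: reject if p_i < alpha/(k-1)
    std_normal = NormalDist()
    marginal = [0] * n_tests
    any_count = 0
    all_count = 0
    for _ in range(n_sims):
        # Observed response proportion in each arm
        p_hat = [sum(rng.random() < p for _ in range(n_per_arm)) / n_per_arm
                 for p in rates]
        flags = []
        for i in range(1, len(rates)):
            variance = (p_hat[i] * (1 - p_hat[i])
                        + p_hat[0] * (1 - p_hat[0])) / n_per_arm
            if variance == 0.0:
                flags.append(False)        # degenerate sample: do not reject
                continue
            t_i = (p_hat[i] - p_hat[0]) / math.sqrt(variance)
            p_i = std_normal.cdf(-t_i)     # right tailed p-value
            flags.append(p_i < threshold)
        for j, flag in enumerate(flags):
            marginal[j] += flag
        any_count += any(flags)
        all_count += all(flags)
    return {
        "global": any_count / n_sims,
        "conjunctive": all_count / n_sims,
        "marginal": [m / n_sims for m in marginal],
    }


# HIV example: placebo, 125 mg, 250 mg, 500 mg; 500 subjects in total
result = simulate_bonferroni_power([0.35, 0.40, 0.45, 0.55], 125, 0.025,
                                   n_sims=2000, seed=12345)
```

Because the estimates are based on random draws, they vary with the seed and with the number of simulation runs.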
Designing the Study

Start East. Click the Design tab, then click Many Samples in the Discrete group, and then click Single Look under Multiple Pairwise Comparisons to Control - Differences of Proportions.

This will launch a new window which asks the user to specify the values of a few design parameters, including the number of arms, overall type I error, total sample size and multiple comparison procedure. For our example, we have 3 treatment groups plus a placebo, so enter 4 for Number of Arms.

Under the Test Parameters tab, there are several fields which we will fill in. First, there is a box with the label Test Type. Here you need to specify whether you want a one-sided or two-sided test. Currently, only one-sided tests are available. The next drop-down box has the label Rejection Region. If left tail is selected, the critical value for the test is located in the left tail of the distribution of the test statistic. Likewise, if right tail is selected, the critical value for the test is located in the right tail of the distribution of the test statistic. For our example, we will select Right Tail. Under that, there is a box with the label Type 1 Error (α). This is where you need to specify the FWER. For our example, enter 0.025. Now go to the box with the label Sample Size (n). Here we input the total number of subjects, including those in the placebo arm. For this example, enter 500.

To the right, there will be a heading with the title Multiple Comparison Procedures. Check the box next to Bonferroni, as this is the multiple comparison procedure we are illustrating in this subsection. After entering these parameters your screen should now look like this:

Now click on the Response Generation tab. You will see a table titled Table of Proportions. In this table we can specify the labels for the treatment arms.
You also have to specify the dose levels if you want to generate proportions through a dose-response curve. There are two fields in this tab above the table. The first one is labeled Variance and has a drop-down list with two options, Pooled and Unpooled. Here you have to select whether the pooled or the unpooled variance is used to calculate the test statistic for each test. For this example, select Unpooled for Variance.

Next to Variance there is a check box labeled Generate Proportions Through DR Curve. If you want to generate the response rate for each arm according to a dose-response curve, you need to check this box. Check the box Generate Proportions Through DR Curve. Once you check this box you will notice two things. First, an additional column with the label Dose will appear in the table. Here you need to enter the dose level for each arm. For this example, enter 0, 125, 250 and 500 for the Placebo, Dose1, Dose2 and Dose3 arms, respectively. Second, an additional section will appear to the right which provides the option to generate the response rates from four families of parametric curves: Four Parameter Logistic, Emax, Linear and Quadratic. The technical details of each curve can be found in Appendix H. Here you need to choose the appropriate parametric curve from the drop-down list under Dose Response Curve and then specify the parameters associated with that curve.

Suppose the response rate follows the four parameter logistic curve

$$E(\pi \mid D) = \beta + \frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)} \qquad (27.3)$$

where D indicates dose. The parameters of the logistic dose-response curve should be chosen with care. We want to parameterize the above logistic model such that the proportions from the logistic model agree as closely as possible with the estimated proportions stated at the beginning of the example.
We will consider a situation where the response rate at dose 0 is very close to the parameter β. In other words, β indicates the placebo effect. For this to hold, δ/(1 + exp((θ − D)/τ)) should be very close to 0 at D = 0. For now, assume that it holds; we will return to this later. We have assumed a 35% response rate in the placebo arm. Therefore, we specify β as 0.35.

The parameter β + δ indicates the maximum response rate. Since the response rate cannot exceed 1, δ should be chosen in such a way that β + δ ≤ 1. If a 100% response rate can never be achieved, δ should be smaller still. For this example, the response rate for the highest dose of 500 mg is 55%. Therefore, we assume that the maximum response rate achievable with the new drug is only 60%, and we specify δ as 0.60 − 0.35 = 0.25.

The parameter θ indicates the median dose, that is, the dose that produces 50% of the maximum improvement in response rate, or a response rate equal to β + δ/2. With β = 0.35 and δ = 0.25, β + δ/2 is 0.475. Note that we have assumed the 250 mg dose can provide a response rate of 45%. Therefore, we assume θ as 300.

The parameter τ needs to be selected in such a way that δ/(1 + exp((θ − D)/τ)) is very close to 0 at D = 0. We can ensure this condition by choosing any small value of τ. However, a very small τ indicates a sharp improvement in response rate around the median dose and negligible improvement at almost all other doses. In the HIV example, the estimated response rates indicate improvement at all the dose levels. With τ as 75, δ/(1 + exp((θ − D)/τ)) is 0.0045 at D = 0, and the proportions from the logistic model are close to the estimated proportions for the chosen doses. Therefore, β = 0.35, δ = 0.25, θ = 300 and τ = 75 seem to be reasonable choices for our example.

Select Four Parameter Logistic from the drop-down list of Dose Response Curve.
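The reasoning about the parameters of curve (27.3) can be checked numerically. The sketch below is an illustration only: the function name is hypothetical, and the parameter values are the ones discussed in the preceding paragraphs (β = 0.35, δ = 0.25, θ = 300, τ = 75).

```python
import math


def four_param_logistic(dose, beta, delta, theta, tau):
    """Expected response rate at a given dose under curve (27.3)."""
    return beta + delta / (1.0 + math.exp((theta - dose) / tau))


doses = [0, 125, 250, 500]
rates = [four_param_logistic(d, beta=0.35, delta=0.25, theta=300, tau=75)
         for d in doses]

# Amount by which the response rate at dose 0 exceeds beta (the placebo effect)
placebo_excess = rates[0] - 0.35

# Response rate at the median dose theta, which should equal beta + delta / 2
rate_at_theta = four_param_logistic(300, 0.35, 0.25, 300, 75)
```

Trying a much smaller τ in this function shows the behavior described above: the curve becomes a sharp step around θ, with almost no improvement at doses well below θ.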
To the right of this drop-down box, specify the four parameter values in the Parameters box. Enter 0.35 for β, 0.25 for δ, 250 for θ and 75 for τ. You can verify that the values in the Response Rate column have changed to 0.359, 0.39, 0.475 and 0.591 for the four arms, respectively. These proportions are very close to the estimated proportions stated at the beginning of the example. Now click Plot DR Curve, located below the parameters, to see the dose-response curve.

You will see the logistic dose-response curve, which intersects the Y-axis at 0.359. Close this plot. Since the response rates from the logistic model are close to, but not exactly the same as, the estimated proportions stated at the beginning of the example, we will specify the estimated response rates directly in the Table of Proportions. In order to do this, first uncheck Generate Proportions Through DR Curve. You will notice two things. First, the column with the label Dose will disappear from the table. Second, the section on the right will disappear as well. Now enter the estimated proportions in the Response Rate column: enter 0.35, 0.40, 0.45 and 0.55 in this column. The Response Generation tab should now appear as below.

Click on the Include Options button located in the upper-right corner of the Simulation window and check Randomized. This will add the Randomization tab. Now click on the Randomization tab. The second column of the Table of Allocation displays the allocation ratio of each treatment arm to that of the control arm. The cell for the control arm is always one and is not editable. Only the cells for the treatment arms other than control need to be filled in. The default value for each treatment arm is one, which represents a balanced design. For the HIV study example, we consider a balanced design and leave the default values for the allocation ratios unchanged.
Your screen should now look like this:

The last tab is Simulation Control. Specify 10000 as Number of Simulations and 1000 as Refresh Frequency in this tab. The box labeled Random Number Seed is where you can set the seed for the random number generator. You can either use the clock as the seed or choose a fixed seed (in order to replicate past simulations). The default is the clock and we will use that. The box beside that is labeled Output Options. This is where you can choose to save summary statistics for each simulation run and/or to save the subject level data for a specific number of simulation runs. To save the output for each simulation, check the box with the label Save summary statistics for every simulation run.

Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim1.

Select Sim1 in the Output Preview and click the icon. Now double-click on Sim1 in the Library. The simulation output details will be displayed in the right pane.

The first section in the output is the Hypothesis section. In our situation, we are testing 3 hypotheses. We are comparing the estimated response rate of each dose group with that of placebo. That is, we are testing the 3 hypotheses:

H1: π1 = π0 vs K1: π1 > π0
H2: π2 = π0 vs K2: π2 > π0
H3: π3 = π0 vs K3: π3 > π0

Here, π0, π1, π2 and π3 represent the population response rates for the placebo, 125 mg, 250 mg and 500 mg dose groups, respectively. Also, Hi and Ki are the null and alternative hypotheses, respectively, for the i-th test.

The Input Parameters section provides the design parameters that we specified earlier.
The next section, Overall Power, gives us the estimated power based on the simulation. The second line gives us the global power, which is 0.807. Global power indicates the power to reject the global null H0: π1 = π2 = π3 = π0. Thus, the global power of 0.807 indicates that the global null will be rejected 80.7% of the time. In other words, at least one of H1, H2 and H3 is rejected on 80.7% of the occasions. Global power is useful for showing the existence of a dose-response relationship, since a dose-response may be claimed if any of the doses in the study is significantly different from placebo.

The next line displays the conjunctive power. Conjunctive power indicates the proportion of cases in the simulation where all the Hi's that are truly false were rejected. In this example, all the Hi's are false. Therefore, for this example, conjunctive power is the proportion of cases where all of H1, H2 and H3 were rejected. For this simulation the conjunctive power is only 0.035, which means that all of H1, H2 and H3 were rejected only 3.5% of the time.

Disjunctive power indicates the proportion of cases where at least one of the truly false Hi's is rejected. The main distinction between global and disjunctive power is that the former counts any rejection, whereas the latter counts rejections only among those Hi's which are false. Since all of H1, H2 and H3 are false here, global and disjunctive power ought to be the same.

The next section gives us the marginal power for each hypothesis. Marginal power is the proportion of times a particular hypothesis is rejected. Based on the simulation results, H1 is rejected about 6% of the time, H2 about 22% of the time and H3 about 80% of the time.

Recall that we asked East to save the summary statistics for every simulation run.
Open this file by clicking on SummaryStat in the Library and you will see that it contains 10,000 rows, each of which represents the results of a single simulation. Find the 3 columns with the labels Rej Flag 1, Rej Flag 2 and Rej Flag 3. These columns represent the rejection status for H1, H2 and H3, respectively. A value of 1 indicates rejection in that particular simulation; otherwise the null is not rejected. The proportion of 1's in Rej Flag 1 is the marginal power to reject H1. Similarly, we can find the marginal power for H2 and H3 from Rej Flag 2 and Rej Flag 3, respectively. To obtain the global and disjunctive power, count the number of cases where at least one of H1, H2 and H3 has been rejected and divide by the total number of simulations, 10,000. Similarly, to obtain the conjunctive power, count the number of cases where all of H1, H2 and H3 have been rejected and divide by 10,000.

Next we will consider an example to show how global and disjunctive power differ from each other. Select Sim1 in the Library and click the icon. Now go to the Response Generation tab and enter 0.35, 0.35, 0.38 and 0.42 in the 4 cells of the second column, labeled Response Rate.

Here we are generating the response for placebo from the distribution Bin(125, 0.35), for Dose1 from Bin(125, 0.35), for Dose2 from Bin(125, 0.38) and for Dose3 from Bin(125, 0.42). Click Simulate to start the simulation. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim2.

For Sim2, the global power and disjunctive power are close to 12%. To understand why, click on SummaryStat in the Library for Sim2.
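The counting rules just described can be written as a short sketch. It assumes the rejection flags have already been read into a list with one tuple of 0/1 values per simulation run; the function name, the argument `truly_false` (positions of the hypotheses that are actually false) and the four example rows are hypothetical, and the layout of the file East saves is not assumed beyond the three flag columns.

```python
def power_summary(flags, truly_false):
    """Summarize rejection flags from simulation runs.

    flags: list of tuples, one per simulation, with 1 = rejected, 0 = retained.
    truly_false: zero-based positions of the hypotheses that are truly false.
    """
    n_sims = len(flags)
    n_hyp = len(flags[0])
    marginal = [sum(row[i] for row in flags) / n_sims for i in range(n_hyp)]
    # Global power: any hypothesis rejected, whether it is false or not
    global_power = sum(any(row) for row in flags) / n_sims
    # Disjunctive power: at least one truly false hypothesis rejected
    disjunctive = sum(any(row[i] for i in truly_false) for row in flags) / n_sims
    # Conjunctive power: all truly false hypotheses rejected
    conjunctive = sum(all(row[i] for i in truly_false) for row in flags) / n_sims
    return {"marginal": marginal, "global": global_power,
            "disjunctive": disjunctive, "conjunctive": conjunctive}


# Four hypothetical simulation rows (Rej Flag 1, Rej Flag 2, Rej Flag 3),
# for a scenario in which H1 is true and only H2 and H3 are false
rows = [(0, 0, 1), (1, 0, 0), (0, 0, 0), (0, 1, 1)]
summary = power_summary(rows, truly_false=[1, 2])
```

In this toy input the second row rejects only H1, which is true, so it counts toward the global power but not toward the disjunctive power. That is the only way the two quantities can differ.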
The number of cases where at least one of H1, H2 and H3 is rejected is about 1270, and dividing this by the total number of simulations, 10,000, gives the global power of 12.7%. The number of cases where at least one of H2 and H3 (the two hypotheses that are truly false) is rejected is close to 1230, and dividing this by 10,000 gives the disjunctive power of 12.3%. The exact result of the simulations may differ slightly, depending on the seed.

Now delete Sim2 from the Output Preview, because we modified the design of the HIV example only to explain the difference between global power and disjunctive power. In order to do this, select the row corresponding to Sim2 in the Output Preview and click the corresponding icon in the toolbar.

27.2 Weighted Bonferroni procedure

In this section we will cover the weighted Bonferroni procedure with the same HIV example. For the weighted Bonferroni procedure, Hi is rejected if pi < wi α, and the adjusted p-value is given as min(1, pi/wi). Here wi denotes the proportion of α allocated to Hi, such that w1 + w2 + · · · + w(k−1) = 1. Note that if wi = 1/(k − 1) for every i, the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Since the other design specifications remain the same, except that we are using the weighted Bonferroni procedure in place of the Bonferroni procedure, we can design the simulation in this section with little effort. Select Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Weighted Bonferroni box.

Next click on the Response Generation tab and look at the Table of Proportions. You will see that an additional column with the label Proportion of Alpha has been added. Here you have to specify the proportion of total alpha you want to spend on each test.
Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default, East distributes the total alpha equally among all tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test. Now click Simulate to obtain the power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim2.

The weighted Bonferroni MC procedure has global and disjunctive power of 81% and conjunctive power of 3.4%. Note that the powers for the weighted Bonferroni procedure are quite close to those for the Bonferroni procedure. This is because the weighted Bonferroni procedure with equal proportions is equivalent to the simple Bonferroni procedure. The difference in power between the Bonferroni test in the previous section and the weighted Bonferroni test in this section is attributable to simulation error. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim2 in the Output Preview and click the icon. This will save Sim2 in Wbk1 in the Library.

27.3 Sidak procedures

The Sidak procedure is described below using the same HIV example from section 27.1. For the Sidak procedure, Hi is rejected if pi < 1 − (1 − α)^(1/(k−1)), and the adjusted p-value is given as 1 − (1 − pi)^(k−1).

Select Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Sidak box. Now click Simulate to obtain the power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim3.
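The adjusted p-values of the three single step procedures can be compared side by side with a small sketch. The function names and the raw p-values are hypothetical. The weights are assumed to be strictly positive and are rescaled to sum to 1, mirroring the normalization described above.

```python
def bonferroni_adjust(p_values):
    """Adjusted p-values min(1, (k-1) * p_i) for k-1 comparisons."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def weighted_bonferroni_adjust(p_values, weights):
    """Adjusted p-values min(1, p_i / w_i); weights are rescaled to sum to 1."""
    total = sum(weights)
    return [min(1.0, p / (w / total)) for p, w in zip(p_values, weights)]


def sidak_adjust(p_values):
    """Adjusted p-values 1 - (1 - p_i)^(k-1)."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw marginal p-values for H1, H2 and H3
raw = [0.2, 0.03, 0.004]
bonf = bonferroni_adjust(raw)
wbonf_equal = weighted_bonferroni_adjust(raw, [1, 1, 1])
wbonf_unequal = weighted_bonferroni_adjust(raw, [0.2, 0.3, 0.5])
sidak = sidak_adjust(raw)
```

A hypothesis is rejected when its adjusted p-value is below α. With equal weights the weighted Bonferroni values coincide with the Bonferroni values, and the Sidak values are never larger than the Bonferroni ones.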
The Sidak procedure has disjunctive and global power of 81% and conjunctive power of 3.8%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim3 in the Output Preview and click the icon. This will save Sim3 in Wbk1 in the Library.

27.4 Holm's step-down procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests. There are two types of stepwise procedures. One type proceeds in a data-driven order. The other type proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as tests are significant, until the test for some hypothesis fails. The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. In the i-th step, H(i) is rejected if p(i) ≤ α/(k − i), and the procedure goes to the next step.

As before, we will use the same HIV example to illustrate Holm's step-down procedure. Select Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Holm's Step down box. Now click Simulate to obtain the power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim4.

Holm's step-down procedure has global and disjunctive power close to 81% and conjunctive power close to 9%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim4 in the Output Preview and click the icon.
This will save Sim4 in Wbk1 in the Library.

27.5 Hochberg and Hommel procedures

Step-up tests start with the least significant comparison and continue as long as tests are not significant, until the first time a significant comparison occurs; that hypothesis and all remaining hypotheses are then rejected. East supports two such MC procedures: the Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up procedure, in the i-th step H(k−i) is retained if p(k−i) > α/i. In the Hommel step-up procedure, in the i-th step H(k−i) is retained if p(k−j) > ((i − j + 1)/i) α for j = 1, · · · , i.

Hochberg's and Hommel's step-up procedures are described below using the same HIV example from section 27.1 on the Bonferroni procedure. Since the other design specifications remain the same, except that we are using the step-up procedures in place of the Bonferroni procedure, we can design the simulation in this section with little effort. Select Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Hochberg's step up and Hommel's step up boxes. Now click Simulate to obtain the power. Once the simulation run has completed, East will add two additional rows to the Output Preview labeled Sim5 and Sim6.

The Hochberg and Hommel procedures have disjunctive and global powers of 81.2% and 81.4%, respectively, and conjunctive powers close to 10%. The exact result of the simulations may differ slightly, depending on the seed. Now select Sim5 and Sim6 in the Output Preview using the Ctrl key and click the icon. This will save Sim5 and Sim6 in Wbk1 in the Library.
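The difference between the step-down and step-up logic is easiest to see on a few p-values. The following sketch implements the Holm and Hochberg decision rules as stated in sections 27.4 and 27.5 (the Hommel procedure is not covered); the function names and the p-values are hypothetical, and each function returns one reject/retain decision per hypothesis in the original order.

```python
def holm_reject(p_values, alpha):
    """Holm step-down: start from the smallest p-value, stop at first failure."""
    m = len(p_values)                       # m = k - 1 comparisons
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):      # step 0 is the most significant test
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break                           # retain this and all remaining hypotheses
    return reject


def hochberg_reject(p_values, alpha):
    """Hochberg step-up: start from the largest p-value, stop at first success."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for step, idx in enumerate(order, start=1):   # step 1 is the least significant test
        if p_values[idx] <= alpha / step:
            # Reject this hypothesis and every hypothesis with a smaller p-value
            for j in order[step - 1:]:
                reject[j] = True
            break
    return reject


alpha = 0.025
tied = [0.02, 0.02, 0.02]                   # hypothetical p-values
mixed = [0.30, 0.012, 0.001]                # hypothetical p-values
holm_tied, hochberg_tied = holm_reject(tied, alpha), hochberg_reject(tied, alpha)
holm_mixed, hochberg_mixed = holm_reject(mixed, alpha), hochberg_reject(mixed, alpha)
```

With the three tied p-values of 0.02, Holm rejects nothing because 0.02 exceeds α/3, while Hochberg rejects all three because the largest p-value is already below α. This illustrates why a step-up procedure can reject more hypotheses than a step-down procedure.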
27.6 Fixed-sequence testing procedure

In data-driven stepwise procedures, we have no control over the order in which the hypotheses are tested. However, sometimes, based on our preference or prior knowledge, we might want to fix the order of tests a priori. The fixed sequence test and the fallback test proceed in a pre-specified order. East supports both of these procedures.

Assume that H1, H2, · · · , Hk−1 are ordered hypotheses and that the order is pre-specified, so that H1 is tested first, followed by H2, and so on. Let p1, p2, · · · , pk−1 be the associated raw marginal p-values. In the fixed sequence testing procedure, for i = 1, · · · , k − 1, in the i-th step, if pi < α, reject Hi and go to the next step; otherwise retain Hi, · · · , Hk−1 and stop.

The fixed sequence testing strategy is optimal when the early tests in the sequence have the largest treatment effects, and performs poorly when the early hypotheses have small treatment effects or are nearly true (Westfall and Krishen 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.

As before, we will use the same HIV example to illustrate the fixed sequence testing procedure. Select Sim1 in the Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Fixed Sequence box.

Next click on the Response Generation tab and look at the Table of Proportions. You will see that an additional column with the label Test Sequence has been added. Here you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be performed first, 2 for the test that will be performed next, and so on. By default, East assigns 1 to the first test, 2 to the second test, and so on.
For now we will keep the default, which means that H1 will be tested first, followed by H2, and finally H3. Now click Simulate to obtain the power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim7.

The fixed sequence procedure with the specified sequence has global and disjunctive power close to 13% and conjunctive power close to 10%. The global and disjunctive power are small because the smallest treatment effect is tested first and the magnitude of the treatment effect increases gradually over the remaining tests. For optimal power in the fixed sequence procedure, the early tests in the sequence should have the larger treatment effects. In our case, Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Therefore, to obtain optimal power, H3 should be tested first, followed by H2 and H1.

Select Sim7 in the Output Preview and click the icon. Now select Sim7 in the Library, click the icon, and go to the Response Generation tab. In the Test Sequence column of the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3. Now click Simulate to obtain the power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim8.

Now the fixed sequence procedure with the pre-specified sequence (H3, H2, H1) has global and disjunctive power close to 89% and conjunctive power of 9.7%. This example illustrates that the fixed sequence procedure is powerful provided the hypotheses are tested in a sequence of descending treatment effects. The fixed sequence procedure controls the FWER because, for each hypothesis, testing is conditional upon rejecting all hypotheses earlier in the sequence. The exact result of the simulations may differ slightly, depending on the seed.
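The effect of the testing order can be seen from the decision rule alone. Below is a minimal sketch of the fixed sequence rule from this section; the function name and the p-values are hypothetical, and the p-values must be supplied in the pre-specified testing order.

```python
def fixed_sequence_reject(p_values, alpha):
    """Fixed sequence test: p_values are listed in the pre-specified order."""
    reject = [False] * len(p_values)
    for i, p in enumerate(p_values):
        if p < alpha:
            reject[i] = True        # reject H_i and move on to the next one
        else:
            break                   # retain H_i and everything after it
    return reject


alpha = 0.025
# Hypothetical raw p-values for H1, H2 and H3 from a single trial
p_h1, p_h2, p_h3 = 0.20, 0.04, 0.001

# Order H1, H2, H3: the first test fails, so nothing is rejected
default_order = fixed_sequence_reject([p_h1, p_h2, p_h3], alpha)

# Order H3, H2, H1: H3 is rejected, then testing stops at H2
reversed_order = fixed_sequence_reject([p_h3, p_h2, p_h1], alpha)
```

Every hypothesis is tested at the full level α, which is where the procedure gains its power, but a single non-significant result early in the sequence blocks all later tests.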
Select Sim8 in the Output Preview and click the icon to save it in Library.

27.7 Fallback procedure

The fallback test alleviates the above drawback of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, in the $i$-th step ($i = 1, \ldots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained; the first hypothesis is tested at $\alpha_1 = \alpha w_1$. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Again we will use the HIV example to illustrate the fallback procedure. Select Sim 1 in Library and click the icon. Now go to the Test Parameters tab. In the Multiple Comparison Procedures box, uncheck the Dunnett's single step box and check the Fallback box. Next click on the Response Generation tab and look at the Table of Proportions. You will see two additional columns labeled Test Sequence and Proportion of Alpha. In the Test Sequence column, you have to specify the order in which the hypotheses will be tested. Specify 1 for the test that will be tested first, 2 for the test that will be tested next, and so on. By default East assigns 1 to the first test, 2 to the second test and so on. For now we will keep the default, which means that $H_1$ will be tested first, followed by $H_2$ and finally $H_3$. In the Proportion of Alpha column, you have to specify the proportion of the total alpha you want to spend on each test.
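The fallback rule defined in this section can be sketched as follows. This is a minimal illustration, not East's code: the function name `fallback`, the example p-values, and the choice to rescale the weights so that they sum to 1 are assumptions made for the sketch.

```python
def fallback(p_values, weights, alpha):
    """Fallback rule for p-values listed in their pre-specified test order.
    weights: proportion of alpha per hypothesis (rescaled here to sum to 1)."""
    total = sum(weights)
    w = [x / total for x in weights]
    decisions = []
    carried = 0.0                        # level inherited from a rejected predecessor
    for p, wi in zip(p_values, w):
        level = carried + alpha * wi     # alpha_i = alpha_{i-1} + alpha*w_i, or alpha*w_i
        reject = p < level
        decisions.append(reject)
        carried = level if reject else 0.0
    return decisions


# Hypothetical p-values with the weakest effect tested first.
print(fallback([0.20, 0.010, 0.001], [1, 1, 1], alpha=0.025))
# With all weight on the first test the rule behaves like the fixed sequence test.
print(fallback([0.001, 0.010, 0.20], [1, 0, 0], alpha=0.025))
```

In the first call the third hypothesis is still rejected after two earlier failures, which is exactly what the fixed sequence test cannot do.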
Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default East distributes the total alpha equally among all the tests. Here we have 3 tests in total, so each test has a proportion of alpha of 1/3, or 0.333. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test. Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim9.

Recall that the fixed sequence procedure with the specified sequence had global and disjunctive power close to 13% and conjunctive power of 9%. With the same pre-specified order for testing hypotheses, the fallback procedure has superior power compared to the fixed sequence procedure. This is because the fallback procedure can continue testing even if a non-significant outcome is encountered, whereas the fixed sequence procedure has to stop when a hypothesis in the sequence is not rejected.

Now we will consider a sequence where $H_3$ is tested first, followed by $H_2$ and $H_1$, because in our case Dose3 has the largest treatment effect, followed by Dose2 and Dose1. Select Sim9 in the Output Preview and click the icon to save it in Library. Now select Sim9 in Library, click the icon, and go to the Response Generation tab. In the Test Sequence column in the table, specify 3 for Dose1, 2 for Dose2 and 1 for Dose3. Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview labeled Sim10.

Recall that the fixed sequence procedure with the pre-specified sequence ($H_3$, $H_2$, $H_1$) had global and disjunctive power of 89% and conjunctive power of 9.7%. The power obtained for the fallback procedure with this sequence (Sim10) is very close to that of Sim9.
Therefore, for the fallback procedure, specifying the sequence in descending order of treatment effect does not make much difference in terms of power. The exact result of the simulations may differ slightly, depending on the seed. Select Sim10 in the Output Preview and click the icon to save it in Library.

27.8 Comparison of MC procedures

We have obtained the power (based on simulations) for different MC procedures for the HIV example in the previous sections. The obvious question now is which MC procedure to choose. To compare all the MC procedures, we will perform simulations for all of them under the following scenario.

Treatment arms: placebo, dose1 (dose=0.3 mg), dose2 (dose=1 mg) and dose3 (dose=2 mg), with proportions 0.35, 0.4, 0.45 and 0.55, respectively
Variance: Unpooled
Proportion of Alpha: Equal (0.333)
Type I Error: 0.025 (right-tailed)
Number of Simulations: 10000
Total Sample Size: 500
Allocation ratio: 1 : 1 : 1 : 1

For comparability of the simulation results, we have used the same seed (5643) for the simulations under all MC procedures. The following output displays the powers under the different MC procedures. Clean up the Output Preview area, select all the checkboxes corresponding to the procedures and hit Simulate.

Here we have used equal proportions for the weighted Bonferroni and fallback procedures. For the two procedures that test in a fixed order (fixed sequence and fallback), two sequences have been used: ($H_1$, $H_2$, $H_3$) and ($H_3$, $H_2$, $H_1$). As expected, the Bonferroni and weighted Bonferroni procedures provide similar powers. The fixed sequence procedure with the pre-specified sequence ($H_3$, $H_2$, $H_1$) provides power of 89.5%, which is the maximum among all the procedures. However, the fixed sequence procedure with the pre-specified sequence ($H_1$, $H_2$, $H_3$) provides power of only 13.6%.
Therefore, power in the fixed sequence procedure is largely dependent on the specified sequence of testing, and a mis-specification might result in a huge drop in power. All the remaining procedures have almost equal global and disjunctive powers, about 82%. In terms of conjunctive power, Hochberg's step-up and Hommel's step-up procedures have the highest conjunctive power, 9.9%. Therefore, we can choose either Hochberg's step-up or Hommel's step-up procedure for the prospective HIV study discussed in section 27.1.

28 Multiple Endpoints-Gatekeeping Procedures for Discrete Data

Clinical trials are often designed to assess the benefits of a new treatment compared to a control treatment with respect to multiple clinical endpoints, which are divided into hierarchically ordered families. Typically, the primary family of endpoints defines the overall outcome of the trial, provides the basis for the regulatory claim and is included in the product label. The secondary families of endpoints play a supportive role, provide additional information for physicians, patients and payers, and are useful for enhancing the product label. Gatekeeping procedures address multiplicity problems by explicitly taking into account the hierarchical structure of the multiple objectives. The term "gatekeeping" indicates the hierarchical decision structure in which the higher ranked families serve as "gatekeepers" for the lower ranked families. Lower ranked families will not be tested if the higher ranked families have not met the requirements. Two types of gatekeeping procedures for discrete outcomes, parallel and serial, are described in this chapter.
For more information about applications of gatekeeping procedures in a clinical trial setting and a literature review on this topic, please refer to Dmitrienko and Tamhane (2007).

East uses simulations to assess the operating characteristics of different designs using gatekeeping procedures. For example, one could simulate the power for a variety of sample sizes in a simple batch procedure. It is important to note that when determining the sample size for a clinical trial with multiple co-primary endpoints, if the correlation among the endpoints is not taken into consideration, the sample size may be overestimated (Souza et al. 2010). East uses information about the correlation among the multiple endpoints in order to determine a more feasible sample size.

28.1 MK-0974 (telcagepant) for Acute Migraine

Consider the randomized, placebo-controlled, double blind, parallel treatment clinical trial designed to compare two treatments for migraine, a common disease and a leading cause of disability. Standard treatment includes the use of triptans, which, although generally well tolerated, have a vasoconstrictor effect that can be problematic. This leaves a certain population of patients with underlying cardiovascular disease, uncontrolled hypertension or certain subtypes of migraine unable to access this treatment. In addition, for some patients this treatment has little or no beneficial effect and is associated with some undesirable side effects, resulting in discontinuation of the drug (Ho et al. 2008). In this study, multiple doses of the drug Telcagepant (300 mg, 150 mg), an antagonist of the CGRP receptor associated with migraine, and zolmitriptan (5 mg), the standard treatment against migraine, are compared against a placebo. The five co-primary endpoints are pain freedom, pain relief, absence of photophobia (sensitivity to light), absence of phonophobia (sensitivity to sound), and absence of nausea two hours post treatment.
Three co-secondary endpoints included more sustained measurements of pain freedom, pain relief, and total migraine freedom for up to a 24 hour period. The study employed a full analysis set in which the multiplicity of endpoints was addressed using a step-down closed testing procedure. Due to the negative aspects of zolmitriptan, investigators were primarily interested in determining the efficacy of Telcagepant for the acute treatment of migraine, with the hope of finding an alternative treatment with fewer associated side effects. This study will be used to illustrate the two gatekeeping procedures East provides for multiple discrete endpoints.

28.2 Serial Gatekeeping Design - Simulation for Discrete Outcomes

Serial gatekeeping procedures were studied by Maurer, Hothorn and Lehmacher (1995), Bauer et al. (1998) and Westfall and Krishen (2001). Serial gatekeepers are encountered in trials where endpoints are usually ordered from most important to least important. Suppose that a trial is declared successful only if the treatment effect is demonstrated on both primary and secondary endpoints. Only if the endpoints in the primary family are successful is it of interest to assess the secondary endpoints. Correlation coefficients between the endpoints are bounded, and East computes the valid range of acceptable values. As the number of endpoints increases, the restriction imposed on the valid range of correlation values becomes greater. Therefore, for illustration purposes, the above trial is simplified to consider three primary endpoints: pain freedom (PF), absence of phonophobia (phono) and absence of photophobia (photo) at two hours post treatment. Only one endpoint from the secondary family, sustained pain freedom (SPF), will be included in the example.
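Why correlations between binary endpoints are bounded can be seen from a pairwise calculation: for two binary endpoints with response rates p1 and p2, the joint success probability cannot exceed min(p1, p2) or fall below max(0, p1 + p2 - 1), which limits the attainable correlation. The sketch below computes these pairwise limits. It is an illustration only; the function name `correlation_bounds` is hypothetical, and the Valid Range reported by East may be narrower because it has to hold jointly for all endpoints.

```python
import math


def correlation_bounds(p1, p2):
    """Smallest and largest correlation attainable by two binary
    variables with marginal success probabilities p1 and p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    lower = max(-math.sqrt(p1 * p2 / (q1 * q2)),
                -math.sqrt(q1 * q2 / (p1 * p2)))
    upper = min(math.sqrt(p1 * q2 / (p2 * q1)),
                math.sqrt(p2 * q1 / (p1 * q2)))
    return lower, upper


# Response rates for phono and photo under Telcagepant 300mg in this example.
low, high = correlation_bounds(0.578, 0.51)
print(round(low, 3), round(high, 3))
```

Endpoints with similar response rates admit correlations close to 1, while endpoints with very different rates (for example PF and phono) admit a much narrower range.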
Additionally, where the original trial studied multiple doses and treatments, this example will use only two groups, to focus the comparison on the higher dose of Telcagepant (300 mg) and placebo. The example includes correlation values intended to represent zero, mild and moderate correlation, to examine the effect of correlation on power. The efficacy, or response rate, of the endpoints for subjects in the treatment group and placebo group, and a sample correlation matrix, follow:

Endpoint   Response Telcagepant 300mg   Response Placebo
PF         0.269                        0.096
phono      0.578                        0.368
photo      0.51                         0.289
SPF        0.202                        0.05

          ρ12    ρ13    ρ23    ρ14    ρ24    ρ34
Sim 1     0      0      0      0      0      0
Sim 2     0      0      0.3    0      0      0
Sim 3     0      0      0.5    0      0      0
Sim 4     0      0      0.8    0      0      0
Sim 5     0.3    0.3    0.3    0      0      0
Sim 6     0.3    0.3    0.5    0      0      0
Sim 7     0.3    0.3    0.8    0      0      0
Sim 8     0      0      0.3    0.3    0      0
Sim 9     0      0      0.3    0.5    0      0
Sim 10    0      0      0.3    0.7    0      0
Sim 11    0      0      0.8    0.3    0      0
Sim 12    0      0      0.8    0.5    0      0
Sim 13    0      0      0.8    0.7    0      0
Sim 14    0      0      0.8    0.7    0      0

To construct the above simulations, in the Design tab on the Discrete group, click Two Samples and select Multiple Comparisons-Multiple Endpoints.

At the top of this input window, the user must specify the total number of endpoints in the trial. Other input parameters include Test Type, Type I Error (α), Sample Size (n), and whether or not a Common Rejection Region is to be used for the endpoints. If a different rejection region is desired for different endpoints, this information should be specified in the Endpoint Information box. Here the user can change the label, select the family rank for each endpoint and choose the rejection region (either right or left tailed). As discussed above, there are typically two types of gatekeeping procedures: serial and parallel.
Parallel gatekeeping requires the rejection of at least one hypothesis in a gatekeeper family; that is, only one of the endpoints in that family must be significant for testing to proceed to the next family. Serial gatekeeping uses the fact that the families are hierarchically ordered, and subsequent families are only tested if the previously ranked families are significant. Once the Gatekeeping Procedure is selected, the user must then select the multiple comparison procedure which will be used to test the last family of endpoints. These tests are discussed in Chapter 27. If Parallel Gatekeeping is selected, the user must also specify a test for Gatekeeper Families, specifically Bonferroni, Truncated Holm or Truncated Hochberg; these are discussed further in the Parallel example which follows. The type I error specified on this screen is the nominal level of the family-wise error rate, which is defined as the probability of falsely declaring the efficacy of the new treatment compared to control with respect to any endpoint.

For the migraine example, PF, phono, and photo form the primary family, and SPF is the only outcome in the secondary family. Suppose that we would like to see the power for a sample size of 200 at a nominal type I error rate of 0.025, using the Bonferroni test for the secondary family. The input window will look as follows:

In addition to the Test Parameters tab, there is a tab labeled Response Generation. This is where the user specifies the underlying joint distribution among the multiple endpoints for the control arm and for the treatment arm. This is assumed to be multivariate binary with a specified correlation matrix.
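The serial decision structure described above for the migraine example can be sketched as follows. This is a sketch under stated assumptions rather than East's implementation: it assumes that the gatekeeper family passes only if every one of its endpoints is significant at the full level α, and that the last family is then tested with the Bonferroni test. The function name `serial_gatekeeping` and the p-values are hypothetical.

```python
def serial_gatekeeping(p_primary, p_last, alpha):
    """Two-family serial gatekeeping sketch.
    Assumption: the primary (gatekeeper) family passes only if all of its
    p-values are below alpha; the last family is then tested by Bonferroni."""
    gate_open = all(p < alpha for p in p_primary)
    primary = [gate_open] * len(p_primary)
    # The last family is tested only when the gate is open.
    last = [gate_open and p < alpha / len(p_last) for p in p_last]
    return primary, last


# Hypothetical p-values for (PF, phono, photo) and for SPF.
print(serial_gatekeeping([0.001, 0.010, 0.020], [0.020], alpha=0.025))
print(serial_gatekeeping([0.001, 0.030, 0.020], [0.001], alpha=0.025))
```

In the second call SPF is not tested at all, even though its p-value is very small, because one primary endpoint failed.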
For the first simulation, the Common Correlation box can be checked with the default value of 0. The number of simulations to be performed and other simulation parameters can be specified in the Simulation Controls window. By default, 10000 simulations will be performed. The summary statistics for each simulated trial and the subject-level data can be saved by checking the appropriate boxes in the Output Options area.

Once all design parameters are specified, click the Simulate button at the bottom right of the screen. Preliminary output is displayed in the Output Preview area, and all results displayed in the yellow cells are summary outputs generated from simulations. To view the detailed output, first save the simulation into a workbook in the library by selecting the simulation in the Output Preview window and clicking the icon. A simulation node will appear in the library under the current workbook.

Double click the simulation node Sim1 in the Library to see the detailed output, which summarizes all the main input parameters, including the multiple comparison procedure used for the last family of endpoints, the nominal type I error level, the total sample size, the mean values for each endpoint in the control arm and in the experimental arm, etc.
It also displays a comprehensive list of different types of power. These different types of power are defined as follows:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error among all the endpoints

Power and FWER for Individual Gatekeeper Family except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family

Power and FWER for the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm
  FWER: probability of making at least one type I error when testing the endpoints in the last family

Marginal Power: probability of declaring significance on the particular endpoint

For the migraine example, the conjunctive power, which characterizes the power for the study, is 0.701 (about 70%) for a total sample size of 200.
Using the Bonferroni test for the last family, the design has probability 0.651 (disjunctive power for the last family) of detecting the benefit of Telcagepant 300 mg with respect to at least one secondary endpoint. It has the same probability, 0.651 (conjunctive power for the last family), of declaring the benefit of Telcagepant 300 mg with respect to all of the secondary endpoints, because SPF is the only endpoint in the secondary family. For a sample size of 200, this relatively low power is typically undesirable. One can find the sample size needed to achieve a target power by simulating multiple designs in batch mode. For example, the simulation of a batch of designs for a range of sample sizes from 200 to 300 in steps of 20 is shown by the following.

Multiple designs can be viewed side by side for easy comparison by selecting the simulations and clicking the icon in the Output Preview area:

For this example, to obtain a conjunctive power between 80% and 90%, the study would need to be constructed with somewhere between 250 and 300 subjects. For the remainder of this example, we will use a sample size of 250 subjects under the correlation assumptions in the above table.

28.3 Parallel Gatekeeping Design - Simulation for Discrete Outcomes

A common concern in clinical trials with multiple primary endpoints is whether or not statistical significance should be achieved on all endpoints. As the number of endpoints increases, this generally becomes more difficult. Parallel gatekeeping procedures are often used in clinical trials with multiple primary objectives where each individual objective can characterize a successful overall trial outcome. In other words, the trial can be declared to be successful if at least one primary objective is met.
Again, consider the same randomized, placebo-controlled, double blind, parallel treatment clinical trial designed to compare two treatments for migraine that was presented in the serial gatekeeping example. For the purpose of this example, the trial is again simplified to study only three primary family endpoints: pain freedom (PF), absence of phonophobia (phono) and absence of photophobia (photo) at two hours post treatment. The single endpoint in the secondary family is sustained pain freedom (SPF). Using East, power estimates will be computed via simulation. The example correlation values are intended to represent a common and moderate association among the endpoints. In general, serial gatekeeping designs require a larger sample size than parallel designs; therefore this example will use a total sample size of 125, at a one-sided significance level of α = 0.025. The efficacy, or response rate, of the endpoints for subjects in the treatment group and placebo group, and a sample correlation matrix, are as follows:

Endpoint   Response Telcagepant 300mg   Response Placebo
PF         0.269                        0.096
phono      0.578                        0.368
photo      0.51                         0.289
SPF        0.202                        0.05

         ρ12    ρ13    ρ23    ρ14    ρ24    ρ34
Sim 1    0.3    0.3    0.3    0.3    0.3    0.3
Sim 2    0      0      0.8    0.3    0.0    0.0
Sim 3    0.3    0.3    0.8    0.3    0      0

We now construct a new set of simulations to assess the operating characteristics of the study using a Parallel Gatekeeping design for the above response generation information. In the Design tab on the Discrete group, click Two Samples and select Multiple Comparisons-Multiple Endpoints.

In the Gatekeeping Procedure box, keep the default of Parallel, and Bonferroni for the Test for Gatekeeper Families. For the Test for Last Family, also ensure that Bonferroni is selected as the multiple testing procedure.
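To make the Bonferroni choice for both families concrete, the sketch below follows the usual form of Bonferroni-based parallel gatekeeping with equal weights: each primary endpoint is tested at α divided by the number of primary endpoints, and the fraction of α carried to the last family is proportional to the number of primary rejections. This form is stated here as an assumption, since this section does not spell out the formula, and East's implementation may differ in detail. The function name and p-values are hypothetical.

```python
def parallel_gatekeeping_bonferroni(p_primary, p_last, alpha):
    """Two-family parallel gatekeeping with Bonferroni in both families
    (equal weights assumed within each family)."""
    n1 = len(p_primary)
    primary = [p < alpha / n1 for p in p_primary]
    # Level passed on to the last family: alpha times the share of rejected gatekeepers.
    alpha_last = alpha * sum(primary) / n1
    last = [p < alpha_last / len(p_last) for p in p_last]
    return primary, last, alpha_last


# Hypothetical p-values for (PF, phono, photo) and for SPF.
primary, last, alpha_last = parallel_gatekeeping_bonferroni(
    [0.001, 0.030, 0.005], [0.010], alpha=0.025)
print(primary, last, round(alpha_last, 4))
```

Unlike a serial gatekeeper, the last family can still be tested when only some of the primary endpoints are significant, but at a reduced level.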
In the Endpoint Information box, specify which family each specific endpoint belongs to using the column with the label Family Rank.

In the Response Generation window, the Variance can be specified to be either Pooled or Un-pooled. In the Endpoint Information box, the Response Rates for treatment and control for each endpoint are specified. If the endpoints share a common correlation, select the Common Correlation checkbox and enter the correlation value to the right. East will only allow a value within the Valid Range. Otherwise, input the specific correlation for each pair of endpoints in the Correlation Matrix.

In the Simulation Controls window, the user can specify the total number of simulations, refresh frequency, and random number seed. Simulation data can be saved for more advanced analyses. After all the input parameter values have been specified, click the Simulate button on the bottom right of the window to begin the simulation.

The progress window will report how many simulations have been completed. When complete, close the progress report screen and the preliminary simulation summary will be displayed in the Output Preview window. Here, one can see the overall power summary.

To see the detailed output, save the simulation in the current workbook by clicking the icon. A simulation node will be appended to the corresponding workbook in the library. Double click the simulation node in the library to display the detailed outputs.
As with serial gatekeeping, East provides the following types of power:

Overall Power and FWER:
  Global: probability of declaring significance on any of the endpoints.
  Conjunctive: probability of declaring significance on all of the endpoints for which the treatment arm is truly better than the control arm.
  Disjunctive: probability of declaring significance on any of the endpoints for which the treatment arm is truly better than the control arm.
  FWER: probability of making at least one type I error among all the endpoints.

Power and FWER for Individual Gatekeeper Families except the Last Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm.
  Disjunctive Power: probability of declaring significance on any of the endpoints in the particular gatekeeper family for which the treatment arm is truly better than the control arm.
  FWER: probability of making at least one type I error when testing the endpoints in the particular gatekeeper family.

Power and FWER for the Last Gatekeeper Family:
  Conjunctive Power: probability of declaring significance on all of the endpoints in the last family for which the treatment arm is truly better than the control arm.
  Disjunctive Power: probability of declaring significance on any of the endpoints in the last family for which the treatment arm is truly better than the control arm.
  FWER: probability of making at least one type I error when testing the endpoints in the last family.

Marginal Power: probability of declaring significance on the particular endpoint.
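The overall power definitions above can be made concrete by computing them from per-trial rejection indicators. The sketch below is illustrative only: the function name `power_summary` and the four hand-written "simulated trials" are hypothetical, and in practice the indicators would come from a large number of simulated trials.

```python
def power_summary(rejections, truly_effective):
    """rejections: one list of booleans per simulated trial (one entry per endpoint).
    truly_effective: per endpoint, True if treatment is truly better than control."""
    n_trials = len(rejections)
    effective = [j for j, flag in enumerate(truly_effective) if flag]
    ineffective = [j for j, flag in enumerate(truly_effective) if not flag]

    def rate(event):
        # Proportion of simulated trials in which the event occurred.
        return sum(1 for r in rejections if event(r)) / n_trials

    return {
        "global": rate(lambda r: any(r)),
        "conjunctive": rate(lambda r: all(r[j] for j in effective)),
        "disjunctive": rate(lambda r: any(r[j] for j in effective)),
        "fwer": rate(lambda r: any(r[j] for j in ineffective)),
        "marginal": [rate(lambda r, j=j: r[j]) for j in range(len(truly_effective))],
    }


# Four hypothetical trials, three endpoints; the third endpoint has no true effect.
trials = [[True, True, False], [True, False, False],
          [False, False, True], [True, True, False]]
print(power_summary(trials, [True, True, False]))
```

Global power counts a rejection on any endpoint, including a false one, which is why it can exceed disjunctive power when some endpoints have no true effect.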
For the migraine example under the lower common correlation assumption, we see that the gatekeeping procedure using the Bonferroni test for both the primary family and the secondary family provides 84.4% power to detect a difference in at least one of the three primary measures of migraine relief. It provides only 24.1% power to detect differences in all types of relief. The marginal power table displays the probabilities of declaring significance on the particular endpoint after multiplicity adjustment. For example, the power to detect sustained pain freedom (SPF) beyond 2 hours for a dose of 300 mg of Telcagepant is 60.3%.

To assess the robustness of this procedure with respect to the correlation among the different endpoints, the simulation can be run again with different combinations of correlations. Right click on the simulation node in the Library and select Edit Simulation from the dropdown list. Next click on the Response Generation tab, update the correlation matrix, and click Simulate. This can be repeated for all desired correlation combinations, and the results can be compared in an output summary. The following table summarizes the power comparisons under different correlation assumptions.

Table 28.1: Power Comparisons under Different Correlation Assumptions

Correlation   Primary Family          Secondary Family        Overall Power
              Disjunct.  Conjunct.    Disjunct.  Conjunct.    Disjunct.  Conjunct.
Sim 1         0.839      0.242        0.599      0.99         0.839      0.218
Sim 2         0.838      0.244        0.579      0.579        0.838      0.202
Sim 3         0.787      0.286        0.554      0.554        0.787      0.234

Note that the disjunctive power decreases as the correlation increases and the conjunctive power increases as the correlation increases.

There are three available parallel gatekeeping methods: Bonferroni, Truncated Holm and Truncated Hochberg.
The multiple comparison procedures applied to the gatekeeper families need to satisfy the so-called separable condition. A multiple comparison procedure is separable if the type I error rate under a partial null configuration is strictly less than the nominal level α. Bonferroni is a separable procedure; however, the regular Holm and Hochberg procedures are not separable and cannot be applied directly to the gatekeeper families. The truncated versions, obtained by taking convex combinations of the critical constants for the regular Holm/Hochberg procedure and the Bonferroni procedure, are separable and more powerful than the Bonferroni test. The truncation constant controls the degree of conservativeness. A larger value of the truncation constant results in a more powerful procedure. If the truncation constant is set to 1, the procedure reduces to the regular Holm or Hochberg test.

To see this, simulate the design using the truncated Holm procedure for the primary family and the Bonferroni test for the secondary family for the migraine example with common correlation 0.3. The table below compares the conjunctive power and disjunctive power for each family, and the overall ones, for different truncation parameter values.

Table 28.2: Impact of Truncation Constant on Power in the Truncated Holm Procedure

Truncation   Primary Family          Secondary Family        Overall Power
Constant     Conjunct.  Disjunct.    Conjunct.  Disjunct.    Conjunct.  Disjunct.
0            0.234      0.84         0.59       0.59         0.21       0.84
0.25         0.28       0.833        0.569      0.569        0.248      0.833
0.5          0.315      0.836        0.542      0.542        0.275      0.836
0.8          0.383      0.838        0.488      0.488        0.334      0.838

As the value of the truncation parameter increases, the conjunctive power for the primary family increases and the disjunctive power remains unchanged.
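The convex combination of Holm and Bonferroni critical constants described above can be sketched as follows. The formula used, α(γ/(n - i + 1) + (1 - γ)/n) for step i of n with truncation constant γ, is the usual form of the truncated Holm critical values; it is stated here as an assumption because this section does not give the formula, and the function names and p-values are hypothetical. The sketch covers only the decisions within one family, not the level passed on to the next family.

```python
def truncated_holm_levels(n, gamma, alpha):
    """Step-wise critical values: convex combination of Holm, alpha/(n - i),
    and Bonferroni, alpha/n, for 0-based step i; gamma is the truncation constant."""
    return [alpha * (gamma / (n - i) + (1.0 - gamma) / n) for i in range(n)]


def truncated_holm(p_values, gamma, alpha):
    """Step-down decisions within one family using the truncated Holm levels."""
    n = len(p_values)
    levels = truncated_holm_levels(n, gamma, alpha)
    order = sorted(range(n), key=lambda j: p_values[j])   # smallest p-value first
    reject = [False] * n
    for step, j in enumerate(order):
        if p_values[j] < levels[step]:
            reject[j] = True
        else:
            break                                         # stop at the first non-rejection
    return reject


p = [0.020, 0.001, 0.010]                                 # hypothetical p-values
for gamma in (0.0, 0.5, 1.0):
    print(gamma, truncated_holm(p, gamma, alpha=0.025))
```

With γ = 0 every step uses the Bonferroni level, and with γ = 1 the levels are those of the regular Holm test, so increasing γ can only add rejections within the family.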
Both the conjunctive power and the disjunctive power for the secondary family decrease as we increase the truncation parameter. The overall conjunctive power also increases, but the overall disjunctive power remains the same as the truncation parameter increases. The next table shows the marginal powers of this design for different truncation parameter values. The marginal powers for the endpoints in the primary family increase. On the other hand, the marginal power for the endpoint in the secondary family decreases.

Table 28.3: Impact of Truncation Constant on Marginal Power in the Truncated Holm Procedure

Truncation   Primary Family              Secondary Family
Constant     PF      Phono   Photo       SPF
0            0.54    0.512   0.568       0.59
0.25         0.582   0.512   0.58        0.569
0.5          0.591   0.541   0.596       0.542
0.8          0.625   0.568   0.631       0.488

The last two tables display the operating characteristics for the Hochberg test with different truncation constant values. Note that both the conjunctive and disjunctive powers for the primary family increase as the truncation parameter increases. However, the power for the secondary family decreases with larger truncation parameter values. The marginal powers for the primary family and for the secondary family behave similarly. The overall conjunctive and disjunctive powers also increase as we increase the truncation parameter.

Table 28.4: Impact of Truncation Constant on Power in the Truncated Hochberg Procedure

Truncation   Primary Family          Secondary Family        Overall Power
Constant     Conjunct.  Disjunct.    Conjunct.  Disjunct.    Conjunct.  Disjunct.
0            0.234      0.844        0.595      0.595        0.208      0.844
0.25         0.303      0.838        0.578      0.578        0.268      0.838
0.5          0.322      0.841        0.544      0.544        0.281      0.841
0.8          0.407      0.847        0.494      0.494        0.351      0.847
Table 28.5: Impact of Truncation Constant in Truncated Hochberg Procedure on Marginal Power

Truncation   Primary Family              Secondary Family
Constant     PF      Photo   Phono       SPF
0            0.552   0.52    0.564       0.595
0.25         0.595   0.529   0.603       0.578
0.5          0.603   0.54    0.598       0.544
0.8          0.642   0.592   0.647       0.494

29 Two-Stage Multi-arm Designs using p-value combination

29.1 Introduction

In the drug development process, identification of promising therapies and inference on selected treatments are usually performed in two or more stages. The procedure discussed here is an adaptive two-stage design that can be used when multiple treatments are to be compared with a control. It allows both stages to be integrated within a single confirmatory trial while controlling the multiple type I error level. After the interim analysis in the first stage, the trial may be terminated early or continued with a second stage, in which the set of treatments may be reduced due to lack of efficacy or the presence of safety problems with some of the treatments. This procedure in East is highly flexible with respect to stopping rules and selection criteria, and it also allows re-estimation of the sample size for the second stage. Simulations show that the method may be substantially more powerful than classical one-stage multiple treatment designs with the same total sample size, because the second stage sample size is focused on evaluating only the promising treatments identified in the first stage. This procedure is available for continuous as well as discrete endpoint studies. The current chapter deals with discrete endpoint studies only; continuous endpoint studies are handled similarly.
29.2 Study Design

This section will explore different design options available in East with the help of an example.

29.2.1 Introduction to the Study
29.2.2 Methodology
29.2.3 Study Design Inputs
29.2.4 Simulating under Different Alternatives

29.2.1 Introduction to the Study

A new chemical entity (NCE) is being developed for the treatment of reward deficiency syndrome, specifically alcohol dependence and binge eating disorder. Compared with other orally available treatments, NCE was designed to exhibit enhanced oral bioavailability, thereby providing improved efficacy for the treatment of alcohol dependence.

Primary Objective: To evaluate the safety and efficacy of NCE compared with placebo when administered daily for 12 weeks to adults with alcohol dependence.

Secondary Objective: To determine the optimal dose or doses of NCE.

The primary endpoint is defined as the percent of subjects abstinent from heavy drinking during Weeks 5 through 12 of treatment based on self-report of drinking activity. A heavy drinking day is defined as 4 or more standard alcoholic drinks in 1 day for females and 5 or more standard alcoholic drinks in 1 day for males. The endpoint is based on the patient-reported number of standard alcoholic drinks per day, transformed into a binary outcome measure, abstinence from heavy drinking.

29.2.2 Methodology

This is a multicenter, randomized, double-blind, placebo-controlled study conducted in two parts using a 2-stage adaptive design. In Stage 1, approximately 400 eligible subjects will be randomized equally among four treatment arms (NCE [doses: 1, 2.5, or 10 mg] and matching placebo). After all subjects in Stage 1 have completed the 12-week treatment period or discontinued earlier, an interim analysis will be conducted to

1. compare the proportion of subjects in each dose group who have achieved abstinence from heavy drinking during Weeks 5 through 12,
2. assess safety within each dose group, and
3. drop the less efficacious doses.

Based on the interim analysis, Stage 2 of the study will either continue with additional subjects enrolling into 2 or 3 arms (placebo and 1 or 2 favorable, active doses) or the study will be halted completely if unacceptable toxicity has been observed.

In this example, we will have the following workflow to cover different options available in East:

1. Start with four arms (3 doses + Placebo).
2. Evaluate the three doses at the interim analysis and, based on the Treatment Selection Rules, carry forward one or two of the doses to the next stage.
3. While we select the doses, also increase the sample size of the trial by using the Sample Size Re-estimation (SSR) tool to improve conditional power if necessary.
   In a real trial, both the above actions (early stopping as well as sample size re-estimation) will be performed after observing the interim data.
4. See the final design output in terms of different powers and probabilities of selecting particular dose combinations.
5. See the early stopping boundaries for efficacy and futility on the adjusted p-value scale.
6. Monitor the actual trial using the Interim Monitoring tool in East.

Start East. Click Design tab, then click Many Samples in the Discrete category, and then click Multiple Looks - Combining p-values test.

This will bring up the input window of the design with some default values. Enter the inputs as discussed below.

29.2.3 Study Design Inputs

Let us assume that three doses of the treatment (1mg, 2.5mg and 10mg) are compared with the Placebo arm.
Preliminary sample size estimates are provided to achieve an overall study power of at least 80% at an overall, adequately adjusted 1-sided type-1 or alpha level of 2.5%, after taking into account all interim and final hypothesis tests. Note that we always use 1-sided alpha since dose-selection rules are usually 1-sided.

In Stage 1, 400 subjects are initially planned for enrollment (4 arms with 100 subjects each). Following an interim analysis conducted after all subjects in Stage 1 have completed 12 weeks of treatment or discontinued earlier, an additional 200 subjects will be enrolled into 2 arms for Stage 2 (placebo and one active dose). So we start with a total of 400 + 200 = 600 subjects.

The multiplicity adjustment methods available in East to compute the adjusted p-value (the p-value corresponding to the global null hypothesis) are Bonferroni, Sidak and Simes. For the discrete endpoint test, Dunnett Single Step is not available since we will be using the Z-statistic. Let us use the Bonferroni method for this example.

The p-values obtained from both stages can be combined by using the “Inverse Normal” method. In the “Inverse Normal” method, East first computes the weights as follows:

\[ w^{(1)} = \sqrt{\frac{n^{(1)}}{n}} \tag{29.1} \]

and

\[ w^{(2)} = \sqrt{\frac{n^{(2)}}{n}} \tag{29.2} \]

where \(n^{(1)}\) and \(n^{(2)}\) are the total sample sizes corresponding to Stage 1 and Stage 2 respectively and \(n\) is the total sample size. East displays these weights by default, but the values are editable and the user can specify any other weights as long as

\[ \left(w^{(1)}\right)^2 + \left(w^{(2)}\right)^2 = 1 \tag{29.3} \]

The final p-value is given by

\[ p = 1 - \Phi\left( w^{(1)} \Phi^{-1}\left(1 - p^{(1)}\right) + w^{(2)} \Phi^{-1}\left(1 - p^{(2)}\right) \right) \tag{29.4} \]

The weights specified on this tab will be used for the p-value computation. \(w^{(1)}\) will be used for data before the interim look and \(w^{(2)}\) will be used for data after the interim look.
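As an illustration of equations (29.1) to (29.4), the following minimal Python sketch computes the default weights for the 400 + 200 design and combines two stage-wise p-values. The function names and the raw p-values used here are hypothetical and chosen only for illustration; the Bonferroni adjustment shown is the standard global-null form, min(1, m × smallest p-value), and the sketch is not a description of East's internal implementation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi is .cdf, Phi^{-1} is .inv_cdf


def bonferroni_global_p(raw_p_values):
    """Bonferroni-adjusted p-value for the global null: min(1, m * min p_i)."""
    m = len(raw_p_values)
    return min(1.0, m * min(raw_p_values))


def inverse_normal_weights(n_stage1, n_stage2):
    """Default weights of equations (29.1) and (29.2)."""
    n_total = n_stage1 + n_stage2
    return sqrt(n_stage1 / n_total), sqrt(n_stage2 / n_total)


def inverse_normal_combination(p1, p2, w1, w2):
    """Combined p-value of equation (29.4); p1 and p2 must lie strictly in (0, 1)."""
    z = w1 * STD_NORMAL.inv_cdf(1 - p1) + w2 * STD_NORMAL.inv_cdf(1 - p2)
    return 1 - STD_NORMAL.cdf(z)


# Weights for Stage 1 = 400 and Stage 2 = 200 subjects.
w1, w2 = inverse_normal_weights(400, 200)

# Hypothetical raw p-values: three doses vs placebo at Stage 1,
# one selected dose vs placebo at Stage 2.
p_stage1 = bonferroni_global_p([0.20, 0.08, 0.03])
p_stage2 = bonferroni_global_p([0.02])

p_final = inverse_normal_combination(p_stage1, p_stage2, w1, w2)
print(f"w1 = {w1:.4f}, w2 = {w2:.4f}, combined p = {p_final:.4f}")
```

Because the weights are fixed from the planned stage sizes, the squared weights always sum to 1, which is the constraint in equation (29.3).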
Thus, according to the sample sizes planned for the two stages in this example, the weights are calculated as \(\sqrt{400/600}\) and \(\sqrt{200/600}\). Note: These weights are updated by East once we specify the first look position as 400/600 in the Boundary tab. So leave these as default values for now.

Set the Number of Arms as 4 and enter the rest of the inputs as shown below:

We can certainly have early stopping boundaries for efficacy and/or futility. But generally, in designs like this, the objective is to select the best dose(s) and not to stop early. So for now, select the Boundary tab and set both the boundary families to “None”. Also, set the timing of the interim analysis as 0.667, which will be after observing the data on 400 subjects out of 600. Enter 400/600 as shown below. Notice the updated weights on the Test Parameters tab.

The next tab is Response Generation, which is used to specify the true underlying proportion of response on the individual dose groups and the initial allocation from which to generate the simulated data.

Before we update the Treatment Selection tab, go to the Simulation Control Parameters tab, where we can specify the number of simulations to run and the random number seed, and choose to save the intermediate simulation data. For now, enter the inputs as shown below and keep all other inputs as default.

Click on the Treatment Selection tab. This tab is used to select the scale on which the treatment-wise effects are computed. The treatment effect scale is required for selecting treatments for the second stage, but the control treatment will not be considered for selection. It will always be there in the second stage.

The list under Treatment Effect Scale allows you to set the selection rules on different scales.
Select Estimated δ from this list. It means that all the selection rules we specify on this tab will be in terms of the estimated value of treatment effect, δ, i.e., difference from placebo.

Here is a list of all available treatment effect scales: Estimated Proportion, Estimated δ, Test Statistic, Conditional Power, Isotonic Proportion, Isotonic δ. For more details on these scales, refer to the Appendix K chapter on this method.

The next step is to set the treatment selection rules for the second stage.

Select Best r Treatments: The best treatment is defined as the treatment having the highest or lowest mean effect. The decision is based on the rejection region. If it is “Right-Tail” then the highest is taken as best. If it is “Left-Tail” then the lowest is taken as best. Note that the rejection region does not affect the choice of treatment based on conditional power.

Select treatments within ε of Best Treatment: Suppose the treatment effect scale is Estimated δ. If the best treatment has a treatment effect of δb and ε is specified as 0.1, then all the treatments which have a δ of δb − 0.1 or more are chosen for Stage 2.

Select treatments greater than threshold ζ: The treatments whose value on the treatment effect scale is greater or less than the threshold (ζ) specified by the user, according to the rejection region, are chosen. But if the treatment effect scale is chosen as conditional power, then the comparison is always “greater than”.

Use R for Treatment Selection: If you wish to define any customized treatment selection rules, it can be done by writing an R function for those rules to be used within East. This is possible due to the R Integration feature in East. Refer to the appendix chapter on R Functions for more details on syntax and use of this feature.
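The three built-in selection rules described above can be sketched in a few lines of Python. This is only an illustration of the logic as described in the text, not East's implementation and not the R template mentioned for custom rules; the function names are hypothetical, the left-tail behaviour of the ε rule is an assumption, and the example δ values are the differences from placebo implied by the design assumptions used later in this chapter (πc = 0.1, π1mg = 0.14, π2.5mg = 0.18, π10mg = 0.22).

```python
def select_best_r(effects, r, right_tail=True):
    """Select Best r Treatments: highest effects for Right-Tail, lowest for Left-Tail."""
    ranked = sorted(effects, key=effects.get, reverse=right_tail)
    return ranked[:r]


def select_within_epsilon(effects, epsilon, right_tail=True):
    """Select treatments within epsilon of the best treatment.

    The left-tail branch is an assumption made by symmetry with the right-tail rule.
    """
    if right_tail:
        best = max(effects.values())
        return [arm for arm, d in effects.items() if d >= best - epsilon]
    best = min(effects.values())
    return [arm for arm, d in effects.items() if d <= best + epsilon]


def select_beyond_threshold(effects, zeta, right_tail=True):
    """Select treatments greater than (Right-Tail) or less than (Left-Tail) zeta."""
    if right_tail:
        return [arm for arm, d in effects.items() if d > zeta]
    return [arm for arm, d in effects.items() if d < zeta]


# Illustrative estimated deltas (difference from placebo) at the interim look.
effects = {"1mg": 0.04, "2.5mg": 0.08, "10mg": 0.12}

print(select_best_r(effects, 1))               # single best dose
print(select_best_r(effects, 2))               # two best doses
print(select_within_epsilon(effects, 0.05))    # doses within 0.05 of the best
print(select_beyond_threshold(effects, 0.05))  # doses with delta above 0.05
```

The control arm is deliberately absent from `effects`, since it is always carried into the second stage and is never a candidate for selection.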
A template file for defining treatment selection rules is also available in the subfolder RSamples under your East installation directory. For more details on using R to define treatment selection rules, refer to section O.10.

For this example, select the first rule, Select Best r Treatments, and set r = 1, which indicates that East will select the best dose for Stage 2 out of the three doses. We will leave the default allocation ratio selections to yield equal allocation between the control and the selected best dose in Stage 2.

Click the Simulate button to run the simulations. When the simulations are over, a row gets added in the Output Preview area. Save this row to the Library by clicking the icon in the toolbar. Rename this scenario as Best1. Double click it to see the detailed output.

The first table in the detailed output shows the overall power including global power, conjunctive power, disjunctive power and FWER.
The definitions for the different powers are as follows:

Global Power: probability of demonstrating statistical significance on one or more treatment groups

Conjunctive Power: probability of demonstrating statistical significance on all treatment groups which are truly effective

Disjunctive Power: probability of demonstrating statistical significance on at least one treatment group which is truly effective

FWER: probability of incorrectly demonstrating statistical significance on at least one treatment group which is truly ineffective

For our example, the global power is 0.8, i.e., this is the probability that the design rejects any null hypothesis, where each null hypothesis states that the true proportion of responders at a given dose equals that of control. Also shown are conjunctive and disjunctive power, as well as the Family Wise Error Rate (FWER).

The Lookwise Summary table summarizes the number of simulated trials that ended with a conclusion of efficacy, i.e., rejected any null hypothesis, at each look. In this example, no simulated trial stopped at the interim analysis with an efficacy conclusion since there were no stopping boundaries, but 8083 simulations yielded an efficacy conclusion via the selected dose after Stage 2. This is consistent with the global power.

The next table, Detailed Efficacy Outcomes for all 10000 Simulations, summarizes the number of simulations for which each dose was selected for Stage 2 and yielded an efficacy conclusion. For example, the dose 10mg was observed to be efficacious in 63% of simulated trials, whereas none of the three doses were efficacious in 19% of trials.
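The four definitions above translate directly into counting rules over simulated trials. The sketch below is a hypothetical illustration, not East's code: each simulated trial is represented by the set of doses whose null hypothesis was rejected, `truly_effective` names the doses assumed to have a real effect, and the toy data are invented purely to exercise the definitions. When no dose is truly effective, conjunctive power is reported as 0 here, which is a convention chosen for this sketch.

```python
def power_summary(rejections, truly_effective):
    """Estimate global, conjunctive and disjunctive power and FWER.

    rejections      -- list with one set of rejected arms per simulated trial
    truly_effective -- arms that are truly better than control
    """
    effective = set(truly_effective)
    n_sims = len(rejections)

    def fraction(condition):
        return sum(1 for rejected in rejections if condition(set(rejected))) / n_sims

    return {
        # significance on one or more treatment groups
        "global": fraction(lambda rej: len(rej) > 0),
        # significance on all truly effective groups (0 if there are none)
        "conjunctive": fraction(lambda rej: bool(effective) and effective <= rej),
        # significance on at least one truly effective group
        "disjunctive": fraction(lambda rej: len(rej & effective) > 0),
        # significance on at least one truly ineffective group
        "fwer": fraction(lambda rej: len(rej - effective) > 0),
    }


# Toy data: four simulated trials, with 2.5mg and 10mg assumed truly effective.
toy_rejections = [{"10mg"}, {"2.5mg", "10mg"}, set(), {"1mg"}]
summary = power_summary(toy_rejections, truly_effective={"2.5mg", "10mg"})
print(summary)
```

Applied to the Best1 scenario, the global power count is simply the 8083 trials with an efficacy conclusion out of 10000, i.e. about 0.81.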
The last output table, Marginal Probabilities of Selection and Efficacy, summarizes the number and percent of simulations in which each dose was selected for Stage 2, regardless of whether it was found significant at the end of Stage 2 or not, as well as the number and percent of simulations in which each dose was selected and found significant. Average sample size is also shown. Note that since this design only selected the single best dose, this table gives almost the same information as the one above.

Selecting multiple doses (arms) for Stage 2 might be more effective than selecting just the best one. Click the button on the bottom left corner of the screen. This will take us back to the input window of the last simulation scenario. Go to the Treatment Selection tab and set r = 2. It means that we are interested in carrying forward the two best doses out of the three. Run the simulations, keeping the sample size fixed at 600. The simulated power drops to approximately 73%. Note the loss of power for this 2-best-doses scenario in comparison to the previous example, which chose only the best dose. This is because of the smaller sample size per arm in Stage 2 for the 2-best-doses scenario: the Stage 2 sample size is split among 2 doses and control instead of between only 1 dose and control as in the best dose scenario.

Now go to the Test Parameters tab and change the sample size to 700, assuming that each of the two doses and Placebo will get 100 subjects in Stage 2. Accordingly, update the look position on the Boundaries tab to 400/700 as well.

Click the Simulate button to run the simulations. When the simulations are over, a row gets added in the Output Preview area. Save this row to the Library by clicking the icon in the toolbar. Rename this scenario as Best2. Double click it to see the detailed output.
The interpretation of the first two tables is the same as described above. Increasing the sample size to 700 restores the power to 80%, and the output also gives us the design details when two of the three doses were selected.

The table Detailed Efficacy Outcomes for all 10000 Simulations summarizes the number of simulations for which each individual dose group or pair of doses was selected for Stage 2 and yielded an efficacy conclusion. For example, the pair (2.5mg and 10mg only) was observed to be efficacious in 41% of the trials (4076/10000).

The next table, Marginal Probabilities of Selection and Efficacy, summarizes the number and percent of simulations in which each dose was selected for Stage 2, regardless of whether it was found significant at the end of Stage 2 or not, as well as the number and percent of simulations in which each dose was selected and found significant. Average sample size is also shown. It tells us how frequently the dose (either alone or with some other dose) was selected and efficacious. For example, dose 1mg was selected in approximately 25% of trials and was efficacious in approximately 7% of trials (which is the sum of 10, 130 and 555 simulations from the previous table, i.e., 695 of 10000).

The advantage of a 2-stage “treatment selection design” or “drop-the-loser” design is that it allows the poorly performing or futile arms to be dropped based on the interim data while still preserving the type-1 error and achieving the desired power.

In the Best1 scenario, we dropped two doses (r = 1) and in the Best2 scenario, we dropped one dose (r = 2). Suppose we had decided to proceed to Stage 2 without dropping any doses. In this case, power would have dropped noticeably. To verify this in East, run the above scenario with r = 3 and save it to the Library. Rename this scenario as All3. Double click it to see the detailed output. We can observe that the power drops from 80% to 72%.
The three scenarios created so far can be compared in a tabular manner as well. Select the three nodes in the Library, click the icon in the toolbar and select “Power” from the dropdown. A table as shown below will be created by East.

29.2.4 Simulating under Different Alternatives

Since this is a simulation based design, we can perform sensitivity analyses by changing some of the inputs and observing the effects on the overall power and other output. Let us first make sure that this design preserves the total type-1 error. It can be done by running the simulations under the “Null” hypothesis.

Click the button on the bottom left corner of the screen. Go to the Response Generation tab and enter the inputs as shown below:

Also set r = 2 in the Treatment Selection tab. Run the simulations and go to the detailed output by saving the row from Output Preview to the Library.

Notice that the global power and the simulated FWER are both less than the design type-1 error, which means the overall type-1 error is preserved.

29.3 Sample Size Re-estimation

As we have seen above, the desired power of 80% is achieved with the sample size of 700 if the initial assumptions (πc = 0.1, π1mg = 0.14, π2.5mg = 0.18, π10mg = 0.22) hold true. But if they do not, then the original sample size of 700 may be insufficient to achieve 80% power. The adaptive sample size re-estimation is suited to this purpose. In this approach we start out with a sample size of 700 subjects, but take an interim look after data are available on 400 subjects.
The purpose of the interim look is not to stop the trial early but rather to examine the interim data and continue enrolling past the planned 700 subjects if the interim results are promising enough to warrant the additional investment of sample size. This strategy has the advantage that the sample size is finalized only after a thorough examination of data from the actual study rather than through making a large up-front sample size commitment before any data are available. Furthermore, if the sample size may only be increased but never decreased from the originally planned 700 subjects, there is no loss of efficiency due to overruns.

Suppose the proportions of response on the four arms are as shown below. Update the Response Generation tab accordingly and also set the seed as 100 in the Simulation Controls tab.

Run 10000 simulations and save the simulation row to the Library by clicking the icon in the toolbar.

Notice that the global power has dropped from 80% to 67%. Let us re-estimate the sample size to achieve the desired power. Add the Sample Size Re-estimation tab by clicking the button. A new tab is added as shown below.

SSR At: For a K-look group sequential design, one can decide the time at which conditions for adaptations are to be checked and the actual adaptation is to be carried out. This can be done either at some intermediate look or after some specified information fraction. The possible values of this parameter depend upon the user choice. The default choice for this design is always Look # and it is fixed to 1, since it is always a 2-look design.
Target CP for Re-estimating Sample Size: The primary driver for increasing the sample size at the interim look is the desired (or target) conditional power or probability of obtaining a positive outcome at the end of the trial, given the data already observed. For this example we have set the conditional power at the end of the trial to be 80%. East then computes the sample size that would be required to achieve this desired conditional power.

Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample size is computed at the interim analysis on the basis of the observed data so as to achieve some target conditional power. However the sample size so obtained will be overruled unless it falls between pre-specified minimum and maximum values. For this example, the range of allowable sample sizes is [700, 1400]. If the newly computed sample size falls outside this range, it will be reset to the appropriate boundary of the range. For example, if the sample size needed to achieve the desired 80% conditional power is less than 700, the new sample size will be reset to 700. In other words we will not decrease the sample size from what was specified initially. On the other hand, the upper bound of 1400 subjects demonstrates that the sponsor is prepared to increase the sample size up to double the initial investment in order to achieve the desired 80% conditional power. But if 80% conditional power requires more than 1400 subjects, the sample size will be reset to 1400, the maximum allowed.

Promising Zone Scale: One can define the promising zone as an interval based on conditional power, test statistic, or estimated δ. The input fields change according to this choice. The decision of altering the sample size is taken based on whether the interim value of conditional power / test statistic / δ lies in this interval or not. Let us keep the default scale which is Conditional Power.
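The rule for resetting a re-estimated sample size to the allowable range, described under Maximum Sample Size if Adapt, amounts to clamping. The sketch below is a hypothetical illustration: the function name and argument names are not East identifiers, and the required sample size is taken as given rather than derived from a conditional power calculation.

```python
def clamp_sample_size(n_required, n_planned=700, max_multiplier=2.0):
    """Reset a re-estimated total sample size to the range [n_planned, n_max].

    n_required     -- sample size needed to reach the target conditional power
    n_planned      -- originally planned total sample size (never decreased)
    max_multiplier -- upper bound expressed as a multiple of n_planned
    """
    n_max = int(round(n_planned * max_multiplier))
    return max(n_planned, min(n_required, n_max))


print(clamp_sample_size(650))   # below the planned size: reset to 700
print(clamp_sample_size(1000))  # inside the range: kept at 1000
print(clamp_sample_size(1600))  # above the maximum: reset to 1400
```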
Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size will only be altered if the estimate of CP at the interim analysis lies in a pre-specified range, referred to as the “Promising Zone”. Here the promising zone is 0.30 − 0.80. The idea is to invest in the trial in stages. Prior to the interim analysis the sponsor is only committed to a sample size of 700 subjects. If, however, the results at the interim analysis appear reasonably promising, the sponsor would be willing to make a larger investment in the trial and thereby improve the chances of success. Here we have somewhat arbitrarily set the lower bound for a promising interim outcome to be CP = 0.30. An estimate CP < 0.30 at the interim analysis is not considered promising enough to warrant a sample size increase. It might sometimes be desirable to also specify an upper bound beyond which no sample size change will be made. Here we have set that upper bound of the promising zone at CP = 0.80. In effect we have partitioned the range of possible values for conditional power at the interim analysis into three zones: unfavorable (CP < 0.3), promising (0.3 ≤ CP < 0.8), and favorable (CP ≥ 0.8). Sample size adaptations are made only if the interim CP falls in the promising zone at the interim analysis. The promising zone defined on the Test Statistic scale or the Estimated δ scale works similarly.

SSR Function in Promising Zone: The behavior in the promising zone can either be defined by a continuous function or a step function. The default is continuous, where East accepts the two quantities (Multiplier, Target CP) and re-estimates the sample size depending upon the interim value of CP/test statistic/effect size. The SSR function can be defined as a step function as well. This can be done with a single piece or with multiple pieces.
For each piece, define the step function in terms of:

the interval of CP/test statistic/effect size. This depends upon the choice of promising zone scale.

the value of the re-estimated sample size in that interval.

For a single piece, just the total re-estimated sample size is required as an input. If the interim value of CP/test statistic/effect size lies in the promising zone, then the re-estimation will be done using this step function.

Let us set the inputs on the Sample Size Re-estimation tab as shown below:

Run 10000 simulations and see the Details.

Just for comparison, re-run the simulations but this time set the multiplier in the Sample Size Re-estimation tab to 1, which means we are not interested in sample size re-estimation. Both scenarios can also be run together by entering the two values 1, 2 in the cell for Multiplier.

(Detailed output is shown for two scenarios: With Sample Size Re-estimation and Without Sample Size Re-estimation.)

We observe from the table that the power of the adaptive implementation is approximately 75%, which is almost 8 percentage points higher than that of the non-adaptive design. This increase in power has come at an average cost of 805 − 700 = 105 additional subjects. Next we observe from the Zone-wise Averages table that 1563 of 10000 trials (16%) underwent sample size re-estimation and that, of those 1563 trials, 84% were able to reject the global null hypothesis. The average sample size, conditional on adaptation, is 1376.

29.4 Adding Early Stopping Boundaries

One can also incorporate stopping boundaries to stop early at the interim look for efficacy or futility. The efficacy boundary can be defined on the Adjusted p-value scale, whereas the futility boundary can be defined on the Adjusted p-value or δ scale.

Click the button on the bottom left corner of the screen.
This will take you back to the input window of the last simulation scenario. Go to the Boundary tab and set the Efficacy and Futility boundaries to “Adjusted p-value”. These boundaries are for early stopping at look 1.

As the note on this tab says:

If any one adjusted p-value is ≤ the efficacy p-value boundary, then stop the trial for efficacy.

If all the adjusted p-values are > the futility p-value boundary, then stop the trial for futility.

Else carry forward all the treatments to the next step of treatment selection.

Stopping early for efficacy or futility is a step which is carried out before the treatment selection rules are applied. The simulation output has the same explanation as above, except that the Lookwise Summary table may show some trials stopped at the first look due to efficacy or futility.

29.5 Monitoring this trial

Select the simulation node with SSR implementation and click the icon. It will invoke the Interim Monitoring dashboard. Click the icon to open the Test Statistic Calculator. Enter the data as shown below:

Click Recalc to calculate the test statistic as well as the raw p-values. Notice that the p-value for 1mg is 0.095, which is greater than 0.025. We will drop this dose in the second stage. On clicking OK, it updates the dashboard.

Open the Test Statistic Calculator for the second look, enter the following information and also drop the dose 1mg. Click Recalc to calculate the test statistic as well as the raw p-values.

On clicking OK, it updates the dashboard. Observe that the adjusted p-value for 10mg crosses the efficacy boundary. It can also be observed in the Stopping Boundaries chart.
30 Binomial Superiority Regression

30.1 Logistic Regression with Single Normal Covariate

Logistic regression is widely used for modeling the probability of a binary response in the presence of covariates. In this section we will show how East may be used to design clinical trials with binomial endpoints, while adjusting for the effects of covariates through the logistic regression model. The sample size calculations for the logistic regression models discussed here and implemented in East are based on the methods of Hsieh et al., 1997. We note, however, that these methods are limited to continuous covariates only. When the covariate is normal, the log odds value β1 is zero if and only if the group means between the two response categories are the same, assuming equal variances.

Suppose in a logistic regression model, Y is a binary response variable and X1 is a covariate related to Y. The model is given by

\[ \log\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X_1 \tag{30.1} \]

where P = P(Y = 1). The null hypothesis that the coefficient of the covariate β1 is zero is tested against the two-sided alternative hypothesis that β1 is not equal to zero. The slope coefficient β1 is the change in log odds for every one unit increase in X1. The sample size required for a two-sided test with type-I error rate of α to have a power 1 − β is

\[ n = \frac{\left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{P_1 (1 - P_1)\, \beta^{*2}} \tag{30.2} \]

where β* is the effect size to be tested, P1 is the event rate at the mean of X and Z_u is the upper u-th percentile of the standard normal distribution.

30.1.1 Trial Design

We use a Department of Veterans Affairs Cooperative Study entitled ’A Psychophysiological Study of Chronic Post-Traumatic Stress Disorder’ to illustrate the preceding sample size calculation for logistic regression with continuous covariates.
The study developed and validated a logistic regression model to explore the use of certain psychophysiological measurements for the prognosis of combat-related post-traumatic stress disorder (PTSD). In the study, four psychophysiological measurements of each patient (heart rate, blood pressure, EMG and skin conductance) were recorded while patients were exposed to video tapes containing combat and neutral scenes. Among the psychophysiological variables, the difference of the heart rates obtained while viewing the combat and the neutral tapes (DCNHR) is considered a good predictor of the diagnosis of PTSD. The prevalence rate of PTSD among the Vietnam veterans was assumed to be 20 per cent. Therefore, we assumed a four to one sample size ratio for the non-PTSD versus PTSD groups. The effect size of DCNHR is approximately 0.3, which is the difference of the group means divided by the standard deviation. We would like to determine the sample size to achieve 90% power based on a two-sided test at significance level 0.05 (Hsieh et al., 1998).

Start East. Click Design tab, then click Regression in the Discrete group, and then click Logistic Regression - Odds Ratio.

The input dialog box, with default input values, will appear in the upper pane of this window. Enter 0.2 in the Proportion Success at X = µ (P0) field and 1.349 in the Odds Ratio P1(1 − P0)/P0(1 − P1) field. Enter the rest of the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed sample size highlighted in yellow. A total of 733 subjects must be enrolled in order to achieve 90% power under the alternative hypothesis. Besides sample size, one can also compute the power and the level of significance for this design.

You can select this design by clicking anywhere on the row in the Output Preview.
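The sample size reported above can be checked against equation (30.2) with a short calculation. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, the event rate at the covariate mean is taken as the 0.2 entered in East, and β* is taken as the natural log of the odds ratio 1.349 (about 0.3, matching the stated effect size). It is not a description of East's internal computation.

```python
from math import ceil, log
from statistics import NormalDist


def logistic_sample_size(event_rate, odds_ratio, alpha=0.05, power=0.90):
    """Sample size from equation (30.2) for a two-sided test.

    event_rate -- P1, the event rate at the mean of the covariate
    odds_ratio -- odds ratio per one standard deviation of the covariate;
                  beta* is taken as log(odds_ratio) (an assumption of this sketch)
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    z_beta = std_normal.inv_cdf(power)           # Z_{1 - beta}
    beta_star = log(odds_ratio)
    n = (z_alpha + z_beta) ** 2 / (event_rate * (1 - event_rate) * beta_star ** 2)
    return ceil(n)  # round up to a whole number of subjects


n_ptsd = logistic_sample_size(event_rate=0.2, odds_ratio=1.349)
print(n_ptsd)  # 733, in agreement with the East output quoted above
```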
If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon to see the detailed output as shown below:

Observe that this kind of output gives us the summary of the output as well.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

31 Agreement

31.1 Cohen’s Kappa

In some experimental situations, to check inter-rater reliability, independent sets of measurements are taken by more than one rater and the responses are checked for agreement. For a binary response, Cohen’s Kappa test for binary ratings can be used to check inter-rater reliability. Conventionally, the kappa coefficient is used to express the degree of agreement between two raters when the same two raters rate each of a sample of n subjects independently, with the ratings being on a categorical scale consisting of k categories (Fleiss, 1981). A simple example is given in the table below, where two tests, Test 1 and Test 2 (k = 2), were performed.
In the table below, πij denotes the true population proportion in the i-th row and the j-th column category.

Table 31.1: Table of proportions of two raters

Test 1 \ Test 2         Test 2(+)   Test 2(-)   Marginal Probability
Test 1(+)               π11         π12         π1.
Test 1(-)               π21         π22         π2.
Marginal Probability    π.1         π.2         1

The Kappa coefficient (κ) is defined by

$$\kappa = \frac{\pi_0 - \pi_e}{1 - \pi_e} \tag{31.1}$$

where $\pi_0 = \sum_{i=1}^{2} \pi_{ii}$ and $\pi_e = \sum_{i=1}^{2} \pi_{i.}\,\pi_{.i}$.

We want to test the null hypothesis H0 : κ ≤ κ0 against H1 : κ > κ0, where κ0 > 0. The total sample size required for a test with type-I error rate of α to have a power 1 − β is

$$n = \frac{(z_\alpha + z_\beta)^2\,(E + F - G)}{\left[(1 - \pi_e)^2(\kappa - \kappa_0)\right]^2} \tag{31.2}$$

where

$$E = \sum_{i=1}^{2} \pi_{ii}\left[(1 - \pi_e) - (\pi_{.i} + \pi_{i.})(1 - \pi_0)\right]^2 \tag{31.3}$$

$$F = (1 - \pi_0)^2 \sum_{i=1}^{2} \sum_{j \neq i} \pi_{ij}\,(\pi_{.i} + \pi_{j.})^2 \tag{31.4}$$

and

$$G = \left[\pi_0(1 + \pi_e) - 2\pi_e\right]^2 \tag{31.5}$$

East can calculate the power, sample size, or level of significance for the Cohen's Kappa test for two ratings.

31.1.1 Trial Design

Consider responses from two raters. The example is based on a study to develop and validate a set of clinical criteria to identify patients with minor head injury who do not undergo a CT scan (Haydel et al., 2000). In the study, each CT scan was first reviewed by a staff neuroradiologist. An independent staff radiologist then reviewed 50 randomly selected CT scans, and the two sets of responses were checked for agreement. Let κ denote the level of agreement. The null hypothesis is H0 : κ = 0.9 versus the one-sided alternative hypothesis H1 : κ < 0.9. We wish to compute the power of the test at the alternative value κ1 = 0.6. We expect each rater to identify 8% of CT scans as positive. We also expect 5% of the CT scans to be rated positive by both raters.

Start East. Click the Design tab, then click Agreement in the Discrete group, and then click Cohen's Kappa (Two Binary Ratings).
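To see how the expected proportions in this example relate to κ, definition (31.1) can be evaluated directly. The following is a minimal sketch: the function name and the table layout (rows for rater 1, columns for rater 2) are illustrative assumptions, and the 2 x 2 table is filled in from the 8% marginal and 5% joint positive rates quoted above.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square table of joint proportions.

    table[i][j] is the proportion of subjects that rater 1 places in
    category i and rater 2 places in category j (entries sum to 1).
    Returns (pi_0, pi_e, kappa) as in equation (31.1).
    """
    k = len(table)
    row = [sum(table[i]) for i in range(k)]                        # pi_i.
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]   # pi_.j
    pi_0 = sum(table[i][i] for i in range(k))                      # observed agreement
    pi_e = sum(row[i] * col[i] for i in range(k))                  # chance agreement
    return pi_0, pi_e, (pi_0 - pi_e) / (1 - pi_e)


# Each rater calls 8% of scans positive; 5% are positive for both raters.
table = [[0.05, 0.03],
         [0.03, 0.89]]
pi_0, pi_e, kappa = cohen_kappa(table)
print(round(pi_0, 4), round(pi_e, 4), round(kappa, 3))  # 0.94 0.8528 0.592
```

The implied agreement of about 0.59 is close to the alternative value κ1 = 0.6 at which the power is requested.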
The input dialog box, with default input values, will appear in the upper pane of this window. Enter 0.9 in the Null Agreement (κ0) field. Specify α = 0.05, the sample size, and the kappa parameter values as shown below. Enter the rest of the inputs as shown below and click Compute.

The design output will be displayed in the Output Preview, with the computed power highlighted in yellow. The power of the test is 64.9% given a sample size of 50 scans to establish agreement of ratings by the two radiologists. Besides power, one can also compute the sample size for this study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

31.2 Cohen's Kappa (C Ratings)

Let κ denote the measure of agreement between two raters who each classify n objects into C mutually exclusive ratings (categories). The null hypothesis H0 : κ = κ0 is tested against the two-sided alternative H1 : κ ≠ κ0 or a one-sided alternative H1 : κ > κ0 or H1 : κ < κ0.
The total sample size required for a test with type-I error rate of α to have a power 1 − β when κ = κ1 is

$$n \ge \left[\frac{Z_{1-\alpha}\,\max\tau(\hat\kappa \mid \kappa = \kappa_0) + Z_{1-\beta}\,\max\tau(\hat\kappa \mid \kappa = \kappa_1)}{\kappa_1 - \kappa_0}\right]^2 \tag{31.6}$$

where

$$\tau(\hat\kappa) = \frac{(Q_1 + Q_2 - 2Q_3 - Q_4)^{1/2}}{(1 - \pi_e)^2} \tag{31.7}$$

and

$$Q_1 = \pi_0(1 - \pi_e)^2,$$
$$Q_2 = (1 - \pi_0)^2 \sum_{i=1}^{C}\sum_{j=1}^{C} \pi_{ij}(\pi_{i.} + \pi_{.j})^2,$$
$$Q_3 = 2(1 - \pi_0)(1 - \pi_e) \sum_{i=1}^{C} \pi_{ij}(\pi_{i.} + \pi_{.j}),$$
$$Q_4 = (\pi_0\pi_e - 2\pi_e + \pi_0)^2.$$

Here πij is the proportion of subjects that Rater 1 places in category i but Rater 2 places in category j, π0 is the proportion of agreement, and πe is the expected proportion of agreement. The power of the test is given by

$$\text{Power} = \Phi\left[\frac{\sqrt{n}\,(\kappa_1 - \kappa_0) - Z_{1-\alpha}\,\max\tau(\hat\kappa \mid \kappa = \kappa_0)}{\max\tau(\hat\kappa \mid \kappa = \kappa_1)}\right] \tag{31.8}$$

31.2.1 Trial Design

Consider a hypothetical problem of physical health ratings from two different raters: a health instructor and the subject's general practitioner. A total of 360 subjects were randomly selected and the two sets of responses were checked for agreement. Let κ denote the level of agreement. The null hypothesis is H0 : κ = 0.6 versus the one-sided alternative hypothesis H1 : κ < 0.6. We wish to compute the power of the test at the alternative value κ1 = 0.5.

Table 31.2: Contingency Table

General Practitioner \ Health Instructor   Poor   Fair   Good   Excellent   Total
Poor                                          2      9      4           1      16
Fair                                         12     35     36           8      91
Good                                          8     43    103          30     184
Excellent                                     0      7     40          22      69
Total                                        22     94    183          61     360

Start East. Click the Design tab, then click Agreement in the Discrete group, and then click Cohen's Kappa (Two Categorical Ratings).

The input dialog box, with default input values, will appear in the upper pane of this window. Enter Number of Ratings (C) as 4. Enter 0.6 in the Null Agreement (κ0) field and 0.5 in the Alternative Agreement (κ1) field. Click Marginal Probabilities and specify the marginal probabilities calculated from the above table.
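The marginal probabilities requested by the dialog box are simply the row and column totals of Table 31.2 divided by 360. A short sketch of that arithmetic follows; the variable names are illustrative, and which rater East labels as the first or second set of marginals is not stated here, so check the dialog box before entering the values.

```python
# Cell counts from Table 31.2 (rows: general practitioner,
# columns: health instructor; categories Poor, Fair, Good, Excellent).
counts = [
    [2, 9, 4, 1],
    [12, 35, 36, 8],
    [8, 43, 103, 30],
    [0, 7, 40, 22],
]

total = sum(sum(row) for row in counts)

# Row marginals: distribution of the general practitioner's ratings.
row_marginals = [sum(row) / total for row in counts]
# Column marginals: distribution of the health instructor's ratings.
col_marginals = [sum(col) / total for col in zip(*counts)]

print(total)                                   # 360
print([round(p, 3) for p in row_marginals])    # [0.044, 0.253, 0.511, 0.192]
print([round(p, 3) for p in col_marginals])    # [0.061, 0.261, 0.508, 0.169]
```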
Specify the sample size. Leave all other values as defaults, and click Compute.

The design output will be displayed in the Output Preview, with the computed power highlighted in yellow. The power of the test is 73.3% given a sample size of 360 subjects to establish agreement of ratings by the two raters. Besides power, one can also compute the sample size for this study design.

You can select this design by clicking anywhere on the row in the Output Preview. If you click the icon, some of the design details will be displayed in the upper pane. In the Output Preview toolbar, click the icon to save this design to workbook Wbk1 in the Library. If you hover the cursor over Des1 in the Library, a tooltip will appear that summarizes the input parameters of the design.

With Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The resulting power curve for this design is shown. You can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As.... For now, you may close the chart before continuing.

32 Dose Escalation

This chapter deals with the design, simulation, and interim monitoring of Phase 1 dose escalation trials. A brief overview of the designs is given below; more technical details are available in Appendix N.

One of the primary goals of Phase 1 trials in oncology is to find the maximum tolerated dose (MTD). Currently, the vast majority of such trials employ traditional dose escalation methods such as the 3+3 design.
The 3+3 design starts by allocating three patients, typically to the lowest dose level, and then adaptively moves up and down in subsequent cohorts until either the MTD is obtained, or the trial is stopped for excessive toxicity. In addition to the 3+3, East also provides the Continual Reassessment Method (CRM), the modified Toxicity Probability Interval (mTPI) method, and the Bayesian logistic regression model (BLRM) for single agent designs. Compared to the 3+3, these modern methods may offer a number of advantages, which can be explored systematically via simulation and interim monitoring.

The CRM (Goodman et al., 1995; O'Quigley et al., 1990) is a Bayesian model-based method that uses all available information from all doses to guide dose assignment. One first specifies a target toxicity, a one-parameter dose response curve, and a corresponding prior distribution. The posterior mean and predictions for the probability of toxicity at each dose are updated as the trial progresses. The next recommended dose is the one whose toxicity probability is closest to the target toxicity.

The mTPI method (Ji et al., 2010) is Bayesian like the CRM, but rule-based like the 3+3. In this way, the mTPI represents a useful compromise between the other methods. An independent beta distribution is assumed for the probability of toxicity at each dose. A set of decision intervals is specified, and subsequent dosing decisions (up, down, or stay) are determined by computing the normalized posterior probability in each interval at the current dose. The normalized probability for each interval is known as the unit probability mass (UPM).

A more advanced version of the CRM is the BLRM (Neuenschwander et al., 2008; Sweeting et al., 2013), which assumes a two-parameter logistic dose response curve. In addition to a target toxicity, one specifies a set of decision intervals, and optional associated losses, for guiding dosing decisions.
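As a rough illustration of the mTPI rule outlined above, the sketch below computes the unit probability mass of the three decision intervals at the current dose and picks the largest. It assumes a Beta(1, 1) prior and interval limits of 0.2 and 0.3, the values used in the mTPI tutorial in Section 32.3.1. The function names are hypothetical, the dose exclusion rule is left out, and the Beta CDF is evaluated with the binomial-sum identity, which holds only for integer parameters.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b, using
    P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))


def mtpi_decision(n_treated, n_dlt, lower=0.2, upper=0.3, a=1, b=1):
    """Return (decision, upm) for the current dose.

    E = escalate (underdosing interval), S = stay (proper dosing interval),
    D = de-escalate (overdosing interval).
    """
    post_a = a + n_dlt                 # Beta posterior parameters
    post_b = b + n_treated - n_dlt
    cdf_lower = beta_cdf_int(lower, post_a, post_b)
    cdf_upper = beta_cdf_int(upper, post_a, post_b)
    upm = {
        "E": cdf_lower / lower,                          # mass on [0, lower)
        "S": (cdf_upper - cdf_lower) / (upper - lower),  # mass on [lower, upper]
        "D": (1 - cdf_upper) / (1 - upper),              # mass on (upper, 1]
    }
    return max(upm, key=upm.get), upm


for dlts in (0, 1, 2):
    decision, upm = mtpi_decision(n_treated=3, n_dlt=dlts)
    print(dlts, decision, {k: round(v, 3) for k, v in upm.items()})
```

With three subjects on a dose, this sketch returns E for no DLTs, S for one DLT, and D for two DLTs.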
For dual-agent combination designs, East provides a combination version of the BLRM (Neuenschwander et al., 2014), as well as the PIPE (product of independent beta probabilities escalation) method (Mander & Sweeting, 2015).

32.1 3+3

32.1.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: 3+3.

This window is the Input dialog box, which is separated into three tabs: Design Parameters, Response Generation, and Simulation Control. First, you may specify the Max. Number of Doses as 7. In the Design Parameters tab, enter 30 as the Max. Sample Size. For the 3+3 design, the Cohort Size is fixed at 3.

There are three variants of 3+3 offered: L, H, and L(modified). The key differences between these variants can be seen in the respective Decision Rules table. Select 3+3 L.

You also have the option of starting with an Accelerated Titration design (Simon et al., 1997), which escalates with single-patient cohorts until the first DLT is observed, after which the cohort is expanded at the current dose level with two more patients.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). In the row titled Dose, you can specify the dose levels (e.g., in mg). In the row titled GC1, you can edit the true probabilities of toxicity at each dose. You can also rename the profile by directly editing that cell. For now, leave all entries at their default values.

You can add a new profile generated from a parametric curve family. For example, click on the menu Curve Family and select Emax. You may construct a
four-parameter Emax curve by adjusting its parameters, then click Add Profile. Click Plot Profiles to plot the two dose toxicity curves in this grid.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics and Save subject-level data. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots for more than one simulation, you can increase the number. For now, leave this at 1 to save computation time.

You may also like to examine the Local Options button of the input window toolbar. This gives you different options for computing average allocations for each dose.

Click Simulate. East will simulate data generated from the two profiles you specified, and apply the 3+3 design to each simulation data set. Once completed, the two simulations will appear as two rows in the Output Preview. Select both rows in the Output Preview and click the icon in the toolbar. The two simulations will be displayed side by side in the Output Summary.

In the Output Preview toolbar, click the icon to save both simulations to the Library. Double-click Sim1 in the Library to display the simulation output details.

With Sim1 selected in the Library, click the Plots icon to access a wide range of available plots. Below is an example of the MTD plot, showing the percentage of simulations in which each dose level was selected as the MTD. The "true" MTD is displayed as the second dose level. This is the dose whose true probability of DLT (0.1) was closest to and below the target probability (1/6). Another useful plot shows the possible sample sizes, as percentages over all simulations.
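The definition of the "true" MTD used in the MTD plot can be written as a one-line rule. The sketch below is illustrative only: the function name is hypothetical, and the profile is an assumed example chosen to agree with the two values quoted in this chapter (0.1 at the second dose and 0.2 at the third), not East's actual default profile.

```python
def true_mtd_index(true_dlt_probs, target):
    """Zero-based index of the dose whose true DLT probability is closest
    to the target without exceeding it, or None if every dose is too toxic."""
    eligible = [(p, i) for i, p in enumerate(true_dlt_probs) if p <= target]
    if not eligible:
        return None
    return max(eligible)[1]


# Hypothetical true dose-toxicity profile for seven dose levels.
profile = [0.05, 0.10, 0.20, 0.30, 0.50, 0.60, 0.70]

print(true_mtd_index(profile, target=1 / 6))   # 1 (second dose level)
print(true_mtd_index(profile, target=0.25))    # 2 (third dose level)
```

The same rule with a target of 0.25 gives the third dose level, which matches the true MTD reported in the CRM tutorial that follows.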
Close each plot after viewing, or save them by clicking Save in Workbook. Finally, to save the workbook to disk, right-click Wbk1 in the Library and then click Save As....

32.1.2 Interim Monitoring

Right-click one of the Simulation nodes in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly, and the next Recommended Dose is 10. Click Enter Interim Data again, with 10 selected as Dose Assigned, enter 2 for DLTs Observed, and click OK.

East now recommends de-escalation to 5. Click Enter Interim Data, with 5 selected as Dose Assigned, enter 1 for DLTs Observed, and click OK. East recommends that you stop the trial.

Click Stop Trial to generate a table for final inference.

32.2 Continual Reassessment Method (CRM)

32.2.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: Continual Reassessment Method.

This window is the Input dialog box, which is separated into four tabs: Design Parameters, Stopping Rules, Response Generation, and Simulation Control. In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for Cohort Size. If you were to check the box Start With, then you would be simulating from the 3+3 or Accelerated Titration design first, before switching to the CRM. For this tutorial, however, leave the box unchecked.
Enter 0.25 for the Target Probability of Toxicity, and 0.3 for the Target Probability Upper Limit. This will ensure that the next dose assigned is the one whose posterior mean toxicity probability is closest to 0.25, and below 0.3.

Click the Posterior Sampling... button. By default, CRM requires the posterior mean only. If instead you wish to sample from the posterior distribution (using a Metropolis-Hastings algorithm), you will be able to compute and plot the posterior probabilities of being the MTD for each dose. Note that this option will increase the simulation time.

Click the Dose Skipping... button. As was recommended in later variations of CRM, in the interests of promoting safety, leave the default options: no untried doses will be skipped while escalating, and no dose escalation will occur when the most recent subject experienced a DLT.

For Model Type, select Power, with a Gamma(α = 1, β = 1) prior for θ. Other model types available include the Logistic and the Hyperbolic Tangent. Finally, for the prior probabilities of toxicity of all doses (known as the skeleton), enter: 0.05, 0.1, 0.2, 0.3, 0.35, 0.4, and 0.45. Click the icon to generate a chart of the 95% prior intervals at each dose for probability of DLT.

In the Stopping Rules tab, you may specify various rules for stopping the trial. Enter the following inputs as below. The early stopping rules are divided into two types: those where the MTD is not determined, and those where the MTD is determined. The former case may arise when the MTD is estimated to be below the lowest dose or above the highest dose.
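To make the Design Parameters inputs concrete, here is a minimal sketch of a CRM update under the power model. It assumes the parameterization p_i(θ) = s_i^θ, with s_i the skeleton value and θ following a Gamma(1, 1) prior in the shape and rate convention. East's exact parameterization is described in Appendix N and may differ, so treat the function names and the model form as assumptions. The posterior mean is approximated by midpoint-rule integration, and only the no-skipping restriction is applied when selecting the next dose.

```python
from math import exp, log

SKELETON = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45]


def posterior_mean_tox(skeleton, n_treated, n_dlt, grid=10000, theta_max=20.0):
    """Posterior mean of p_i = s_i**theta for every dose, assuming a
    Gamma(1, 1) prior on theta and a binomial likelihood at each dose."""
    h = theta_max / grid
    norm = 0.0
    sums = [0.0] * len(skeleton)
    for k in range(grid):
        theta = (k + 0.5) * h
        weight = exp(-theta)                      # Gamma(1, 1) prior density
        for s, n, y in zip(skeleton, n_treated, n_dlt):
            p = s ** theta
            weight *= p ** y * (1 - p) ** (n - y)  # binomial likelihood kernel
        norm += weight
        for i, s in enumerate(skeleton):
            sums[i] += weight * s ** theta
    return [total / norm for total in sums]


def next_dose(post_means, target, upper, highest_tried):
    """Index of the dose with posterior mean closest to the target and below
    the upper limit, going at most one level above the highest dose tried."""
    allowed = range(min(highest_tried + 2, len(post_means)))
    candidates = [i for i in allowed if post_means[i] < upper]
    if not candidates:
        return None
    return min(candidates, key=lambda i: abs(post_means[i] - target))


# With no data, the posterior mean equals the prior mean 1 / (1 - log(s)).
prior_means = posterior_mean_tox(SKELETON, [0] * 7, [0] * 7)
closed_form = 1 / (1 - log(SKELETON[2]))
print(round(prior_means[2], 4), round(closed_form, 4))

# First cohort: three subjects at the lowest dose, no DLTs.
post = posterior_mean_tox(SKELETON, [3, 0, 0, 0, 0, 0, 0], [0] * 7)
print([round(p, 3) for p in post], next_dose(post, 0.25, 0.3, highest_tried=0))
```

The closed-form check on the prior mean is only a sanity test of the numerical integration; it says nothing about how East itself computes the posterior.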
For example, if the posterior probability of overdosing (that is, the probability that toxicity at the lowest dose is greater than the target toxicity) exceeds 0.8, then the trial will be stopped. Similarly, if the posterior probability of underdosing (that is, the probability that toxicity at the highest dose is lower than the target toxicity) exceeds 0.9, then the trial will be stopped. A minimum of 6 subjects will need to be observed on a dose before either of these two rules is activated. A further stopping rule is based on the Allocation Rule: if the number of subjects already allocated to the current MTD is at least 9, the trial will be stopped.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the default profile as shown below. If you wish to edit or add additional profiles (dose response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics and Save subject-level data. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots for more than one simulation, you can increase the number. For now, leave this at 1 to save computation time.

Click Simulate to simulate the CRM design. In the Output Preview toolbar, click the icon to save the simulation to the Library. Double-click the simulation node in the Library to display the simulation output details.

Click the Plots icon in the Library to access a wide range of available plots. Below is an example of the MTD plot, showing the percentage of simulations in which each dose level was selected as the MTD. The true MTD is displayed as the third dose level (15).
This is the dose whose true probability of DLT (0.2) was closest to and below the target probability (0.25).

32.2.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

Continue in this manner by clicking Enter Interim Data and entering the following doses and the corresponding numbers of DLTs.

If you click Display by Dose, you will see the data grouped by dose level. You may click Display by Cohort to return to the original view.

After each cohort, East will update the Interim Monitoring Dashboard. You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel. At this point, East recommends that you stop the trial.

Click Stop Trial to generate a table for final inference.

32.3 Modified Toxicity Probability Interval (mTPI)

32.3.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: Modified Toxicity Probability Interval.

This window is the Input dialog box, which is separated into five tabs: Design Parameters, Stopping Rules, Trial Monitoring Table, Response Generation, and Simulation Control. In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for Cohort Size.
If you were to check the box Start With, then you would be simulating from the 3+3 or Accelerated Titration design first, before switching to the mTPI. For this tutorial, however, leave the box unchecked. Enter 0.25 for the Target Probability of Toxicity, 0.2 for the upper limit of the Under dosing interval, and 0.3 for the upper limit of the Proper dosing interval. These entries imply that toxicity probabilities within the interval [0.2, 0.3] can be regarded as equivalent to the target toxicity (0.25) as far as dosing decisions are concerned. Finally, we will assume a uniform Beta(a = 1, b = 1) prior distribution for all doses.

In the Stopping Rules tab, enter the inputs as shown below. For the mTPI design, the stopping rule is based on dose exclusion rules. The rule states that if the posterior probability that toxicity at a given dose exceeds the target toxicity is greater than 0.95, that dose and all higher doses will be excluded from subsequent cohorts. When this dose exclusion rule applies to the lowest dose, all doses are excluded, and hence the trial will be stopped for excessive toxicity. Furthermore, the dose exclusion rule is not activated until at least 3 subjects are observed on a dose. A similar idea can be applied to the highest dose: if there is a greater than 95% posterior probability that the toxicity at the highest dose is less than the target toxicity, then stop the trial early.

The remaining stopping rules allow one to stop the trial early with MTD determined. The Allocation Rule requires a certain number of subjects already allocated to the next recommended dose. The CI Rule requires that the credible interval for probability of DLT at the MTD is within some range.
The Target Rule requires that the posterior probability of being in the target toxicity, or proper dosing interval, exceeds some threshold. Finally, any of these rules can be combined with Minimum Ss Observed in the Trial.

In the Trial Monitoring Table tab, you can view the decision table corresponding to the inputs entered in the previous tabs. East also provides the option of creating and simulating from a customized trial monitoring table. If you click Edit Trial Monitoring Table, you can click on any cell in the grid to edit and change the dose assignment rule for that cell.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the default profile as shown below. If you wish to edit or add additional profiles (dose response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics and Save subject-level data. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots for more than one simulation, you can increase the number. For now, leave this at 1 to save computation time.

Click the Local Options button at the top left corner of the input window toolbar. This gives you different options for computing average allocations for each dose, and for computing isotonic estimates. Select the following options and click OK.

Click Simulate to simulate the mTPI design. In the Output Preview toolbar, click the icon to save the simulation to the Library.
Double-click the simulation node in the Library to display the simulation output details. For example, the true MTD was D3 (15), and this dose was selected as the MTD most often (43% of the time).

Click the Plots icon in the Library to access a wide range of available plots.

32.3.2 Interim Monitoring

Right-click one of the Simulation nodes in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard. In the interim monitoring toolbar, click the chart icon, and then click Trial Monitoring Table to generate a table to guide dosing decisions for this trial.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly. The decision for the next cohort is based on the highest Unit Probability Mass (UPM): the posterior probability for each toxicity interval divided by the length of the interval. The underdosing interval corresponds to an E (Escalate) decision, the proper dosing interval corresponds to an S (Stay) decision, and the overdosing interval corresponds to a D (De-escalate) decision. In this case, the UPM for underdosing is highest. Thus, the recommendation is to escalate to dose 10.

Continue in this manner by entering data for each subsequent cohort, and observe how the interim monitoring dashboard updates. One example is given below.

After each cohort, East will update the Interim Monitoring Dashboard. You may replace the IM dashboard plots with any other plots or corresponding tables, by clicking on the associated icons at the top left of each panel.

Suppose we wished to end the study after 8 cohorts (24 patients).
Click Stop Trial to end the study and generate a table of final inference.

32.4 Bayesian logistic regression model (BLRM)

32.4.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Single Agent Design: Bayesian Logistic Regression Model.

This window is the Input dialog box, which is separated into four tabs: Design Parameters, Stopping Rules, Response Generation, and Simulation Control. In the Design Parameters tab, enter 30 as the Maximum Sample Size, and 3 for Cohort Size. If you were to check the box Start With, then you would be simulating from the 3+3 or Accelerated Titration design first, before switching to the BLRM. For this tutorial, however, leave the box unchecked.

The next step is to choose a Dose Selection Method: either by Bayes Risk or by Max Target Toxicity. For the next cohort, the Bayes Risk method selects the dose that minimizes the posterior expected loss, also known as the Bayes risk. In contrast, the Max Target Toxicity method selects the dose that maximizes the posterior probability of targeted toxicity. For both methods, the dose selected must satisfy the EWOC (Escalation With Overdose Control) criterion: the posterior probability of overdosing (either excessive or unacceptable toxicity) must be less than the threshold (e.g., 0.25).

Recall that the BLRM method applies the following model:

$$\text{logit}(\pi_d) = \log(\alpha) + \beta \log(d/d^*) \tag{32.1}$$

The Reference Dose (D*) is the dose at which the odds of observing a DLT is α.

Click the Dose Skipping button, and select Allow skipping any doses / No Restrictions.
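Model (32.1) and the role of the reference dose can be checked numerically. The sketch below evaluates the dose-toxicity curve for fixed parameter values; the function names and the values α = 0.5, β = 1.2 and d* = 20 are hypothetical and chosen only for illustration. In the actual method α and β carry a prior distribution, which this sketch ignores.

```python
from math import exp, log


def blrm_dlt_probability(dose, alpha, beta, reference_dose):
    """Probability of DLT at a dose under model (32.1):
    logit(pi_d) = log(alpha) + beta * log(d / d*)."""
    logit = log(alpha) + beta * log(dose / reference_dose)
    return 1.0 / (1.0 + exp(-logit))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical parameter values, for illustration only.
alpha, beta, d_star = 0.5, 1.2, 20.0

for d in (5, 10, 20, 40):
    p = blrm_dlt_probability(d, alpha, beta, d_star)
    print(d, round(p, 3), round(odds(p), 3))
```

At d = d* the log term vanishes, so the odds of a DLT equal α, which is the interpretation of the Reference Dose given above.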
You can specify the prior directly in terms of a bivariate normal distribution for log(α) and log(β). Alternatively, if you click Prior Calculator, a calculator will appear allowing you to specify a prior indirectly by one of three methods: (1) lowest dose and reference dose, (2) lowest dose and highest dose, or (3) lowest dose and MTD. Click Recalc to convert the prior inputs into matching bivariate normal parameter values, and click OK to paste these values into the input window. Appendix N of the manual and Appendix A of Neuenschwander et al. (2008) describe some of these methods.

Click the icon to generate a chart of the 95% prior intervals at each dose for probability of DLT.

Click Posterior Sampling Methods to select from one of two methods: Metropolis-Hastings, or direct Monte Carlo. For this tutorial, click OK to select Direct.

In the Stopping Rules tab, you can specify multiple rules for stopping the trial. The trial is stopped early and MTD not determined if there is evidence of underdosing. This rule is identical to that from mTPI: if the posterior probability that the toxicity at the highest dose is less than the target toxicity exceeds some threshold, then stop the trial early.

The remaining stopping rules allow one to stop the trial early with MTD determined. The Allocation Rule requires a certain number of subjects already allocated to the next recommended dose. The CI Rule requires that the credible interval for probability of DLT at the MTD is within some range. The Target Rule requires that the posterior probability of being in the target toxicity interval exceeds some threshold.
Finally, any of these rules can be combined with Minimum Ss Observed in the Trial. Check the appropriate boxes and enter values as below.

In the Response Generation tab, you can specify a set of true dose response curves from which to simulate. For the Starting Dose, select the lowest dose (5). Leave the default profile as shown below. If you wish to edit or add additional profiles (dose response curves), see the corresponding section for the 3+3 design.

In the Simulation Control tab, check the boxes corresponding to Save summary statistics, Save subject-level data, and Save final posterior samples. These options will provide access to several charts derived from these more detailed levels of simulated data. If you wish to display subject-level plots, or posterior distribution plots, for more than one simulation, you can increase the number. For now, leave both of these at 1 to save computation time.

Click Simulate to simulate the BLRM design. In the Output Preview toolbar, click the icon to save the simulation to the Library. Double-click the simulation node in the Library to display the simulation output details.

Click the Plots icon in the Library to access a wide range of available plots.

32.4.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard.

Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned and the DLTs Observed. Click OK to continue.

The dashboard will be updated accordingly. The acceptable dose range is on a continuous scale between the minimum and maximum doses. The upper limit of the acceptable dose range is the largest dose whose probability of overdosing is less than the EWOC threshold.
The lower limit of the acceptable range is the dose whose DLT rate is equal to the lower limit of the targeted toxicity interval. When the computed lower limit exceeds the recommended dose, it is set to the recommended dose.

In the IM toolbar, click the Plots icon, then Interval Probabilities by Dose and Panel. Notice that for all doses greater than or equal to 25, the posterior probability of overdosing exceeds the EWOC threshold (0.25). Of the remaining doses, dose 15 maximizes the probability of targeted toxicity, and is therefore the next recommended dose.

In the IM toolbar, click the Plots icon, then Predictive Distribution of Number of DLTs. You can enter a planned cohort size and select a next dose to plot the posterior predictive probability of the number of DLTs to be observed in the next cohort.

After each cohort, East will update the Interim Monitoring Dashboard. You may replace the IM dashboard plots with any other plots or corresponding tables by clicking on the associated icons at the top left of each panel. Continue entering data for each subsequent cohort, and observe how the interim monitoring dashboard updates. One example is given below. Click Stop Trial to generate the final inference table.

32.5 Bayesian logistic regression model for dual-combination (comb2BLRM)

32.5.1 Simulation

Click Discrete: Dose Escalation on the Design tab, and then click Two Agents Design: Bayesian Logistic Regression Model for Dual-Combination.

Set the Max. Number of Doses to 4 for both Agent 1 and Agent 2, the Max. Sample Size to 60, and the Cohort Size to 3. Set the target toxicity interval to 16-35%, with an EWOC criterion of 0.25. Set the reference doses to 290 and 20 for Agents 1 and 2, respectively.

Click the button for Dose Skipping. These options imply that the dose of only one compound can be increased for the next cohort (no diagonal escalation), with a maximum increment of 100%.

The prior distribution is an extension of that for the single-agent BLRM, but includes a normal prior for the interaction term. As with the single-agent BLRM, you can use the calculator to transform prior information on particular dose levels to a bivariate normal for either Agent 1 or Agent 2. In this tutorial, we will simply enter the following values adapted from Neuenschwander et al. (2015).

In the Stopping Rules tab, you may specify various rules for stopping the trial. The logical operators (And/Or) follow left-to-right precedence, beginning with the top-most rule in the table. The order of the stopping rules is determined by the order of selection. Enter the following inputs as below. Select the Minimum Ss rule first, followed by the Target Rule, followed by the Allocation Rule. Be sure to select the appropriate logical operators.

This combination of rules implies that the MTD dose combination declared will meet the following conditions: (1) at least 6 patients have already been allocated to this combination, and (2) this dose combination satisfies one of the following: (i) the probability of targeted toxicity at this combination exceeds 0.5, or (ii) a minimum of 15 subjects have already been observed in the trial.
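The left-to-right evaluation of the logical operators can be made concrete with a small Python sketch. This is an illustration only: the function names are hypothetical, and the thresholds (15 subjects, 0.5, 6 subjects) and the operator sequence (Or, then And) are inferred from the conditions stated above.

```python
def combine_left_to_right(values, operators):
    """Fold rule outcomes strictly left to right, with no And-over-Or precedence."""
    result = values[0]
    for op, value in zip(operators, values[1:]):
        if op == "And":
            result = result and value
        elif op == "Or":
            result = result or value
        else:
            raise ValueError("unknown operator: " + op)
    return result

def stop_with_mtd(total_subjects, prob_target, n_at_combination):
    """Rules in order of selection: Minimum Ss, Target Rule, Allocation Rule."""
    rules = [
        total_subjects >= 15,     # Minimum Ss observed in the trial
        prob_target > 0.5,        # Target Rule
        n_at_combination >= 6,    # Allocation Rule
    ]
    # (Minimum Ss Or Target) And Allocation
    return combine_left_to_right(rules, ["Or", "And"])

print(stop_with_mtd(18, 0.30, 3))  # allocation condition not met
print(stop_with_mtd(9, 0.60, 6))   # target and allocation conditions met
```

The order matters: with the usual programming-language precedence, where And binds more tightly than Or, the first call would evaluate as Minimum Ss Or (Target And Allocation) and give a different answer.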
In the Response Generation tab, enter the following inputs. Make sure that the starting dose combination is the lowest dose level for each agent.

In the Simulation Control tab, select the following options. In this tutorial, we will run only 1000 simulations. Click Simulate.

In the Output Preview toolbar, click the icon to save Sim1 to the Library. Double-click Sim1 in the Library to display the simulation output details. With Sim1 selected in the Library, click the Plots icon to access a wide range of available plots. Below is an example of the MTD plot, showing the percentage of simulations in which each dose combination was selected as the MTD. The combinations whose true DLT rates were below, within, and above the target toxicity interval (0.16 to 0.35) are colored blue, green, and red, respectively.

32.5.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard. Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned for each agent, the Subjects Allocated and the DLTs Observed. Click OK to continue.

The next recommended dose is 100 mg for Agent 1 and 20 mg for Agent 2. Recall that the dose skipping constraints are that the dose increment cannot exceed 100% of the current dose, and that only one compound can be increased. Of the eligible dose combinations, the recommended one has the highest probability of targeted toxicity.

You may replace the IM dashboard plots with any other plots or corresponding tables by clicking on the associated icons at the top left of each panel.
For example, change the left-hand plot to Dose Limiting Toxicity to view the number of subjects and DLTs observed at each dose combination.

Continue in this manner by clicking Enter Interim Data, entering the following doses, and the corresponding number of DLTs. The recommended MTD combination is 200 mg for Agent 1 and 30 mg for Agent 2.

32.6 Product of Independent Beta Probabilities Dose Escalation (PIPE)

32.6.1 Simulation

One of the core concepts underlying the PIPE method is the maximum tolerated contour (MTC), a line partitioning the dose combination space into toxicity probabilities either less than or greater than the target. The recommended dose combination at the end of the trial is the dose combination closest from below to the MTC. The following figures from Mander and Sweeting (2015) illustrate the MTC, and the related concepts of admissible dose combinations (adjacent or closest) and dose skipping options (neighborhood vs non-neighborhood constraint).

The figure below shows six monotonic MTCs for two agents, each with two dose levels.

After each cohort, the most likely contour is selected before applying a dose selection strategy. The next dose combination is chosen from a set of admissible doses, which are either closest to the most likely contour, or adjacent. In the figure below, all the (X) and (+) symbols are considered adjacent. Of these, the (X) symbols represent the closest doses.

Of the admissible doses, the next dose combination chosen is that with the minimum sample size, where sample size is defined as the prior and trial sample size combined.
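The idea of enumerating monotonic contours and picking the most likely one can be sketched in Python as follows. This is an illustrative sketch, not East's implementation: it assumes, following the general idea in Mander and Sweeting (2015), that each dose combination has an independent posterior probability of lying at or below the target toxicity, and that a contour's probability is the product of the corresponding per-combination probabilities. The function names and the probabilities in the example are hypothetical.

```python
from itertools import product

def monotone_contours(n_rows, n_cols):
    """All 0/1 grids (1 = above the MTC) that never decrease along a row or column."""
    grids = []
    for bits in product((0, 1), repeat=n_rows * n_cols):
        g = tuple(tuple(bits[i * n_cols:(i + 1) * n_cols]) for i in range(n_rows))
        rows_ok = all(g[i][j] <= g[i][j + 1]
                      for i in range(n_rows) for j in range(n_cols - 1))
        cols_ok = all(g[i][j] <= g[i + 1][j]
                      for i in range(n_rows - 1) for j in range(n_cols))
        if rows_ok and cols_ok:
            grids.append(g)
    return grids

def contour_probability(contour, p_below):
    """Product over combinations of P(below target) or P(above target)."""
    prob = 1.0
    for contour_row, p_row in zip(contour, p_below):
        for above, p in zip(contour_row, p_row):
            prob *= (1 - p) if above else p
    return prob

def most_likely_contour(p_below):
    contours = monotone_contours(len(p_below), len(p_below[0]))
    return max(contours, key=lambda c: contour_probability(c, p_below))

# Hypothetical posterior probabilities that toxicity is at or below the target;
# rows index Agent 1 dose levels, columns index Agent 2 dose levels.
p_below = [[0.9, 0.6],
           [0.7, 0.2]]
print(len(monotone_contours(2, 2)))
print(most_likely_contour(p_below))
```

Brute-force enumeration is adequate only for small grids; for two agents with two dose levels each it reproduces the six monotonic contours mentioned above.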
The weighted randomization method selects one of the admissible doses at random, with selection probabilities weighted by the inverse of their sample size.

For dose skipping options, one can choose between a neighborhood constraint and a non-neighborhood constraint. The neighborhood constraint restricts the set of admissible doses to those a single dose level higher or lower than the current dose for both agents, while the non-neighborhood constraint restricts the set of admissible doses to a single dose level higher or lower than any previously administered dose combination.

The figure below illustrates the neighborhood constraint, at two different cohorts. Only those combinations within the dashed box are admissible. The asterisk symbol on the left represents the admissible dose combination closest to the MTC.

The figure below illustrates the non-neighborhood constraint. The set of admissible doses is now larger because all previously administered doses are included.

Finally, there is a safety constraint threshold to avoid overdosing. Averaging over the posterior distribution of all monotonic contours, the expected probability of being above the MTC is calculated for all dose combinations. Those dose combinations whose expected probabilities exceed the safety threshold are excluded from the admissible set.

Click Discrete: Dose Escalation on the Design tab, and then click Two Agents Design: Product of Independent Beta Probabilities Dose Escalation.

In the Design Parameters tab, select the following options. In addition to the Closest and Adjacent options for Admissible Dose Combinations, there is also an Interval option. This allows you to specify a margin around the target toxicity level to define the admissible dose set, rather than relying on the MTC.
Dose combinations whose posterior mean toxicity risk lies in the specified target interval (PT ± ε) are considered admissible.

For the prior specification, enter the following values. When entering the same prior sample size for each dose combination, a value of 1 can be considered a strong prior, whereas a value of 1 divided by the number of combinations can be considered a weak prior (Mander & Sweeting, 2015).

In the Stopping Rules tab, there are a number of options similar to those from other designs. However, for this tutorial, leave these options unchecked.

Similarly, leave the default options in the Response Generation tab. In this tutorial, the true probabilities of toxicity will be in agreement with the prior medians specified above.

In the Simulation Controls tab, you can run 1000 simulations, although the PIPE method runs relatively quickly.

In the Output Preview toolbar, click the icon to save the simulation to the Library. Double-click the simulation node in the Library to display the simulation output details.

In the MTD Analysis table, you can see that the (Agent 1, Agent 2) dose combinations selected most often as MTD were: (300, 10) at 22.1% and (300, 20) at 20.8%. The true probabilities of toxicity at these combinations were 0.24 and 0.28, respectively.

32.6.2 Interim Monitoring

Right-click the Simulation node in the Library, and select Interim Monitoring. This will open an empty interim monitoring dashboard. Click Enter Interim Data to open a window in which to enter data for the first cohort: in particular, the Dose Assigned for each agent, the Subjects Allocated and the DLTs Observed.
Click OK to continue.

The next recommended dose is 200 mg for Agent 1 and 20 mg for Agent 2. Recall that the dose skipping constraints allow for diagonal escalation (that is, escalation on both agents at the same time), but the neighborhood constraint restricts the set of admissible doses to a single dose level higher or lower than the current dose. Given these constraints, the dose combination (200, 10) is the only combination closest to the most probable MTC.

The MTC plot allows you to view the most probable MTC, the current dose, the recommended dose, and all tried doses.

You may replace the IM dashboard plots with any other plots or corresponding tables by clicking on the associated icons at the top left of each panel.

Continue in this manner by clicking Enter Interim Data, entering the following doses, and the corresponding number of DLTs.

Click Stop Trial. The recommended MTD combination is 200 mg for Agent 1 and 10 mg for Agent 2. The recommended MTD combination must meet three criteria: (i) it is closest to the MTC from below, (ii) it has been experimented on, and (iii) it is below the safety threshold. If there is no dose combination satisfying all three criteria, the MTD will be undetermined.

Volume 4
Exact Binomial Designs

33 Introduction to Volume 8
34 Binomial Superiority One-Sample – Exact
35 Binomial Superiority Two-Sample – Exact
36 Binomial Non-Inferiority Two-Sample – Exact
37 Binomial Equivalence Two-Sample – Exact
38 Binomial Simon’s Two-Stage Design

33 Introduction to Volume 8

This volume describes various cases of clinical trials using binomial endpoints where the asymptotic normal approximation to the test statistic may fail.
This is often the case when the trial sample size is too small; testing and analysis based on the exact binomial distribution would nevertheless provide valid results. Asymptotic tests may also fail when proportions are very close to the boundary, namely zero or one. These exact methods can also be applied in situations where the normal approximation is adequate, in which case the solutions from the exact and asymptotic methods converge to the same result.

Using exact computations, Chapter 34 deals with the design and interim monitoring of a one-sample test of superiority for a proportion. The first section discusses a fixed and group sequential design in which an observed binomial response rate is compared to a fixed response rate. The following section illustrates how, for a fixed sample, McNemar’s conditional test can be used to compare matched pairs of binomial responses.

Chapters 35 through 37 illustrate how to use East to design two-sample exact tests of superiority, non-inferiority and equivalence, including examples for both the difference and ratio of proportions.

Chapter 38 describes Simon’s two-stage design in an exact setting, which computes the expected minimal sample size of a trial that may be stopped due to futility or continue to a second stage to further study efficacy and safety.

It is important to note that all exact tests work with only integer values for sample size, and will override the Design Defaults - Common: Do not round off sample size/events flag in the Options menu. Whenever the Perform Exact Computations check box is selected in the Design Input Output dialog box, resulting sample sizes will be converted to an integer value for all computations, including power and chart/table values.

33.1 Settings

Click the icon in the Home menu to adjust default values in East 6.
The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here apply to all tests in East 6 under the Design and Analysis menus. All these numerical quantities are grouped into categories according to their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab lets you adjust the paths for storing workbooks, files, and temporary files. These paths persist through the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.
Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP will use half of the alpha from settings as its default, since an MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

34 Binomial Superiority One-Sample – Exact

This chapter deals with the design and interim monitoring of tests involving binomial response rates using exact computations.
Section 34.1 discusses a fixed sample and group sequential design in which an observed binomial response rate is compared to a fixed response rate. In Section 34.2, McNemar’s conditional test for comparing matched pairs of binomial responses for a fixed sample is discussed.

34.1 Binomial One-Sample

In experimental situations where the variable of interest has a binomial distribution, it may be of interest to determine whether the response rate π differs from a fixed value π0. Specifically, we wish to test the null hypothesis H0: π = π0 against one-sided alternatives of the form H1: π > π0 or H1: π < π0. Either the sample size or power is determined for a specified value of π which is consistent with the alternative hypothesis, denoted as π1.

34.1.1 Trial Design

Consider a single-arm oncology trial designed to determine if the tumor response rate for a new cytotoxic agent is greater than 15%. Thus it is desired to test the null hypothesis H0: π = 0.15 against the one-sided alternative hypothesis H1: π > 0.15. The trial will be designed using a one-sided level α = 0.05 test that achieves 80% power at π = π1 = 0.25.

Single-Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click One Sample and then choose Single Arm Design: Single Proportion:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Prop. Response under Null (π0): 0.15
Prop. Response under Alt (π1): 0.25

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

The sample size required in order to achieve 80% power is 110 subjects. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is 0.035, less than the specified value of 0.05. Similarly, the attained power is 0.81, slightly larger than the specified value of 0.80.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon. The critical point, or the boundary set for the rejection of H0, is 24 (on the # response scale). Therefore out of 110 subjects, if the observed number of patients responding to the new treatment exceeds 24, the null hypothesis will be rejected in favor of declaring the new treatment to be superior. This can also be seen using both a response scale and proportion scale in either the Stopping Boundaries chart or table, available in the Library.

Three-Look Design

In order to reach an early decision and enter into comparative trials, conduct this single-arm study as a group sequential trial with a maximum of 3 looks.
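For reference, the attained type-1 error and power of the single-look design (Des 1) described earlier can be checked with a few lines of Python. This is a minimal sketch under one assumption that is not taken from the manual: the rejection region is taken to be X ≥ k, where k is the smallest count whose upper-tail probability under π0 does not exceed α. The helper names are hypothetical, and East's own algorithm and rounding may differ.

```python
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

def critical_value(n, p0, alpha):
    """Smallest k with P(X >= k | p0) <= alpha (k = n + 1 means never reject)."""
    for k in range(n + 2):
        if upper_tail(n, k, p0) <= alpha:
            return k

n, p0, p1, alpha = 110, 0.15, 0.25, 0.05
k = critical_value(n, p0, alpha)
attained_alpha = upper_tail(n, k, p0)   # exact size of the test
attained_power = upper_tail(n, k, p1)   # exact power at pi1
print(k, round(attained_alpha, 3), round(attained_power, 3))
```

The printed values can be compared with the critical point, attained type-1 error, and attained power reported for Des 1.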
Create a new design by selecting Des 1 in the Library, and clicking the icon on the Library toolbar. To generate a study with two interim looks and a final analysis, change the Number of Looks from 1 to 3. A Boundary Info tab will appear, which allows the specification of parameters for the Efficacy and Futility boundary families. By default, an efficacy boundary to reject H0 is selected, but there is no futility boundary to reject H1. The Boundary Family specified is of the Spending Functions type and the default Spending Function is the Lan-DeMets (Lan & DeMets, 1983), with Parameter OF (O’Brien-Fleming). The default Spacing of Looks is Equal, therefore the interim analyses will be equally spaced by the number of patients accrued between looks.

Return to the Design Parameters dialog box. The binomial parameters π0 = 0.15 and π1 = 0.25 are already specified. Click Compute to generate this exact design:

The maximum sample size is again 110 subjects, with 110 also expected under the null hypothesis H0: π = 0.15 and 91 expected when the true value is π = 0.25. Save this design to the Library.

The details for Des 2 can be displayed by clicking the icon.

Here we can see the cumulative sample size and cumulative type 1 error (α) spent at each of the three looks. The boundaries set for the rejection of H0 at each look are 14, 19 and 24 (on the # response scale). For example, at the second look with a cumulative 73 subjects, if the observed number of patients responding to the new treatment exceeds 19, the null hypothesis would be rejected in favor of declaring the new treatment to be superior. In addition, the incremental boundary crossing probabilities under both the null and alternative are displayed for each look.
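To give a feel for how an O’Brien-Fleming-type spending function distributes the type 1 error over three equally spaced looks, the following Python sketch evaluates the commonly cited Lan-DeMets form α(t) = 2 − 2Φ(z_{α/2}/√t). It is an assumption that this is the form used here, the function name is hypothetical, and an exact binomial design cannot spend these amounts precisely because of discreteness, so the values need not match the Des 2 output.

```python
from statistics import NormalDist

def ld_of_spending(t, alpha):
    """Cumulative alpha spent at information fraction t (0 < t <= 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

fractions = [1 / 3, 2 / 3, 1.0]   # equally spaced looks
cumulative = [ld_of_spending(t, 0.05) for t in fractions]
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
for t, c, inc in zip(fractions, cumulative, incremental):
    print(round(t, 3), round(c, 5), round(inc, 5))
```

Very little error is spent at the first look, which is why early efficacy boundaries under this family are hard to cross.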
The cumulative boundary stopping probabilities can also be seen in the Stopping Boundaries chart and table. Select Des 2 in the Library, click the icon and choose Stopping Boundaries.

The default scale is # Response Scale. The Proportion Scale can also be chosen from the drop-down list Boundary Scale in the chart.

To examine the Error Spending function, click the icon in the Library and choose Error Spending.

When the sample size for a study is subject to external constraints, power can be computed for a specified maximum sample size. Suppose for the previous design the total sample size is constrained to be at most 80 subjects. Create a new design by editing Des 2 in the Library. Change the parameters so that the trial is now designed to compute power for a maximum sample size of 80 subjects, as shown below.

The trial now attains only 73.9% power.

Power vs Sample Size: Sawtooth Paradigm

Generate the Power vs. Sample Size graph for Des 2. You will get the following power chart, which is commonly described in the literature as a sawtooth chart.

This chart illustrates that it is possible to have designs with different sample sizes but all with the same power. What is not apparent is that for designs with the same power, the attained significance level may vary. Upon examination, the sample sizes of 43 and 55 seem to have the same power of about 0.525.
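The sawtooth behavior is easy to reproduce for a simple fixed-sample exact test. The Python sketch below is an illustration under stated assumptions: it uses a single-look test with rejection region X ≥ k (k chosen as the smallest count whose tail probability under π0 does not exceed α), so its numbers need not match the three-look Des 2 chart. The helper names are hypothetical.

```python
from math import comb

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

def exact_power_and_size(n, p0, p1, alpha):
    """Return (power, attained alpha) of the exact one-sided test at sample size n."""
    k = next(k for k in range(n + 2) if upper_tail(n, k, p0) <= alpha)
    return upper_tail(n, k, p1), upper_tail(n, k, p0)

curve = {n: exact_power_and_size(n, 0.15, 0.25, 0.05) for n in range(40, 61)}
# Sample sizes at which power is lower than at the preceding sample size.
drops = [n for n in range(41, 61) if curve[n][0] < curve[n - 1][0]]
for n, (power, size) in curve.items():
    print(n, round(power, 3), round(size, 3))
print("power drops at n =", drops)
```

Each drop coincides with the critical value increasing by one, which also resets the attained significance level to a smaller value.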
The data can also be displayed in a chart form by selecting the icon in the Library, and can be printed from here as well.

Compute the power for two new designs based on Des 2 with sample sizes of 43 and 55 respectively.

Although sample sizes of 43 and 55 attain nearly the same power, the attained significance levels are different, at 0.049 and 0.031 respectively. Though both are less than the design specification of 0.05, the plan with the lower sample size of 43 pays a higher penalty in terms of type-1 error than the plan with the larger sample size of 55.

34.1.2 Interim Monitoring

Consider the interim monitoring of Des 2, which has 80% power. Select this design from the Library and click the icon.

Suppose at the first interim look, when 40 subjects have enrolled, the observed cumulative response is 12. Click the Enter Interim Data button at the top left of the Interim Monitoring window. Enter 40 for the Cumulative Sample Size and 12 for the Cumulative Response in the Test Statistic Calculator window.

At the second interim monitoring time point, when 80 subjects have enrolled, suppose the cumulative response increases to 20. Again click the Enter Interim Data button at the top left of the Interim Monitoring window. Enter 80 for the Cumulative Sample Size and 20 for the Cumulative Response in the Test Statistic Calculator window. This will result in the following message:

It can be concluded that π > 0.15 and the trial should be terminated. Clicking on Stop results in the final analysis.
34.2 McNemar’s Conditional Exact Test

McNemar’s conditional test is used in experimental situations where paired comparisons are observed. In a typical application, two binary response measurements are made on each subject, perhaps from two different treatments, or from two different time points. For example, in a comparative clinical trial, subjects are matched on baseline demographics and disease characteristics and then randomized with one subject in the pair receiving the experimental treatment and the other subject receiving the control. Another example is the crossover clinical trial in which each subject receives both treatments. By random assignment, some subjects receive the experimental treatment followed by the control while others receive the control followed by the experimental treatment. Let πc and πt denote the response probabilities for the control and experimental treatments, respectively. The probability parameters for this test are displayed in Table 34.1.

Table 34.1: A 2 x 2 Table of Probabilities for McNemar’s Conditional Exact Test

                            Experimental
Control              No Response   Response   Total Probability
No Response          π00           π01        1 − πc
Response             π10           π11        πc
Total Probability    1 − πt        πt         1

The null hypothesis H0: πc = πt is tested against the alternative hypothesis H1: πc > πt (or H1: πc < πt) for the one-sided testing problem. Since πt = πc if and only if π01 = π10, the null hypothesis can also be expressed as H0: π01 = π10, tested against the corresponding one-sided alternative. The power of this test depends on two quantities:

1. The difference between the two discordant probabilities (which is also the difference between the response rates of the two treatments): δ = π01 − π10 = πt − πc;
2. The sum of the two discordant probabilities: ξ = π10 + π01.

East accepts these two parameters as inputs at the design stage.
34.2.1 Trial Design

Consider a trial in which we wish to determine whether a transdermal delivery system (TDS) can be improved with a new adhesive. Subjects are to wear the old TDS (control) and new TDS (experimental) in the same area of the body for one week each. A response is said to occur if the TDS remains on for the entire one-week observation period. From historical data, it is known that control has a response rate of 85% (πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95). Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 = 0.13. Based on these data, we can fill in all the entries of Table 34.1 as displayed in Table 34.2.

Table 34.2: McNemar Probabilities for the TDS Trial

                            Experimental
Control              No Response   Response   Total Probability
No Response          0.02          0.13       0.15
Response             0.03          0.82       0.85
Total Probability    0.05          0.95       1

As it is expected that the new adhesive will increase the adherence rate, the comparison is posed as a one-sided testing problem, testing H0: πc = πt against H1: πc < πt at the 0.05 level. We wish to determine the sample size to have 90% power for the values displayed in Table 34.2.

To illustrate this example, in East under the Design ribbon for Discrete data, click One Sample and then choose Paired Design: McNemar’s:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1.
In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

Test Type: 1 sided
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Difference in Probabilities (δ1 = πt − πc ): 0.1
Prop. of Discordant Pairs (ξ = π01 + π10 ): 0.16

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

The sample size required in order to achieve 90% power is 139 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

The critical point, or the boundary set for the rejection of H0 , is 1.645.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.
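One way to explore how power varies with the number of pairs is to compute the power of an exact conditional (binomial) McNemar test directly. The sketch below assumes the rejection rule is based on the binomial tail probability of the discordant pairs, conditional on their number. East's exact computation uses its own test statistic (the critical point above is reported on a Z scale), so these numbers need not agree with East's output. All function names are illustrative and are not part of East.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


def critical_count(d, alpha):
    """Smallest c with P(Bin(d, 1/2) >= c) <= alpha; d + 1 means 'never reject'."""
    for c in range(d + 1):
        if upper_tail(c, d, 0.5) <= alpha:
            return c
    return d + 1


def mcnemar_exact_power(n_pairs, pi01, pi10, alpha=0.05):
    """Power of the one-sided conditional test of H0: pi01 = pi10 vs H1: pi01 > pi10.

    Assumption: the number of discordant pairs D is Binomial(n_pairs, xi) and,
    given D = d, the number of (control failure, experimental success) pairs
    is Binomial(d, pi01 / xi).
    """
    xi = pi01 + pi10
    theta = pi01 / xi
    power = 0.0
    for d in range(n_pairs + 1):
        c = critical_count(d, alpha)
        if c <= d:  # rejection is possible with d discordant pairs
            power += binom_pmf(d, n_pairs, xi) * upper_tail(c, d, theta)
    return power


# TDS example from Table 34.2: pi01 = 0.13, pi10 = 0.03
tds_power = mcnemar_exact_power(139, 0.13, 0.03)
# Same xi = 0.16 but no treatment difference: this is the attained type-1 error
tds_size = mcnemar_exact_power(139, 0.08, 0.08)
```

Evaluating `mcnemar_exact_power` over a range of `n_pairs` values gives a table comparable in spirit to the Power Vs Sample Size table, and shows the non-monotone behaviour caused by discreteness.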
35 Binomial Superiority Two-Sample – Exact

In many experiments based on binomial data, the aim is to compare independent samples from two populations in terms of the proportion of sampling units presenting a given trait. In medical research, outcomes such as the proportion of patients responding to a therapy, developing a certain side effect, or requiring specialized care would satisfy this definition. East exact tests support the design and monitoring of clinical trials in which this comparison is based on either the difference of proportions or the ratio of proportions of the two populations. These two cases are discussed in Sections 35.1 and 35.2, respectively.

Caution: The methods presented in this chapter are computationally intensive and could consume several hours of computer time if the exact sample sizes are very large. Here are some guidelines:

1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

35.1 Difference of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let δ = πt − πc . Interest lies in testing the null hypothesis that δ = 0 against one-sided and two-sided alternatives. The technical details of the sample size computations for this option are given in Appendix V.

35.1.1 Trial Design

In a clinical study, an experimental drug coded Y73 is to be compared with a control drug coded X39 to treat chronic hepatitis B infected adult patients. The primary end point is histological improvement as determined by Knodell Scores at week 48 of the treatment period. It is estimated that the proportion of patients likely to show histological improvement is 25% under treatment X39 and as much as 60% under treatment Y73.
A one-sided fixed sample study is to be designed with α = 0.05 and 90% power.

Single Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions:

This will launch the following input window:

The goal of this study is to test the null hypothesis, H0 , that the X39 and Y73 arms both have an event rate of 25%, versus the alternative hypothesis, H1 , that Y73 increases the event rate by 35 percentage points, from 25% to 60%. This will be a one-sided test with a single fixed look at the data, a type-1 error of α = 0.05 and a power of (1 − β) = 0.9.

Leave the default values of Design Type: Superiority and Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Prop. under Control (πc ): 0.25
Prop. under Treatment (πt ): 0.6
Diff. in Prop. (δ1 = πt − πc ): (will be calculated)

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

The sample size required in order to achieve 90% power is 68 subjects. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is 0.049, slightly less than the specified value of 0.05. Similarly, the attained power is 0.905, slightly larger than the specified value of 0.90.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar.
The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

In tabular form:

The critical point, or the boundary set for the rejection of H0 , is 1.715 attained at πU = 0.371 (on the Z scale) and 0.176 (on the δ scale). If the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment to be superior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect δ = πt − πc . Here it is easy to see how, as the treatment effect size increases (H1 : alternative treatment is superior), the power of the study reaches the desired 90%.
This is available in tabular form as well.

35.2 Ratio of Two Binomial Proportions

Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let ρ = πt /πc . It is of interest to test the null hypothesis that ρ = 1 against a one-sided alternative. The technical details of the sample size computations for this option are given in Appendix V.

35.2.1 Trial Design

In a clinical study, an experimental drug coded Y73 is to be compared with a control drug coded X39 to treat chronic hepatitis B infected adult patients. The primary end point is histological improvement as determined by Knodell Scores at week 48 of the treatment period. It is estimated that the proportion of patients likely to show histological improvement is 25% under the treatment coded X39 and as much as 60% under the treatment coded Y73, that is, 2.4 times the rate for X39. A single look, one-sided fixed sample study is to be designed with α = 0.05 and 90% power.

Single Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Ratio of Proportions:

This will launch the following input window:

Leave the default values of Design Type: Superiority and Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.9
Sample Size (n): Computed (select radio button)
Prop. under Control (πc ): 0.25
Prop. under Treatment (πt ): (will be calculated to be 0.6)
Ratio of Proportions (ρ1 ): 2.4

Click Compute.
The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

The sample size required in order to achieve 90% power is 72 subjects. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is 0.046, less than the specified value of 0.05. Similarly, the attained power is 0.903, slightly larger than the specified value of 0.90.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

Design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

In tabular form:

The critical point, or the boundary set for the rejection of H0 , is 1.813 (on the Z scale). If the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment to be superior.
This boundary can be seen in terms of the observed ratio (0.916 on the ln(ρ) scale and 2.5 on the ρ scale) in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect (ratio or ln(ratio) scale). Here it is easy to see how, as the ratio (treatment effect size) increases (H1 : the new treatment is superior), the power of the study reaches the desired 90%. This is available in tabular form as well.

36 Binomial Non-Inferiority Two-Sample – Exact

In a non-inferiority trial, the goal is to establish that the response rate of an experimental treatment is no worse than that of an established control. A therapy that is demonstrated to be non-inferior to the current standard therapy might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Non-inferiority trials are designed by specifying a non-inferiority margin, which is the acceptable amount by which the response rate on the experimental arm can be less than the response rate on the control arm. If the experimental response rate falls within this margin, the new treatment can claim to be non-inferior. This chapter presents the design of non-inferiority trials in which this margin is expressed as either the difference between or the ratio of two binomial proportions. The difference is examined in Section 36.1 and is followed by two formulations for the ratio in Section 36.2.

Caution: The methods presented in this chapter are computationally intensive and could consume several hours of computer time if the exact sample sizes are very large.
Here are some guidelines:

1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

36.1 Difference of Proportions

Let πc and πt denote the response rates for the control and experimental treatments, respectively. Let δ = πt − πc . The null hypothesis is specified as

H0 : δ = δ0

and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient harm rather than benefit, then δ0 > 0 and the alternative hypothesis is

H1 : δ < δ0

or equivalently

H1 : πc > πt − δ0 .

Conversely, if the occurrence of a response denotes patient benefit rather than harm, then δ0 < 0 and the alternative hypothesis is

H1 : δ > δ0

or equivalently

H1 : πc < πt − δ0 .

For any given πc , the sample size is determined by the desired power at a specified value of δ = δ1 . A common choice is δ1 = 0 (or equivalently πt = πc ), but East allows the study to be powered at any value of δ1 which is consistent with the choice of H1 .

Let π̂t and π̂c denote the estimates of πt and πc based on nt and nc observations from the experimental and control treatments, respectively. The test statistic is

$$Z = \frac{\hat{\delta} - \delta_0}{\mathrm{se}(\hat{\delta})} \qquad (36.1)$$

where

$$\hat{\delta} = \hat{\pi}_t - \hat{\pi}_c \qquad (36.2)$$

and

$$\mathrm{se}(\hat{\delta}) = \sqrt{\frac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t} + \frac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c}}. \qquad (36.3)$$

Here π̃t and π̃c are the restricted maximum likelihood estimates of πt and πc . For more details refer to Appendix V.

36.1.1 Trial Design

To evaluate the efficacy and safety of drug A vs. drug B in antiretroviral naive HIV-infected individuals, a phase 3, 52-week, double-blind randomized study is conducted. The primary response measure is the proportion of patients with HIV-RNA levels < 50 copies/mL.
The study is designed as a non-inferiority trial in which the standard drug A is expected to have a response rate of 80% and the new experimental drug B is to be compared under a non-inferiority margin of 20% (δ0 = −0.20, since a response denotes patient benefit). For these studies, inferiority is assumed as the null hypothesis and is to be tested against the alternative of non-inferiority using a one-sided test. Therefore, under the null hypothesis H0 , πc = 0.80 and πt = 0.60. We will test this hypothesis against H1 , that both response rates are equal to the null response rate of the control arm, i.e. δ1 = 0. Thus, under H1 , we have πc = πt = 0.8. East will be used to conduct a one-sided α = 0.025 level test with 90% power.

Single Look Design

To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions:

This will launch the following input window:

Change Design Type: Noninferiority and keep Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and enter the following parameters:

Test Type: 1 sided (required)
Type 1 Error (α): 0.025
Power: 0.9
Sample Size (n): Computed (select radio button)
Specify Proportion Response
  Prop. under Control (πc ): 0.8
Specify Null Hypothesis
  Prop. under Treatment (πt0 ): 0.6
  Noninferiority margin (δ0 ): -0.2 (will be calculated)
Specify Alternative Hypothesis
  Prop. under Treatment (πt1 ): 0.8
  Diff. in Prop. (δ1 = πt1 − πc ): 0

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

This single look design requires a combined total of 172 patients in order to achieve 90% power.
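The test statistic of equations (36.1) to (36.3) can be illustrated with a small sketch. The code below finds the restricted maximum likelihood estimates numerically, by maximizing the two-sample binomial log-likelihood under the constraint πt − πc = δ0 with a ternary search. This numerical approach and the function names are assumptions made for illustration; the method East uses is described in Appendix V and may differ.

```python
from math import log, sqrt


def restricted_mle(x_t, n_t, x_c, n_c, delta0, iters=200):
    """Return (pi_t_tilde, pi_c_tilde) maximizing the likelihood with pi_t - pi_c = delta0."""
    eps = 1e-12
    lo = max(0.0, -delta0) + eps        # keep both pi_c and pi_c + delta0 inside (0, 1)
    hi = min(1.0, 1.0 - delta0) - eps

    def loglik(pc):
        pt = pc + delta0
        return (x_t * log(pt) + (n_t - x_t) * log(1 - pt)
                + x_c * log(pc) + (n_c - x_c) * log(1 - pc))

    # The constrained log-likelihood is concave in pi_c, so ternary search applies.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    pc = (lo + hi) / 2
    return pc + delta0, pc


def ni_z_statistic(x_t, n_t, x_c, n_c, delta0):
    """Z of equation (36.1), with the standard error of equation (36.3)."""
    pt_tilde, pc_tilde = restricted_mle(x_t, n_t, x_c, n_c, delta0)
    delta_hat = x_t / n_t - x_c / n_c                      # equation (36.2)
    se = sqrt(pt_tilde * (1 - pt_tilde) / n_t + pc_tilde * (1 - pc_tilde) / n_c)
    return (delta_hat - delta0) / se


# Hypothetical data: 86 patients per arm, 69 responders on each arm, margin -0.2
z_equal_rates = ni_z_statistic(69, 86, 69, 86, -0.2)
# Hypothetical data lying exactly on the null boundary: 60% vs 80% responders
z_on_boundary = ni_z_statistic(60, 100, 80, 100, -0.2)
```

With δ0 < 0, large positive values of Z favor non-inferiority. Data that sit exactly on the null boundary give Z close to zero.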
As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. In this example, sample sizes ranging from approximately 168 to 175 result in power close to the required 0.9. These visual tools are available in the Library under the Plots and Tables menus.

The critical point, or the efficacy boundary set for the rejection of H0 , is 1.991 (on the Z scale) and -0.056 (on the δ scale). If the magnitude of the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment to be non-inferior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect δ = πt − πc .
Here it is easy to see how, as the treatment effect size approaches zero (H1 : no difference between the two treatments), the power of the study reaches the desired 90%. This is available in tabular form as well.

36.2 Ratio of Proportions

Let πc and πt denote the response rates for the control and the experimental treatments, respectively. Let the difference between the two arms be captured by the ratio

ρ = πt /πc .

The null hypothesis is specified as

H0 : ρ = ρ0

and is tested against one-sided alternative hypotheses. If the occurrence of a response denotes patient benefit rather than harm, then ρ0 < 1 and the alternative hypothesis is

H1 : ρ > ρ0

or equivalently

H1 : πt > ρ0 πc .

Conversely, if the occurrence of a response denotes patient harm rather than benefit, then ρ0 > 1 and the alternative hypothesis is

H1 : ρ < ρ0

or equivalently

H1 : πt < ρ0 πc .

For any given πc , the sample size is determined by the desired power at a specified value of ρ = ρ1 . A common choice is ρ1 = 1 (or equivalently πt = πc ), but East permits you to power the study at any value of ρ1 which is consistent with the choice of H1 .

36.2.1 Trial Design

Suppose that, for a rare disease, the cure rate with an expensive treatment A is estimated to be 90%. The claim of non-inferiority for an inexpensive new treatment B can be upheld if it can be shown statistically that the ratio ρ = πt /πc is at least 0.833. In other words, B is considered to be non-inferior to A as long as πt > 0.75. Thus the null hypothesis H0 : ρ = 0.833 is tested against the one-sided alternative hypothesis H1 : ρ > 0.833. We want to determine the sample size required to have power of 80% when ρ = 1 using a one-sided test with a type-1 error rate of 0.05.
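To make the ratio formulation concrete, the sketch below computes the margin for this example and a common large-sample Wald-type statistic on the ln(ρ) scale, using the delta-method standard error. This is an assumption made for illustration: the statistic East labels Wald, and its exact computation, may be defined differently. The function name and the sample counts are hypothetical.

```python
from math import log, sqrt


def log_ratio_wald_z(x_t, n_t, x_c, n_c, rho0):
    """Wald-type Z for H0: rho = rho0 on the log scale (delta-method standard error)."""
    p_t = x_t / n_t
    p_c = x_c / n_c
    se = sqrt((1 - p_t) / (n_t * p_t) + (1 - p_c) / (n_c * p_c))
    return (log(p_t / p_c) - log(rho0)) / se


# Margin from the example: treatment B is acceptable as long as pi_t > 0.75 when pi_c = 0.90
rho0 = 0.75 / 0.90

# Hypothetical data, 60 patients per arm
z_same_cure_rate = log_ratio_wald_z(54, 60, 54, 60, rho0)   # observed ratio = 1
z_at_margin = log_ratio_wald_z(45, 60, 54, 60, rho0)        # observed ratio = rho0
```

Because a response denotes benefit here (ρ0 < 1), large positive values of Z support non-inferiority, and an observed ratio equal to the margin gives Z = 0.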
Single Look Design Powered at ρ = 1

Consider a one look study with equal sample sizes in the two groups. In East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Ratio of Proportions:

This will launch the following input window:

Change Design Type: Noninferiority and keep Number of Looks: 1. In the Design Parameters dialog box, select the Perform Exact Computations checkbox and keep the Test Statistic set to Wald. Enter the following parameters:

Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.8
Sample Size (n): Computed (select radio button)
Specify Proportion
  Prop. under Control (πc ): 0.9
Specify Null Hypothesis
  Prop. under Treatment (πt0 ): 0.75
  Noninferiority margin (ρ0 ): 0.833 (will be calculated)
Specify Alternative Hypothesis
  Prop. under Treatment (πt1 ): 0.9
  Ratio of Proportions (ρ1 = πt1 /πc ): 1

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

The sample size required in order to achieve 80% power is 120 subjects. Note that because of the discreteness involved in performing exact computations, the attained power is 0.823, slightly larger than the specified value of 0.80.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.
Design details can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution. This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

The critical point, or the boundary set for the rejection of H0 , is 1.961 (on the Z scale), 0.076 (on the ln(ρ) scale) and 1.079 (on the ρ scale). If the observed test statistic exceeds this boundary, the null will be rejected in favor of declaring the new treatment to be non-inferior. This can also be seen in the Stopping Boundaries chart and table, available in the Library.

The Power vs. Treatment Effect chart dynamically generates power under this design for all values of treatment effect (ratio or ln(ratio) scale). Here it is easy to see how, as the treatment effect size approaches zero on the ln(ρ) scale (H1 : no difference between the two treatments), the power of the study reaches the desired 80%. This is available in tabular form as well.
37 Binomial Equivalence Two-Sample – Exact

37.1 Equivalence Test

In some experimental situations, it is desired to show that the response rates for the control and the experimental treatments are "close", where "close" is defined prior to the collection of any data. It may be of interest to show that the rate of an adverse event associated with an aggressive therapy is similar to that of the established control, for example, the bleeding rate associated with thrombolytic therapy or cardiac outcomes with a new stent. Let πc and πt denote the response rates for the control and the experimental treatments, respectively, and let

δ = πt − πc . (37.1)

The null hypothesis H0 : |πt − πc | = δ0 is tested against the two-sided alternative H1 : |πt − πc | < δ0 , where δ0 (> 0) defines equivalence. The theory is presented in Section V.4 of Appendix V.

Caution: The methods presented in this chapter are computationally intensive and could consume several hours of computer time if the exact sample sizes are very large. Here are some guidelines:

1. Estimate the likely sample size under the Exact method by first determining the asymptotic sample size.
2. If the exact sample size is likely to be larger than 1000, computing power is preferable to computing the sample size.

37.1.1 Trial Design

Burgess et al. (2005) describe a randomized controlled equivalence trial, in which the objective is to evaluate the efficacy and safety of a 4% dimeticone lotion for treatment of head lice infestation, relative to a standard treatment. The success rate of the standard treatment is estimated to be about 77.5%. Equivalence is defined as δ0 = 0.20. The sample size is to be determined with α = 0.025 (two-sided) and power, i.e. probability of declaring equivalence, of 1 − β = 0.90.
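The first guideline above suggests starting from an asymptotic sample size. A minimal sketch of one such approximation is given here, assuming two one-sided tests each carried out at level α, a normal approximation with unpooled variances, and equal allocation. These are assumptions for illustration and are not a description of East's asymptotic or exact algorithms. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def equivalence_power(n_per_arm, pi_c, pi_t, margin, alpha):
    """Approximate probability of declaring equivalence with n_per_arm subjects per arm."""
    se = sqrt(pi_t * (1 - pi_t) / n_per_arm + pi_c * (1 - pi_c) / n_per_arm)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    delta1 = pi_t - pi_c
    power = (_STD_NORMAL.cdf((margin - delta1) / se - z_alpha)
             + _STD_NORMAL.cdf((margin + delta1) / se - z_alpha) - 1.0)
    return max(0.0, power)


def asymptotic_n_per_arm(pi_c, pi_t, margin, alpha, power, n_max=100_000):
    """Smallest per-arm sample size whose approximate power reaches the target."""
    for n in range(2, n_max + 1):
        if equivalence_power(n, pi_c, pi_t, margin, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")


# Head lice example: pi_c = pi_t = 0.775, margin 0.20, alpha = 0.025, power 0.90
n_arm = asymptotic_n_per_arm(0.775, 0.775, 0.20, 0.025, 0.90)
total_n = 2 * n_arm
```

For this example the approximation gives 114 subjects per arm, 228 in total. An estimate of this size indicates that the exact computation is feasible, since it is well below the threshold of 1000 mentioned in the second guideline.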
To illustrate this example, in East under the Design ribbon for Discrete data, click Two Samples and then choose Parallel Design: Difference of Proportions:

This will launch the following input window:

Change Design Type: Equivalence and in the Design Parameters dialog box, select the Perform Exact Computations checkbox. Enter the following parameters:

Test Type: 2 sided (required)
Type 1 Error (α): 0.025
Power: 0.9
Sample Size (n): Computed (select radio button)
Specify Proportion Response
  Prop. under Control (πc ): 0.775
  Prop. under Treatment (πt0 ): 0.775 (will be calculated)
  Expected Diff. (δ1 = πt − πc ): 0
  Equivalence Margin (δ0 ): 0.2

Click Compute. The sample size for this design is calculated and the results are shown as a row in the Output Preview window:

This single look design requires a combined total of 228 patients in order to achieve 90% power.

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed in the upper pane, labeled Output Summary. This can be saved to the Library by selecting Des 1 and clicking the icon.

The design details, which include critical points, or the boundaries set for the rejection of H0 , can be displayed by clicking the icon.

It is important to note that in this exact computation the displayed sample size may not be unique due to the discreteness of the distribution.
This can be seen in the Power Vs Sample Size graph, which is a useful tool along with its corresponding table, and can be used to find all other sample sizes that guarantee the desired power. These visual tools are available in the Library under the Plots and Tables menus.

In tabular form:

Suppose the expected value of the difference in treatment proportions δ1 is 0.05 or 0.10. A recalculation of the design shows the required sample size will increase to 300 and 606, respectively:

38 Binomial Simon’s Two-Stage Design

The purpose of a phase II trial is to determine if a new drug has sufficient efficacy against a specific disease or condition to either warrant further development within Phase II, or to advance onto a Phase III study. In a two-stage design, a fixed number of patients are recruited and treated initially, and if the protocol is considered effective the second stage will continue to enroll additional patients for further study regarding efficacy and safety.

This chapter presents an example for the widely used two-stage optimal and minimax designs developed by Simon (1989). In addition, East supports the framework of an admissible two-stage design, a graphical method geared to search for an alternative with more favorable features (Jung et al., 2004). The underlying theory is examined in Appendix U.

38.1 An Example

During a Phase II study of an experimental drug, a company determined that a response rate of 10% or less is to be considered poor, whereas a response rate of 40% or more is to be considered promising or good.
Requirements call for a two-stage study with the following hypotheses:

H0 : π ≤ 0.10
H1 : π ≥ 0.40

and design parameters α = 0.05 and 1 − β = 0.90.

38.1.1 Trial Design

To illustrate this example, in East under the Design ribbon for Discrete data, click One Sample and then choose Single Arm Design: Simon’s Two Stage Design:

This will launch the following input window:

Choose Design Type: Optimal and enter the following parameters in the Design Parameters dialog box:

Test Type: 1 sided (required)
Type 1 Error (α): 0.05
Power: 0.9
Upper Limit for Sample Size: 100
Prop. Response under Null (π0 ): 0.1
Prop. Response under Alternative (π1 ): 0.4

Click Compute. The design is calculated and the results are shown as a row in the Output Preview window:

As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon. The design details will be displayed in the upper pane, labeled Output Summary. Note that because of the discreteness involved in performing exact computations, the attained type-1 error is less than the specified value of 0.05. Similarly, the attained power is slightly larger than the specified value. Save this design to the Library by selecting Des 1 and clicking the icon.

Under the optimal design, the combined maximum sample size for both stages is computed to be 20. The boundary parameter for futility at the first look is 1, and at the second look it is 4. What this means can be further explained using the Stopping Boundaries chart available under the Plots menu.
The scale of the stopping boundaries can be displayed using either the number of responses (# Resp. Scale) or the Proportion Scale. The graph uses the number of responses, which tells us that at the first look, when the cumulative sample size is 9, the trial is stopped for futility if no more than one patient shows a favorable response to treatment. At the second stage, when all 20 patients are enrolled, H1 is rejected if 4 or fewer responses are observed. The Stopping Boundaries table under the Tables menu also tells us that the probability of crossing the first-stage stopping boundary under H0, thus warranting early termination, is 0.775.

Results can be further analyzed using the Expected Sample Size (under Null) vs. Sample Size graph, which is also available in tabular form:

To generate a more detailed analysis of the design, select the icon in the Library. In addition to details pertaining to the required optimal design, East also generates results for both the minimax and the admissible designs with regard to sample size, power, probability of early stopping, and the weights used.

For the optimal design, the expected sample size under the null hypothesis, which assumes the drug performs poorly, is 11.477. This can also be seen in the Admissible Designs table, available under the Tables menu:

To regenerate the study using a minimax design, select the Edit Design icon. Select Design Type: Minimax, leave all design parameters the same and click Compute. The cumulative maximum sample size for both stages using this design is 18. As with the optimal design, H1 is rejected at the first stage if 1 or fewer responses are observed, and at the second stage if 4 or fewer responses are observed. Save this design to the Library by selecting Des 2 and clicking the icon. Design details, graphs and tables can be obtained as with the optimal design described above.

East can display the stopping boundaries of both designs simultaneously in a compare plots graph. From the Library select both designs, click the icon, and select Stopping Boundaries.

These stopping boundaries can be compared in tabular format as well:

Although the futility boundaries are the same for both designs, the cumulative sample sizes at both stages differ. We also see that the probability of early stopping for futility is higher under the optimal design (0.775) than under the minimax design (0.659). However, the cumulative sample size at stage one for the optimal design is only 9, whereas the minimax design requires 12 subjects for the first stage.

Referring to the design details generated for the optimal design above, we see that an admissible design (labeled Design # 2) requires a total sample size of 19. Here, the cumulative number of subjects required at the end of stage one is only 6, and the probability of early stopping is 0.531, less than for both the optimal and minimax designs. It is also worth noting that for the admissible design, the boundary parameter for futility at the first look is 0. This means that only one patient has to show a promising result for the study to proceed to a second stage, whereas at least two successes are required for both the optimal and minimax designs to warrant a second stage.
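The operating characteristics quoted in this example follow directly from binomial probabilities. The Python sketch below is an illustration only and is not East's implementation; the function names are hypothetical. It assumes the usual two-stage rule: stop for futility after stage one if the number of responses is at most r1, and conclude that the drug is promising only if the total number of responses exceeds r.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); 0 when x < 0."""
    if x < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))


def two_stage_characteristics(r1, n1, r, n, p):
    """Hypothetical helper: early-stopping probability, expected sample
    size, and probability of declaring the drug promising when the true
    response rate is p."""
    n2 = n - n1
    prob_early_stop = binom_cdf(r1, n1, p)
    expected_n = n1 + (1 - prob_early_stop) * n2
    # Continue only if stage-one responses x1 > r1, then need x1 + x2 > r.
    prob_promising = sum(
        binom_pmf(x1, n1, p) * (1 - binom_cdf(r - x1, n2, p))
        for x1 in range(r1 + 1, n1 + 1)
    )
    return prob_early_stop, expected_n, prob_promising


# Optimal design from the example: r1/n1 = 1/9, r/n = 4/20.
pet0, en0, alpha = two_stage_characteristics(1, 9, 4, 20, 0.10)
_, _, power = two_stage_characteristics(1, 9, 4, 20, 0.40)
print(round(pet0, 3), round(en0, 3), round(alpha, 4), round(power, 4))
```

Evaluating `binom_cdf(1, 12, 0.10)` and `binom_cdf(0, 6, 0.10)` in the same way gives the first-stage stopping probabilities of the minimax and admissible designs.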
Volume 5 Poisson and Negative Binomial Endpoints

    39 Introduction to Volume 5
    40 Count Data One-Sample
    41 Count Data Two-Samples

39 Introduction to Volume 5

This volume describes various cases of clinical trials involving count data. Count data arise frequently in medical research, where events are recorded as whole numbers and are often rare. Typically, interest lies in the rate of occurrence of a particular event during a specific time interval or other unit of space.

Chapter 40 describes the design of tests involving count or Poisson response rates in which an observed response rate is compared to a fixed response rate, possibly derived from historical data.

Chapter 41 deals with the comparison of independent samples from two populations in terms of the rate of occurrence of a particular outcome. East supports the design of clinical trials in which this comparison is based on the ratio of rates, assuming a Poisson or Negative Binomial distribution.

39.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here apply to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab allows you to adjust the paths for storing workbooks, files, and temporary files.
These paths are retained for the current and all future sessions, even after East is closed. This is also where the installation directory of the R software must be specified in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any multiple comparison procedure (MCP) will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.

Sample Size Rounding
By default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults.
We can also set the option to view the Boundary Crossing Probabilities in the detailed output. These can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

40 Count Data One-Sample

This chapter deals with the design of tests involving count or Poisson response rates. Here, independent outcomes or events under examination can be counted in terms of whole numbers, and typically are considered rare. A basic assumption of the Poisson distribution is that the probability of an event occurring is proportional to the length of time under consideration: the longer the time interval, the more likely it is that the event will occur. Therefore, in this context interest lies in the rate of occurrence of a particular event during a specified period. Section 40.1 focuses on designs in which an observed Poisson response rate is compared to a fixed response rate, possibly derived from historical data.

40.1 Single Poisson Rate

Data following a Poisson distribution are non-negative integers, and the probability that an outcome occurs exactly k times can be calculated as:

$$P(k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!}, \qquad k = 0, 1, 2, \ldots$$

where λ is the average rate of occurrence.
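As a quick numerical check of the probability formula above, the following Python sketch evaluates it directly. The function name `poisson_pmf` and the rate 0.3 are chosen purely for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(k) = exp(-lam) * lam**k / k! for a Poisson rate lam."""
    return exp(-lam) * lam**k / factorial(k)


# With an average rate of 0.3 events, most of the mass sits on k = 0 and 1.
for k in range(4):
    print(k, poisson_pmf(k, 0.3))
```

The probabilities sum to 1 over k = 0, 1, 2, ..., which can be confirmed by summing enough terms.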
When comparing a new protocol or treatment to a well-established control, a preliminary single-sample study may provide valuable information prior to a full-scale investigation. In experimental situations it may be of interest to determine whether the response rate λ differs from a fixed value λ0. Specifically, we wish to test the null hypothesis H0 : λ = λ0 against the two-sided alternative hypothesis H1 : λ ≠ λ0, or against one-sided alternatives of the form H1 : λ > λ0 or H1 : λ < λ0. The sample size, or power, is determined for a specified value of λ that is consistent with the alternative hypothesis, denoted λ1.

40.1.1 Trial Design

Consider the design of a single-arm clinical trial in which we wish to determine if the positive response rate of a new acute pain therapy is at least 30% per single treatment cycle. Thus, it is desired to test the null hypothesis H0 : λ = 0.2 against the one-sided alternative hypothesis H1 : λ ≥ 0.3. The trial will be designed such that a one-sided α = 0.05 test achieves 80% power at λ = λ1 = 0.3.

In the Design tab under the Count group choose One Sample and then Single Poisson Rate. This will launch the following input window:

Enter the following design parameters:

    Test Type: 1 sided
    Type 1 Error (α): 0.05
    Power: 0.8
    Sample Size (n): Computed (select radio button)
    Rate under Null (λ0): 0.2
    Rate under Alt. (λ1): 0.3
    Follow-up Time (D): 1

Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 80% power is 155 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar.
The design details are displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook Wbk1 in the Library. An alternative method to view design details is to hover the cursor over the node Des1 in the Library. A tooltip will appear that summarizes the input parameters of the design.

Click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook. Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view a summary of all characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample Size chart and table, East also provides the efficacy boundary in the Stopping Boundaries chart and table.

Alternatively, East allows the computation of either the Type-1 error (α) or Power for a given sample size. Using the Design Input/Output window as described above, simply enter the desired sample size and click Compute to calculate the resulting power of the test.

Power vs Sample Size: Sawtooth paradigm

Consider the following design, which uses East to compute power for a one-sample, single Poisson rate test.

    Test Type: 1 sided
    Type 1 Error (α): 0.025
    Power: Computed
    Sample Size (n): 525
    Rate under Null (λ0): 0.049
    Rate under Alt. (λ1): 0.012
    Follow-up Time (D): 0.5

Save the design to a workbook, and then generate the Power vs. Sample Size graph to obtain the power chart.
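For intuition about how power behaves across sample sizes when the data are discrete, the Python sketch below evaluates a conservative exact lower-tail Poisson test for the design parameters listed above. It rests on assumptions that are not taken from East: the total event count over n subjects is treated as Poisson with mean n·D·λ, and the critical value is the largest count c whose null tail probability does not exceed α. The function names are hypothetical and the values it produces need not match East's output.

```python
from math import exp


def poisson_cdf(c, mu):
    """P(X <= c) for X ~ Poisson(mu); 0 when c < 0."""
    if c < 0:
        return 0.0
    term = exp(-mu)
    total = term
    for k in range(1, c + 1):
        term *= mu / k  # builds mu**k / k! * exp(-mu) iteratively
        total += term
    return total


def exact_lower_tail(n, lam0, lam1, follow_up, alpha):
    """Hypothetical helper: reject H0 when the total count is <= c.
    Returns (c, attained alpha, power)."""
    mu0 = n * follow_up * lam0
    mu1 = n * follow_up * lam1
    c = -1
    # Largest c whose null tail probability stays within alpha.
    while poisson_cdf(c + 1, mu0) <= alpha:
        c += 1
    return c, poisson_cdf(c, mu0), poisson_cdf(c, mu1)


curve = {
    n: exact_lower_tail(n, lam0=0.049, lam1=0.012, follow_up=0.5, alpha=0.025)
    for n in range(500, 601)
}
for n in (525, 565, 580):
    c, attained, power = curve[n]
    print(n, c, round(attained, 4), round(power, 4))
```

Because the critical value can only move in whole-number steps, power drifts downward while c is unchanged and jumps upward each time c increases, so the curve is not monotone in n.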
The resulting power curve is commonly described in the literature as a sawtooth chart. This chart illustrates that different sample sizes can attain the same power. Like the binomial distribution, the Poisson distribution is discrete, and power and sample size computations for discrete data exhibit this so-called "sawtooth" phenomenon.

The data can also be displayed in a chart form by selecting the icon in the Library, and can be printed or saved as case data.

It is important to note that for designs with the same power, the attained significance level may vary. For example, the sample sizes of 565 and 580 have a similar power of about 0.94. Upon computing two new designs based on the above design with sample sizes of 565 and 580 respectively, it is apparent that the attained significance levels are different. The design with the smaller sample size of 565 pays a higher penalty in terms of type-1 error (α = 0.03) than the design with the larger sample size of 580 (α = 0.016).

41 Count Data Two-Samples

Often in experiments based on count data, the aim is to compare independent samples from two populations in terms of the rate of occurrence of a particular outcome. In medical research, outcomes such as the number of times a patient responds to a therapy, develops a certain side effect, or requires specialized care, are of interest. Or perhaps a therapy is being evaluated to determine the number of times it must be applied until an acceptable response rate is observed.
East supports the design of clinical trials in which this comparison is based on the ratio of rates, assuming a Poisson or Negative Binomial distribution. These two cases are presented in Sections 41.1 and 41.2, respectively.

41.1 Poisson - Ratio of Rates

Let λc and λt denote the Poisson rates for the control and treatment arms, respectively, and let ρ1 = λt/λc. We want to test the null hypothesis that ρ1 = 1 against one-sided or two-sided alternatives. The sample size, or power, is determined for values consistent with the alternative hypothesis, that is H1 : λt ≠ λc, H1 : λt > λc, or H1 : λt < λc.

41.1.1 Trial Design

Suppose investigators are preparing design objectives for a prospective randomized trial of a standard treatment (control arm) vs. a new combination of medications (therapy arm) to present at a clinical trials workshop. The endpoint of interest is the number of abnormal ECGs (electrocardiograms) within seven days. The investigators are interested in comparing the therapy arm to the control arm with a two-sided test conducted at the 0.05 level of significance (0.025 in each tail). It can be assumed that the rate of abnormal ECGs in the control arm is 30%, thus λt = λc = 0.3 under H0. The investigators wish to determine the sample size to attain power of 80% if there is a 25% decline in the event rate, that is λt/λc = 0.75. It is important to note that the power of the test depends on λc and λt, not just the ratio, so different values of the pair (λc, λt) with the same ratio will yield different solutions.

We will now design a study that compares the control arm to the combination therapy arm. In the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Poisson Rates.
This will launch the following input window:

Enter the following design parameters:

    Test Type: 2-sided
    Type 1 Error (α): 0.05
    Power: 0.8
    Sample Size (n): Computed (select radio button)
    Allocation Ratio (nt/nc): 1
    Rate for Control (λc): 0.3
    Rate for Treatment (λt): 0.225 (will be automatically calculated)
    Ratio of Rates ρ1 = (λt/λc): 0.75
    Follow-up Control (Dc): 7
    Follow-up Treatment (Dt): 7

The Allocation Ratio (nt : nc) describes the ratio of patients assigned to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm as opposed to 25% to the control. Here we assume the same number of patients in both arms. Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 80% power is 211 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details are displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook Wbk1 in the Library. An alternative method to view design details is to hover the cursor over the node Des1 in the Library. A tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view all computed characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample Size chart and table, East also provides the efficacy boundary in the Stopping Boundaries chart and table.

Alternatively, East allows the computation of either the Type-1 error (α) or Power for a given sample size. Using the Design Input/Output window as described above, simply enter the desired sample size and click Compute to calculate the resulting power of the test.

41.1.2 Example - Coronary Heart Disease

The following example is presented in the paper by Gu et al. (2008), which references a prospective study reported by Stampfer and Willett (1985) examining the relationship between post-menopausal hormone use and coronary heart disease (CHD). Researchers were interested in whether the group using hormone replacement therapy exhibited less coronary heart disease. The study did show strong evidence that the incidence rate of CHD in the group who did not use hormonal therapy was higher than that in the group who did use post-menopausal hormones. The authors then determined the sample size necessary for the two groups when what they referred to as the ratio of sampling frames, known as the allocation ratio in East, is 2. The study assumed an observation time of 2 years, and that the incidence rate of CHD for those using the hormone therapy is 0.0005.
The following excerpt from the paper presents the required sample sizes for the participants using hormone therapy in order to achieve 90% power at α = 0.05, for several different test procedures:

It is first necessary to map the notation of the referenced paper to that used by East:

    Gu et al. (2008)    East
    γ1                  λt
    γ0                  λc
    R′ = 4              ρ1 = 1/R′ = 0.25
    D                   Allocation Ratio = 2

Once again in the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Poisson Rates. Enter the following design parameters:

    Test Type: 1-sided
    Type 1 Error (α): 0.05
    Power: 0.9
    Sample Size (n): Computed (select radio button)
    Allocation Ratio (nt/nc): 2
    Rate for Control (λc): 0.002
    Rate for Treatment (λt): 0.0005
    Ratio of Rates ρ1 = (λt/λc): 0.25
    Follow-up Control (Dc): 2
    Follow-up Treatment (Dt): 2

The Allocation Ratio (nt : nc) describes the ratio of patients assigned to each arm. For example, an allocation ratio of 2:1 indicates that two-thirds of the patients are randomized to the treatment arm as opposed to one-third to the control. Compute the design to produce the following output:

Table 6 in the referenced paper shows the number of subjects required for the treatment group. The East results show that the total number of subjects required for the entire study is 10027. Given that the allocation ratio is 2, the number of subjects required for the control group is 10027/3 = 3342 and the number required for the treatment group is therefore 6685. This falls within the range of the sample sizes presented in the referenced paper (and is close to the minimum of 6655), which calculates these sizes using a number of different methods.
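The sample sizes in this section can be approximated with a simple large-sample calculation. The Python sketch below assumes a Wald test on the log of the rate ratio, with the variance evaluated at the alternative rates; this is an assumption made for illustration and is not a statement of the algorithm East uses. The function name `poisson_ratio_total_n` and its arguments are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def poisson_ratio_total_n(alpha, power, lam_c, lam_t, d_c, d_t,
                          alloc=1.0, sides=1):
    """Approximate total sample size for testing lam_t / lam_c = 1.

    alloc is the allocation ratio n_t / n_c; d_c and d_t are the
    follow-up times in the control and treatment arms.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / sides)
    z_beta = z(power)
    # Variance of the log rate ratio, per control subject.
    var_unit = 1 / (d_c * lam_c) + 1 / (alloc * d_t * lam_t)
    n_c = (z_alpha + z_beta) ** 2 * var_unit / log(lam_t / lam_c) ** 2
    return (1 + alloc) * n_c


# ECG example: 2-sided alpha = 0.05, 80% power, equal allocation.
print(ceil(poisson_ratio_total_n(0.05, 0.8, 0.3, 0.225, 7, 7, 1, 2)))
# CHD example: 1-sided alpha = 0.05, 90% power, allocation ratio 2.
print(ceil(poisson_ratio_total_n(0.05, 0.9, 0.002, 0.0005, 2, 2, 2, 1)))
```

Under these assumptions the rounded-up totals are 211 and 10027, in line with the figures quoted for the two designs; agreement for other inputs is not guaranteed.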
41.2 Negative Binomial Ratio of Rates

In experiments where the data follow a binomial distribution, the number of successful outcomes for a fixed number of trials is of importance when determining the sample size to adequately power a study. Suppose instead that it is of interest to observe a fixed number of successful outcomes (or failures), but the overall number of trials necessary to achieve this is unknown. In this case, the data are said to follow a Negative Binomial distribution. There are two underlying parameters of interest. As with the Poisson distribution, λ denotes the average rate of response for a given outcome. In addition, a shape parameter γ specifies the desired number of observed "successes". As with the Poisson distribution, the Negative Binomial distribution can be useful when designing a trial where one must wait for a particular event. Here, we are waiting for a specific number of successful outcomes to occur.

A Poisson regression analysis assumes a common rate of events for all subjects within a stratum, as well as equal mean and variance (equidispersion). With overdispersed count data, estimates of standard error from these models can be invalid, leading to difficulties in planning a clinical trial. The increased variability of overdispersed data requires a larger sample size in order to maintain power. To allow for variability between patients, East provides valid sample size and power calculations for count data using a negative binomial model, resulting in a better evaluation of study design and an increased likelihood of trial success.

41.2.1 Trial Design

Suppose that a hypothetical manufacturer of robotic prostheses, which require several components to function fully, has an order to produce a large quantity of artificial limbs. According to historical data, about 20% of the current limbs fail the rigorous quality control test and therefore cannot be shipped to patients.
For each order, the manufacturer must produce more than requested; in fact they must continue to produce the limbs until the desired quantity passes quality control. Given the high cost of producing these prosthetic limbs, it is of great interest to reduce the number of those that fail the test.

The company plans to introduce a new feature to the current model, with the goal of reducing the probability of failure to 10%. It is safe to assume that the enhancement will not cause a decline in the original success rate. In this scenario, we wish to test the null hypothesis H0 : λc = λt = 0.2 against the one-sided alternative of the form H1 : λc > λt. Quality control investigators wish to conduct a one-sided test at the α = 0.05 significance level and to determine the sample size required to obtain 90% power to detect a 50% decline in the event rate, i.e. λt/λc = 0.5. It is important to note that the power of the test depends on λc and λt, not just the ratio, so different values of the pair (λc, λt) with the same ratio will have different solutions.

The same holds true for the shape parameter. Different values of (γc, γt) will result in different sample sizes or power calculations. East allows user-specified shape parameters for both the treatment and control groups; for this example, however, assume that the desired number of successful outcomes for both groups is 10.

The following illustrates the design of a two-arm study comparing the control arm, which is the current model of the prosthesis, to the treatment arm, which is the enhanced model. In the Design tab under the Count group choose Two Samples and then Parallel Design - Ratio of Negative Binomial Rates.

This will launch the following input window:

Enter the following design parameters:

    Test Type: 1 sided
    Type 1 Error (α): 0.05
    Power: 0.9
    Sample Size (n): Computed (select radio button)
    Allocation Ratio (nt/nc): 1
    Rate for Control (λc): 0.2
    Rate for Treatment (λt): 0.1
    Ratio of Rates ρ = (λt/λc): 0.5
    Follow-up Time (D): 1
    Shape Control (γc): 10
    Shape Treatment (γt): 10

The Allocation Ratio (nt : nc) describes the ratio of patients assigned to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm as opposed to 25% to the control. Here we assume the same number of patients in both arms. Click Compute. The design is shown as a row in the Output Preview window:

The sample size required in order to achieve the desired 90% power is 1248 subjects. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary.

In the Output Preview toolbar, click the icon to save this design Des1 to workbook Wbk1 in the Library. An alternative method to view design details is to hover the cursor over the node Des1 in the Library. A tooltip will appear that summarizes the input parameters of the design.

With the design Des1 selected in the Library, click the icon on the Library toolbar, and then click Power vs. Sample Size. The power curve for this design will be displayed. You can save this chart to the Library by clicking Save in Workbook.
Alternatively, you can export the chart in one of several image formats (e.g., Bitmap or JPEG) by clicking Save As... or Export into a PowerPoint presentation.

Close the Power vs. Sample Size chart. To view all computed characteristics of this design, select Des1 in the Library, and click the icon.

In addition to the Power vs. Sample Size chart and table, East also provides the efficacy boundary in the Stopping Boundaries chart and table.

For a specific desired sample size, East allows the computation of either the Type-1 error (α) or Power for a test. Using the Design Input/Output window and methods as described above, simply enter the desired sample size and click Compute to calculate the resulting power of the test.

In addition to this example, consider the following illustration of the benefit of using the negative binomial model in clinical trials. In real-life settings, the variance of count data observed between patients is typically higher than the observed mean. The negative binomial model accommodates between-subject heterogeneity according to a Gamma distribution. For example:

    Poisson: $Y \sim \mathrm{Poisson}(\lambda)$
    Negative Binomial: $Y_i \sim \mathrm{Poisson}(\lambda k_i)$ where $k_i \sim \mathrm{Gamma}(k)$

In the case of no overdispersion (k = 0) the negative binomial model reduces to the Poisson model. In the figure below, the Poisson and negative binomial models are displayed under various values of the dispersion parameter.

Assuming the above parameterization, the variance of the negative binomial model is $\lambda + k\lambda^2$. The variance is thus inflated by the factor $1 + k\lambda$, which is linear in k and depends on the mean.
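The variance expression can be obtained from the law of total variance. The derivation below is a sketch under one added assumption that the text does not state explicitly: the Gamma mixing variable $k_i$ is scaled to have mean 1 and variance $k$.

```latex
% Assumption (not stated in the manual): E[k_i] = 1 and Var(k_i) = k.
\begin{align*}
E[Y_i] &= E\bigl[E[Y_i \mid k_i]\bigr] = \lambda\,E[k_i] = \lambda, \\
\operatorname{Var}(Y_i)
  &= E\bigl[\operatorname{Var}(Y_i \mid k_i)\bigr]
   + \operatorname{Var}\bigl(E[Y_i \mid k_i]\bigr) \\
  &= \lambda\,E[k_i] + \lambda^{2}\operatorname{Var}(k_i)
   = \lambda + k\lambda^{2} = \lambda\,(1 + k\lambda).
\end{align*}
```

Setting $k = 0$ removes the second term and recovers the Poisson variance $\lambda$.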
Depending on the distributional assumption used and its impact on the variance, sample size and power can vary widely.

In multiple sclerosis (MS) patients, magnetic resonance imaging (MRI) is used as a marker of efficacy by means of serial counts of lesions appearing on the brain. Exacerbation rates are frequently used as a primary endpoint in MS as well as in chronic obstructive pulmonary disease (COPD) and asthma (Keene et al. 2007). Poisson regression could be considered; however, this model does not account for variability between patients, which results in overdispersion. The negative binomial model offers an alternative approach.

TRISTAN (Keene et al. 2007) was a double-blind, randomized study for COPD comparing the effects of the salmeterol/fluticasone propionate combination product (SFC) to salmeterol alone, fluticasone propionate alone and placebo. Although the primary endpoint was pre-bronchodilator FEV1, the number of exacerbations was an important secondary endpoint.

Suppose we are to design a new trial to be observed over a period of 1 to 2 years. The primary objective is the reduction of the rate of exacerbations, defined as a worsening of COPD symptoms that requires treatment with antibiotics, cortisone or both, with the combination product SFC versus placebo. Based on the TRISTAN results, we aim to reduce the incidence of events by 33%. Suppose the exacerbation rate on placebo is 1.5 per year, and that we expect a rate of 1.0 in the combination group. Assume a 2-sided test with a 5% significance level and 90% power. Using a Poisson model, a total of 214 patients need to be enrolled in the study.

For the TRISTAN data, the estimate of the overdispersion parameter was 0.46 (95% CI: 0.34-0.60).
Using a negative binomial model with overdispersion of 0.33, 0.66, 1 and 2, the required sample size increases to between 298 (overdispersion 0.33) and 725 (overdispersion 2). Exacerbation rates are calculated as the number of exacerbations divided by the length of time in treatment in years. East can be used to illustrate the impact of a one-year versus a two-year study by changing the follow-up duration. For 382 patients and a shape parameter of 0.66, power increases from 90% to 97% when follow-up time is doubled. The number of patients required for a two-year study powered at 90% is 277, whereas 382 patients would be required to achieve the same power for a study period of one year.

Negative binomial models are increasing in popularity for medical research, and as the industry standard for trial design, East continues to evolve by incorporating sample size methods for count data. These models allow the count to vary around the mean for groups of patients instead of the population means. Additionally, increased variability does lead to a larger test population; consequently the balance between power, sample size and duration of observation needs to be evaluated.
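The sample sizes quoted in this example can be approximated by hand. The following is a minimal sketch, assuming a normal approximation for the log rate ratio with per-arm variance 1/(λ_c t) + 1/(λ_t t) + 2k, where t is the follow-up time in years; this formula and the function names are our own illustration and are not necessarily the algorithm East uses internally. Under these assumptions the sketch gives the totals 214, 298, 382 and 725 quoted above for one year of follow-up.

```python
import math
from statistics import NormalDist

def variance_term(rate_ctrl, rate_trt, dispersion, years):
    # Assumed variance of the log rate ratio for one subject per arm:
    # a Poisson part for each arm plus 2k for the gamma heterogeneity.
    return 1.0 / (rate_ctrl * years) + 1.0 / (rate_trt * years) + 2.0 * dispersion

def total_sample_size(rate_ctrl, rate_trt, dispersion=0.0, years=1.0,
                      alpha=0.05, power=0.90):
    # Total subjects (both arms, 1:1 allocation) for a 2-sided test.
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    log_ratio = math.log(rate_trt / rate_ctrl)
    per_arm = z ** 2 * variance_term(rate_ctrl, rate_trt, dispersion, years) / log_ratio ** 2
    return math.ceil(2 * per_arm)

def approx_power(total_n, rate_ctrl, rate_trt, dispersion=0.0, years=1.0, alpha=0.05):
    # Approximate power of the 2-sided test for a given total sample size.
    per_arm = total_n / 2
    se = math.sqrt(variance_term(rate_ctrl, rate_trt, dispersion, years) / per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(math.log(rate_trt / rate_ctrl)) / se - z_alpha)

if __name__ == "__main__":
    for k in (0.0, 0.33, 0.66, 1.0, 2.0):
        print(k, total_sample_size(1.5, 1.0, dispersion=k))
    # Effect of doubling follow-up for 382 patients and k = 0.66
    print(approx_power(382, 1.5, 1.0, dispersion=0.66, years=1.0))
    print(approx_power(382, 1.5, 1.0, dispersion=0.66, years=2.0))
```

For 382 patients and k = 0.66 the approximate power is about 90% with one year of follow-up and about 97% with two, in line with the figures above. Sample sizes for other follow-up durations may differ by a patient or two from East's output, because the rounding and the exact variance formula are assumptions here.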
Volume 6
Time to Event Endpoints

42 Introduction to Volume 6
43 Tutorial: Survival Endpoint
44 Superiority Trials with Variable Follow-Up
45 Superiority Trials with Fixed Follow-Up
46 Non-Inferiority Trials Given Accrual Duration and Accrual Rates
47 Non-Inferiority Trials with Fixed Follow-Up
48 Superiority Trials Given Accrual Duration and Study Duration
49 Non Inferiority Trials Given Accrual Duration and Study Duration
50 A Note on Specifying Dropout parameters in Survival Studies
51 Multiple Comparison Procedures for Survival Data

42 Introduction to Volume 6

The chapters in this volume deal with clinical trials where the endpoint of interest is the time from entry into the study until a specific event (for example, death, tumour recurrence, or heart attack) occurs. Such trials are also referred to as survival trials, time-to-event trials, or time-to-failure trials. Long-term mortality trials in oncology, cardiology or HIV usually select time-to-event as the primary endpoint. The group sequential methodology is particularly appropriate for such trials because of the potential to shorten the study duration and thereby bring beneficial new therapies to patients sooner than would be possible by a conventional single-look design.

In contrast to studies involving normal and binomial endpoints, the statistical power of a time-to-event study is determined, not by the number of individuals accrued, but rather by the number of events observed. Accruing only as many individuals as the number of events required to satisfy power considerations implies having to wait until all individuals have reached the event. This will probably make the trial extend over an unacceptably long period of time.
Therefore one usually accrues a larger number of patients than the number of events required, so that the study may be completed within a reasonable amount of time. East allows the user a high degree of flexibility in this respect.

This volume contains Chapters 42 through 51. Chapter 42 is the present chapter. It describes the contents of the remaining chapters of Volume 6.

Chapter 43 introduces you to East on the Architect platform, using an example clinical trial to compare survival in two groups.

In Chapter 44 we discuss the Randomized Aldactone Evaluation Study (RALES) for decreasing mortality in patients with severe heart failure (Pitt et al., 1999). The chapter illustrates how East may be used to design and monitor a group sequential two-sample superiority trial with a time-to-event endpoint. We begin with the simple case of a constant enrollment rate, exponential survival and no drop-outs. The example is then extended to cover non-uniform enrollment, non-constant hazard rates for survival, and differential drop-out rates between the treatment and control arms. The role of simulation in providing additional insights is discussed. Simulations in the presence of non-proportional hazard rates and stratification variables are also explained. The trial was designed so that every subject who had not dropped out or reached the stated endpoint would be followed until the trial was terminated. This is an example of a variable follow-up design, because subjects who are enrolled at the beginning of the enrollment phase are followed for a longer time than subjects who are enrolled later.

In contrast to Chapter 44, Chapter 45 deals with the fixed follow-up design. Here we design a trial in which each subject can only be followed for a maximum of one year and then goes off study.
We use East to design such a trial, basing the design parameters on the PASSION and TYPHOON trials, two recently published studies of drug-eluting stents (Spaulding et al., 2006; Laarman et al., 2006). The impact of variable accrual patterns and drop-outs is also taken into account.

Chapter 46 shows how to use East to design a non-inferiority trial with a time-to-event endpoint. The setting is a clinical trial to demonstrate the non-inferiority of Xeloda to 5-FU+LV in patients with metastatic colorectal cancer (Rothman et al., 2003). Part of the discussion in this chapter is about the choice of the non-inferiority margin.

Chapter 47 will illustrate through a worked example how to design, monitor and simulate a two-sample non-inferiority trial with a time-to-event endpoint in which each subject who has not dropped out or experienced the event is followed for a fixed duration only. This implies that each subject who does not drop out or experience the event within a given time interval, as measured from the time of randomization, will be administratively censored at the end of that interval. In East we refer to such designs as fixed follow-up designs.

Chapters 48 and 49 handle the trade-off between patient accruals and study duration in a different way from the previous chapters. In contrast to publicly funded trials, which usually lack the resources to exert control over the accrual rate of a trial, industry trials are often run with a fixed timeframe as the constraint. Industry sponsors would rather adjust the patient recruitment rate by opening and closing investigator sites than delay the end of a study and therefore their entire drug development program, time to market, and revenue. Chapters 48 and 49 illustrate how to design superiority and non-inferiority trials in East given a fixed accrual period and fixed study duration.
Additionally, these design options provide the users with many useful graphs that chart the relationship between power, sample size, number of events, accrual duration, and study duration.

Also note that Chapter 44 contains a section that guides the user through the powerful survival simulation tool available in East.

Chapter 50 is a note which gives details on specifying dropout parameters for survival studies in East with the help of an example.

A unified formula for calculating the expected number of events d(l) in a time-to-event trial can be found in Appendix D.

42.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab has the provision of adjusting the paths for storing workbooks, files, and temporary files. These paths persist across the current and future sessions, even after East is closed. This is also where you specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters.
You can set up the default choices for the design type, computation type, test type and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP will use half of the alpha from settings as its default, since an MCP is always a 1-sided test.

Sample Size Rounding
By default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.
If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allow defaults to be set for several quantities on East 6 charts.

We suggest that you do not alter the defaults until you are quite familiar with the software.

43 Tutorial: Survival Endpoint

This tutorial introduces you to East 6, using examples for designing a clinical trial to compare survival in two groups. It is suggested that you go through the tutorial while you are at the computer, with East 6 running on it.

43.1 A Quick Feel of the Software

When you open East 6, the screen will look as shown below.

In the tabs bar at the top of the ribbon, the Design tab is already selected. Each tab has its own ribbon. All the command buttons under the Design tab are displayed in its ribbon, with suggestive icons. These commands have been grouped under the categories of Continuous, Discrete, Count, Survival and General. For this tutorial, let us explore the command Two Samples under the Survival category. In East, we use the terms 'time to event' and 'survival' interchangeably. Click on Two Samples. You will see a list of action items, which are dialog box launchers.

Click on Logrank Test Given Accrual Duration and Study Duration. You will get the following dialog box in the work area.

This dialog box is for computing Sample Size (n) and Number of Events.
All the default input specifications under the tab Design Parameters are on display: Design Type=Superiority, Number of Looks=1, Test Type=1-Sided, Type-1 Error (α)=0.025, Power (1-β)=0.9, Allocation Ratio (nt/nc)=1, # of Hazard Pieces=1, Input Method=Hazard Rates, Hazard Ratio (λt/λc)=0.5, Log Hazard Ratio ln(λt/λc)=-0.693, Hazard Rate (Control)=0.0347, Hazard Rate (Treatment)=0.0173, and Variance of Log-Hazard Ratio=Null. There are two radio buttons in this dialog box, one at the side of the Power (1-β) box and the second at the side of the combined boxes for Sample Size (n) and Number of Events. By default, the latter radio button is selected, indicating that the items against this radio button are to be computed using all other inputs. Similarly, if the first radio button is selected, then Power will be computed using all other inputs.

Now click on the tab Accrual/Dropout and you will see the following dialog box.

The default specifications in this dialog box are: Subjects are followed=Until End of Study, Accrual Duration=22, Study Duration=38, # of Accrual Periods=1, and no Dropouts.

Now accept all the default specifications that are displayed for this single-look design and be ready to compute the Sample Size (n) and the Number of Events for the design. Click Compute.

At the end of the computation, you will see the results appearing at the bottom of the screen, in the Output Preview pane, as shown below.

This single row of output preview contains relevant details of all the inputs and the computed results for events and accruals. The maximum value for events is 88 and the committed accrual is 182 subjects. Since this is a fixed-look design, the expected events are the same as the maximum required.
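To see where numbers of this size come from, the following minimal sketch approximates the events and accrual for the default design. It assumes the Schoenfeld approximation for the number of events in a logrank test, exponential survival, uniform accrual and no dropouts; these are our assumptions for illustration, and East's own algorithm may differ in detail. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def required_events(hazard_ratio, alpha_one_sided=0.025, power=0.90,
                    allocation_ratio=1.0):
    # Schoenfeld approximation: events needed for a 1-sided logrank test.
    z = NormalDist().inv_cdf(1 - alpha_one_sided) + NormalDist().inv_cdf(power)
    r = allocation_ratio
    return math.ceil((1 + r) ** 2 / r * z ** 2 / math.log(hazard_ratio) ** 2)

def event_probability(hazard, accrual_duration, study_duration):
    # Probability that a subject has the event before the study ends,
    # with entry time uniform on [0, accrual_duration] and exponential survival.
    a, s = accrual_duration, study_duration
    surviving = (math.exp(-hazard * (s - a)) - math.exp(-hazard * s)) / (hazard * a)
    return 1.0 - surviving

def required_accrual(events, hazard_ctrl, hazard_trt, accrual_duration, study_duration):
    # Subjects needed (1:1 allocation) so that the expected events reach the target.
    p_ctrl = event_probability(hazard_ctrl, accrual_duration, study_duration)
    p_trt = event_probability(hazard_trt, accrual_duration, study_duration)
    return math.ceil(events / ((p_ctrl + p_trt) / 2))

if __name__ == "__main__":
    events = required_events(0.5)
    print(events)
    print(required_accrual(events, 0.0347, 0.0173, 22, 38))
```

With the default inputs (hazard ratio 0.5, hazard rates 0.0347 and 0.0173, accrual duration 22, study duration 38) this sketch gives 88 events and 182 subjects, matching the Output Preview row.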
Click anywhere in this row, and then click on the icon to get a detailed display in the upper pane of the screen as shown below.

The contents of this output, displayed in the upper pane, are the same as what is contained in the output preview row for Design1 shown in the lower pane, but the upper pane display is easier to read and comprehend. The title of the upper pane display is Output Summary. This is because you can choose more than one design in the Output Preview pane and the display in the upper pane will show the details of all the selected designs in juxtaposed columns.

The discussion so far gives you a quick feel of the software for computing the required events and sample size for a single-look survival design. We have not yet discussed all the icons in the output preview pane, the library pane, or the hidden Help pane on the screen. We will describe them using an example of a group sequential design in the next section.

43.2 Group Sequential Design for a Survival Superiority Trial

43.2.1 Background Information on the study
43.2.2 Creating the design in East
43.2.3 Design Outputs
43.2.4 East icons explained
43.2.5 Saving created designs
43.2.6 Displaying Detailed Output
43.2.7 Comparing Multiple Designs
43.2.8 Events vs. Time plot
43.2.9 Simulation
43.2.10 Interim Monitoring

43.2.1 Background Information on the study

The Randomized Aldactone Evaluation Study (RALES) was a double-blind multicenter clinical trial of an aldosterone-receptor blocker vs. placebo published in the New England Journal of Medicine (vol. 341, no. 10, pages 709-717, 1999). This trial was open to patients with severe heart failure due to systolic left ventricular dysfunction. The primary endpoint was all-cause mortality. The anticipated accrual rate was 960 patients/year.
The mortality rate for the placebo group was 38%. The investigators wanted 90% power to detect a 17% reduction in the mortality hazard rate for the Aldactone group (from 0.38 to 0.3154) with α = 0.05, 2-sided test. Six DMC meetings were planned. The dropout rate in both the groups is expected to be 5% each year. The patient accrual period is planned to be 1.7 years and the total study duration to be 6 years.

43.2.2 Creating the design in East

For our purpose, let us create our own design from the basic details of this study. Now start East afresh. On the Design tab, click on Two Samples under the Survival category. You will see a list of action items, which are dialog box launchers. Click on the second option, Logrank Test Given Accrual Duration and Study Duration. You will get the following dialog box in the work area.

All the specifications you see in this dialog box are default values, which you will have to modify for the study under consideration.

Now, let the Design Type be Superiority.

Next, enter 6 in the Number of Looks box. You can see the range of choices for the number of looks is from 1 to 20. Immediately after this selection, you will see a new tab, Boundary Info, added to the input dialog box. We will look into this tab after you have completed the current tab, Design Parameters.

Next, choose 2-Sided in the Test Type box.

Next, enter 0.05 in the Type-1 Error (α) box, and 0.9 in the Power box.

Next, enter the specifications for survival parameters. Keep # of Hazard Pieces as 1. Click on the check box against Hazard Ratio and choose Hazard Rates as the Input Method. Enter 0.83 as the Hazard Ratio and 0.38 as the Hazard Rate (Control). East computes and displays the Hazard Rate (Treatment) as 0.3154.
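The arithmetic behind these inputs is simple enough to check by hand. The sketch below assumes exponential (constant hazard) models for both survival and dropout, so that a yearly dropout probability p corresponds to a hazard of -ln(1 - p); the function names are hypothetical and are used only for illustration.

```python
import math

def treatment_hazard(control_hazard, hazard_ratio):
    # Under proportional hazards, the treatment hazard is the ratio times the control hazard.
    return hazard_ratio * control_hazard

def dropout_hazard(dropout_probability, time=1.0):
    # Constant hazard that gives the stated dropout probability by the given time,
    # assuming exponentially distributed dropout times.
    return -math.log(1.0 - dropout_probability) / time

if __name__ == "__main__":
    print(round(treatment_hazard(0.38, 0.83), 4))   # hazard rate for the Aldactone group
    print(round(dropout_hazard(0.05), 4))           # yearly dropout hazard for 5% dropout
```

A 17% reduction in the hazard corresponds to a hazard ratio of 0.83, and 0.83 times 0.38 gives the treatment hazard of 0.3154 displayed by East.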
Keep the default choice of Null for Variance of Log-Hazard Ratio. Now the dialog box will look as shown below.

Next, click the tab Accrual/Dropout. Keep the specification 'Until End of Study' for Subjects are followed. Enter 1.7 as Accrual Duration and 6 as Study Duration. Keep # of Accrual Periods as 1. Change the # of Pieces for dropouts to 1. Choose 'Prob. of Dropout' as the Input Method for entering information on dropouts. Enter 0.05 as the probability of dropout at the end of 1 year for both the groups. Now the dialog box will appear as shown below.

Now click on the Boundary tab. In the dialog box of this tab, you can specify stopping boundaries for efficacy or futility or both. For this trial, let us consider efficacy boundaries only.

Choose 'Spending Functions' as the Efficacy Boundary Family.

Choose 'Lan-DeMets' in the Spending Function box.

Choose 'OF' in the Parameter box.

Next, click the radio button near 'Equal' for Spacing of Looks.

Choose 'Z Scale' in the Efficacy Boundary Scale box.

In the table of look-wise details below, the columns Info Fraction and Cumulative Alpha Spent, and the upper and lower efficacy boundaries, are computed and displayed as shown here.

Scroll a little bit to see the sixth look details.

The two icons represent buttons for the Error Spending Function chart and the Stopping Boundaries chart respectively. Click these two buttons one by one to see the following charts.
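As a rough illustration of what the Error Spending Function chart plots, the sketch below evaluates the commonly cited Lan-DeMets O'Brien-Fleming type spending function, α(t) = 2 - 2Φ(z_{α/2}/√t), at six equally spaced information fractions. Applying this form to the total 2-sided α of 0.05 is our assumption; East may split α between the two sides differently, so the values need not match the Cumulative Alpha Spent column exactly.

```python
import math
from statistics import NormalDist

def ld_of_alpha_spent(info_fraction, alpha=0.05):
    # Lan-DeMets O'Brien-Fleming type spending function (assumed form):
    # cumulative type-1 error spent by information fraction t in (0, 1].
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2.0 * (1.0 - NormalDist().cdf(z / math.sqrt(info_fraction)))

if __name__ == "__main__":
    looks = 6
    for k in range(1, looks + 1):
        t = k / looks  # equally spaced looks
        print(f"look {k}: info fraction {t:.3f}, cumulative alpha {ld_of_alpha_spent(t):.6f}")
```

The function spends very little of the type-1 error at early looks and reaches the full α at the final look, which is why O'Brien-Fleming type boundaries are very conservative early in the trial.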
43.2.3 Design Outputs

Now you have completed specifying all the inputs required for a group sequential trial design and you are ready to compute the required events and sample size or accruals for the trial. Click on the Compute button. After the computation is over, East will show in the Output Preview pane the following results:

This single row of output preview contains relevant details of all the inputs and the computed results for events and accruals. The maximum required Events is computed as 1243 and the Committed Accrual as 1646 subjects. The expected Events under H0 and H1 are estimated to be 1234 and 904 respectively. The expected Study Duration under H0 and H1 are 5.359 and 3.729 respectively.

Click anywhere in this Output Preview row and then click on the icon to get a summary in the upper pane of the screen as shown below.

43.2.4 East icons explained

In the 'Output Preview' pane, you see the following icons in the upper row.

The functions of the above icons are as indicated below. The tooltips also will indicate their functions.
Output Summary (The output summary of selected design(s) will appear in the upper pane)
Edit Design (The input dialog box of a selected design will appear in the upper pane)
Save in Workbook (Save one or more selected designs in a workbook)
Delete (Delete one or more selected designs)
Rename (Rename Design names)
Print (Print selected designs)
Display Precision (Local Settings)
Filter (Filter and select designs according to specified conditions)
Show/Hide Columns (Show/Hide Columns of the designs in the Output Preview panel)

The following icons can be seen at the right end of the Output Preview pane and the Output Summary or Input/Output window respectively. Their functions are:

Maximize Output Preview Pane
Minimize Output Preview Pane

You may also notice a row of icons at the top of the Output Summary window as shown below.

The first icon is for Plot (Plots of a selected design will appear in a pop-up window).

The second icon is for Show Tables (The data for different plots can be displayed in tabular form in pop-up windows).

If you have multiple designs in the Output Summary window, the third icon becomes active and can be used to change the order of those columns in the Output Summary.

The fourth icon is to print the Output Summary window.

As an example, if you click Power vs. Sample Size under the Plot icon, you will get the following chart.

If you want to see the data underlying the above chart, click the Show Table icon and click Power vs. Sample Size. You will see the following table in a pop-up window.
You can customize the format of the above table and also save it as case data in a workbook. You may experiment with all the above icons / buttons to understand their functions.

43.2.5 Saving created Designs in the library and hard disk

In the Output Preview pane, select one or more design rows and click the icon. The selected design(s) will then be added as node(s) in the current workbook, as shown below.

The above action only adds the design to the workbook node in the library; it is not saved to the hard disk. To save to the hard disk, you may either save the entire workbook or only the design by right-clicking on the desired item and choosing the save or save as options.

Here in the library also, you see rows of icons.

Some of these icons you have already seen. The functions of the other icons are:

Details (Details of a selected design will appear in the upper pane in the work area)
Output Settings (Output Settings can be changed here)
Simulate (Start the simulation process for any selected design node)
Interim Monitoring (Start the Interim Monitoring process for any selected design)

43.2.6 Displaying Detailed Output

Select the design from the Library and click the icon, or right-click on the Des1 node in the library and click Details. You will see the detailed output of the design displayed in the Work area.

43.2.7 Comparing Multiple Designs

Click on the Des1 row and then click the Edit icon. You will get the input dialog box in the upper pane. Change the Power value to 0.8 and then click Compute. You will now see that Des2 is created and a row is added to the Output Preview pane as shown below.
Click on the Des1 row and then, keeping the Ctrl key pressed, click on the Des2 row. Now both the rows will be selected. Next, click the Output Summary icon.

Now you will see the output details of these two designs displayed in the upper pane, Compare Designs, in juxtaposed columns, as shown below.

In a similar way, East allows the user to easily create multiple designs by specifying a range of values for certain parameters in the design window. For example, in a survival trial the Logrank Test Given Accrual Duration and Study Duration design allows the input of multiple key parameters at once to simultaneously create a number of different designs. For example, suppose in a multi-look study the user wants to generate designs for all combinations of the following parameter values: Power = 0.8 and 0.9, and Hazard Ratio - Alternative = 0.6, 0.7, 0.8 and 0.9. The number of combinations is 2 x 4 = 8.

East creates all combinations using only a single specification under the Design Parameters tab in the design window. As shown below, the values for Power are entered as a list of comma-separated values, while the alternative hazard ratios are entered as a colon-separated range of values, 0.6 to 0.9 in steps of 0.1.

East computes all 8 designs and displays them in the Output Preview window:

East provides the capability to analyze multiple designs in ways that make comparisons between the designs visually simple and efficient. To illustrate this, a selection of a few of the above designs can be viewed simultaneously in both the Output Summary section as well as in the various tables and plots. The following is a subsection of the designs computed from the above example with differing values for number of looks, power and hazard ratio.
Designs are displayed side by side, allowing details to be easily compared:

In addition, East allows multiple designs to be viewed simultaneously either graphically or in tabular format:

Notice that all the four designs in the Output Summary window are selected. The following figures compare these four designs in different formats.

Stopping Boundaries (table)
Expected Sample Size (table)
Power vs. Sample Size (plot)
Total Sample Size / Events vs. Time (plot)

This capability allows the user to explore a greater space of possibilities when determining the best choice of study design.

43.2.8 Events vs. Time plot

For survival studies, East provides a variety of charts and plots to visually validate and analyze the design. For example, the Sample Size / Events vs. Time plot allows the user to see the rate of increase in the number of events (control and treatment) over time (accrual duration, study duration). An additional feature of this particular chart is that a user can easily update key input parameters to determine how multiple different scenarios can directly impact a study. This provides significant benefits during the design phase, as the user can quickly examine how a variety of input values affect a study before the potentially lengthy task of simulation is employed.

To illustrate this feature, what follows is the example from the RALES study. For study details, refer to subsection Background Information on the study of this tutorial.
Currently there are ten designs in the Output Preview area. Select Des1 from them and save it to the current workbook. You may delete the remaining ones at this point.

To view the Sample Size / Events vs. Time plot, select the corresponding node in the Library and under the Charts icon choose Sample Size / Events vs. Time:

Survival parameters for this design can be edited directly through this chart by clicking the Modify button. The Modify Survival Design window is then displayed for the user to update design parameters:

To illustrate the benefit of the modification feature, suppose at design time there is potential flexibility in the accrual and duration times for the study. To see how this may affect the number of subsequent events, modify the design to change the Accrual Duration to 3 and Study Duration to 4. Re-create the plot to view the effect of these new values on the shape and magnitude of the curves by clicking OK:

Similar steps can be taken to observe the effect of changing other parameter values on the number of events necessary to adequately power a study.

43.2.9 Simulation

In the library, right-click on the node Des1 and click Simulate. You will be presented with the following Simulation sheet.

This sheet has four tabs: Test Parameters, Response Generation, Accrual/Dropout, and Simulation Controls. Additionally, you can click Include Options and add some more tabs like Site, Randomization, User Defined R Function and Stratification. The first three tabs essentially contain the details of the parameters of the design. In the Simulation Control tab, you can specify the number of simulations to carry out and specify the file for storing simulation data.
Let us first carry out 1000 simulations to check whether the design can reach the specified power of 90%.

The Response Generation tab, by default, shows the hazard rates for control and treatment. We will use these values in our simulation.

In the Simulation Control tab, specify the number of simulations as 1000. Set the Random Number Seed to Fixed, with the value 12345.

Let us keep the values in other tabs as they are and click Simulate. The progress of the simulation process will appear in a temporary window as shown below.

This is the intermediate window showing the complete picture of simulations. Close this window after viewing it. You can see the complete simulation output in the details view. A new row, with the ID Sim1, will be added in Output Preview.

Click on the Sim1 row and click the Output Summary icon. You will see the Simulation Output summary appearing in the upper pane. It shows the simulated power as 0.892, indicating that in 892 out of 1000 simulations the boundary was crossed.

You can save Sim1 as a node in the workbook. If you right-click on this node and then click Details, you will see the complete details of the simulation appearing in the work area. Here is a part of it.

43.2.10 Interim Monitoring

Click the Des1 node under workbook wbk1 and click the icon. Alternatively, you can right-click the Des1 node and select the item Interim Monitoring. In either case, you will see the IM dashboard appearing as shown below.

In the top row, you see a few icons. For now, we will discuss only the first icon, which represents the Test Statistic Calculator.
Using this calculator, you enter the results of each interim analysis into the IM dashboard. Suppose the Data Monitoring Committee had the following data at the first 5 looks of interim monitoring.

Date      Total Deaths   δ̂        SE(δ̂)   Z-Statistic
Aug 96    125            -0.283   0.179   -1.581
Mar 97    299            -0.195   0.116   -1.681
Aug 97    423            -0.248   0.097   -2.557
Mar 98    545            -0.259   0.086   -3.012
Aug 98    670            -0.290   0.077   -3.766

The first look was taken at 125 events and the analysis of the data showed δ̂ = -0.283 and SE(δ̂) = 0.179. First, click the blank row in the IM Dashboard and then click the Test Statistic Calculator icon. Now you can enter the first analysis results into the TS calculator and click Recalc. The Test Statistic value will be computed and the TS calculator will appear as shown below.

Now click OK to transfer the first look details into the IM Dashboard. A message will appear stating that some required computations are being carried out. After the computations are over, the output for the first look will appear in the IM Dashboard as shown below.

For the first look, at a total of 125 events, the Information Fraction works out to be 0.101. The efficacy boundaries for this information fraction are newly computed. The Repeated 95% Confidence Interval limits and Repeated p-value are computed and displayed. You may also see that the charts at the bottom of the IM Dashboard have been updated, with relevant details appearing on the side.

In a similar way, enter the interim analysis results for the next 4 looks in the IM Dashboard. At the fifth look, the boundary is crossed. A message window appears as shown below.
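The Z-Statistic column in the table of interim results is consistent with the usual Wald ratio of the estimate to its standard error. The short Python sketch below reproduces that column from δ̂ and SE(δ̂). It only illustrates the arithmetic and is not East's own code; the name `wald_z` is hypothetical.

```python
# Interim results from the monitoring table: (date, total deaths, delta_hat, SE).
LOOKS = [
    ("Aug 96", 125, -0.283, 0.179),
    ("Mar 97", 299, -0.195, 0.116),
    ("Aug 97", 423, -0.248, 0.097),
    ("Mar 98", 545, -0.259, 0.086),
    ("Aug 98", 670, -0.290, 0.077),
]


def wald_z(delta_hat, se):
    """Wald test statistic: the estimate divided by its standard error."""
    return delta_hat / se


# One statistic per look, rounded to the three decimals used in the table.
Z_VALUES = [round(wald_z(d, se), 3) for _, _, d, se in LOOKS]

for (date, deaths, _, _), z in zip(LOOKS, Z_VALUES):
    print(f"{date}: {deaths} deaths, Z = {z:.3f}")
```

The printed values can be compared line by line with the Z-Statistic column before the results are typed into the Test Statistic Calculator.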
Click Stop and you will see the details of all the looks in the IM Dashboard as shown below. The final Adjusted Inference output also appears as displayed below. One important point to note here is that this study ended almost 2 years ahead of the planned schedule because of the very favorable interim analysis results. This completes the Interim Monitoring exercise for this trial.

43.3 User Defined R Function

East allows you to customize simulations by inserting user-defined R functions for one or more of the following tasks: generate response, compute test statistic, randomize subjects, generate arrival times, and generate dropout information. The R functionality for arrivals and dropouts will be available only if you have entered such information at the design stage. Although the R functions are also available for all normal and binomial endpoints, we will illustrate this functionality for a time-to-event endpoint. Specifically, we will use an R function to generate Weibull survival responses.

Start East afresh. On the Design tab, click Survival: Two Samples and then Logrank Test Given Accrual Duration and Study Duration. Choose the design parameters as shown below. In particular, select a one-sided test with type-1 error of α = 0.025.

Click Compute and save this design (Des1) to the Library. Right-click Des1 in the Library and click Simulate. In the Simulation Control Info tab, check the box for Suppress All Intermediate Output. Type 10000 for Number of Simulations and select Clock for Random Number Seed.

In the top right-hand corner of the input window, click Include Options, and then click User Defined R Function. Go to the User Defined R Function tab. For now, leave the box Initialize R simulation (optional) unchecked.
This optional task can be used to load required libraries, set seeds for simulations, and initialize global variables. Select the row for Generate Response, click Browse..., and navigate to the folder containing your R file. Select the file and click Open. The path should now be displayed under File Name.

Click View to open your R file in a notepad application. In this example, we are generating survival responses for both control and treatment arms from a Weibull with shape parameter = 1 (i.e. exponential), with the same hazard rate in both arms. This sample file is available in the folder named R Samples under the installation directory of East 6.

Copy the function name (in this case GenWeibull) and paste it into the cell for Function Name. Save and close the R file, and click Simulate.

Return to the tab for User Defined R Function, select the Generate Response row, and click View. In the R function, change the shape parameter to 2 to generate responses from a Weibull distribution with increasing hazards. Save and close the R file, and click Simulate. You may have to save the modified file in some other location.

Select both simulations (Sim1 and Sim2) from the Output Preview and, on the toolbar, click the icon to display them in the Output Summary. Notice that the type-1 error appears to be controlled in both cases. When we simulated from the exponential (Sim1), the average study duration (30.7 months) was close to what was calculated in Des1 for the expected study duration under the null. However, when we simulated from the Weibull with increasing hazards (Sim2), the average study duration increased to 34.6 months.
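As background for this exercise: a Weibull distribution with shape parameter 1 has a constant hazard (the exponential case), a shape above 1 gives an increasing hazard, and a shape below 1 gives a decreasing hazard. The Python sketch below illustrates this with the standard library. It is not the GenWeibull sample file shipped with East and does not follow East's R interface; the function names and the scale of 20 months are assumptions made only for illustration.

```python
import random


def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def gen_weibull_times(n, shape, scale, seed=12345):
    """Draw n Weibull survival times; the seed makes the draw reproducible."""
    rng = random.Random(seed)
    # random.weibullvariate takes the scale first and the shape second.
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


if __name__ == "__main__":
    SCALE = 20.0  # hypothetical scale in months, not taken from the manual
    for shape in (1.0, 2.0):
        hazards = [round(weibull_hazard(t, shape, SCALE), 4) for t in (5, 10, 20)]
        times = gen_weibull_times(10000, shape, SCALE)
        mean_time = sum(times) / len(times)
        print(f"shape={shape}: hazard at t=5,10,20 -> {hazards}; mean time ~ {mean_time:.1f}")
```

With shape 1 the three hazard values are identical, whereas with shape 2 they grow with time, which is the contrast the two simulation runs are meant to explore.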
The ability to use custom R functions for many simulation tasks allows considerable flexibility in performing sensitivity analyses and assessing key operating characteristics.

44 Superiority Trials with Variable Follow-Up

This chapter illustrates through a worked example how to design, monitor and simulate a two-sample superiority trial with a time-to-event endpoint. Each subject who has not dropped out or experienced the event is followed until the trial ends. This implies that a subject who is enrolled earlier could potentially be followed for a longer time than a subject who is enrolled later in the trial. In East we refer to such designs as variable follow-up designs.

44.1 The RALES Clinical Trial: Initial Design

The RALES trial (Pitt et al., 1999) was a double-blind study of the aldosterone-receptor blocker spironolactone at a daily dose of 25 mg in combination with standard doses of an ACE inhibitor (treatment arm) versus standard therapy of an ACE inhibitor (control arm) in patients who had severe heart failure as a result of systolic left ventricular dysfunction. The primary endpoint was death from any cause. Six equally spaced looks at the data using the Lan-DeMets-O'Brien-Fleming spending function were planned. The trial was designed to detect a hazard ratio of 0.83 with 90% power at a two-sided 0.05 level of significance. The hazard rate of the control arm was estimated to be 0.38/year. The trial was expected to enroll 960 patients/year.

We begin by using East to design RALES under these basic assumptions. Open East, click the Design tab and then the Two Samples button in the Survival group. You will see the following screen.

Note that there are two choices available in the above list: Logrank Test Given Accrual Duration and Accrual Rates, and Logrank Test Given Accrual Duration and Study Duration.
The option Logrank Test Given Accrual Duration and Study Duration is explained later in Chapter 48. Now click Logrank Test Given Accrual Duration and Accrual Rates and you will get the following input dialog box.

In the above dialog box, enter 6 for Number of Looks, keep the default choice of Design Type: Superiority, and set Test Type to 2-Sided, Type I Error (α) to 0.05, Power to 0.9, and Allocation Ratio to 1. Further, keep the default choices of # of Hazard Pieces as 1 and Input Method as Hazard Rates. Click the check box against Hazard Ratio and enter the Hazard Ratio as 0.83. Enter Hazard Rate (Control) as 0.38. You will see the Hazard Rate (Treatment:Alt) computed as 0.3154. Also, keep the Variance of Log Hazard Ratio to be used as under Null. Now the Test Parameters tab of the input dialog will appear as shown below.

Now click the Boundary tab. You will see the following input dialog box.

Keep all the default specifications for the boundaries to be used in the design. You can look at the Error Spending Chart by clicking the corresponding icon. Close this chart. If you click the boundary chart icon, you will see the boundary chart as displayed below.

Close this chart. Now click the Accrual/Dropouts tab. Keep the default choice Until End of Study for the input Subjects are followed:. Keep the # of Accrual Periods as 1 and enter 960/year as the accrual rate. For this example, assume no dropouts.
The dialog box will look as shown below.

Under the Accrual section, in the column titled Comtd. (committed), you see two radio buttons, Durations and Subjects, with the latter selected by default. The selected item will appear as the x-axis item in the Study Duration vs. Accrual chart, which you can get by clicking the icon displayed on the side. Against Durations and Subjects you see two rows of three cells each. The first and third cells show the minimum and maximum values for the row item, and the middle cell shows the midpoint between them. From these results, you see that any sample size in the range 1243 to 3111 will suffice to attain the desired 90% power; East selects 2177, the midpoint of the allowable range, as the default sample size. Depending on the needs of the study, you may wish to use a different sample size within the allowable range. The choice of sample size generally depends on how long you wish the study to last. The larger you make the patient accrual, the shorter the total study duration, which consists of accrual time plus follow-up time. To understand the essence of this trade-off, bring up the Study Duration vs. Accrual chart by clicking the icon.

Based on this chart, a sample size of 1660 subjects is selected. Close the chart and enter 1660 for Committed Accrual (subjects). Click Compute and see the results in the new design created under Output Preview. Click the icon to see the design summary. This sample size ensures that the maximum study duration will be slightly more than 4.9 years.
Additionally, under the alternative hypothesis, the expected study duration will be only about 3.3 years.

44.2 Incorporating Drop-Outs

The investigators expect 5% of the patients in both groups to drop out each year. To incorporate this drop-out rate into the design, in the Piecewise Constant Dropout Rates tab, select 1 for the number of pieces and change the Input Method from Hazard Rates to Prob. of Dropout. Then enter 0.05 as the probability of dropout at 1 year for both groups.

To make Des1 and Des2 comparable, change the sample size of Des2 to 1660 by typing this value into the Committed Accrual (Subjects) cell. Click Compute and see the results in the new design created under Output Preview. Select the two designs and click the icon to see them side by side.

A comparison of the two designs reveals that, because of the drop-outs, the maximum study duration will be prolonged from 4.9 years under Des1 to 5.9 years under Des2. The expected study duration will likewise be prolonged from 3.3 years to 3.7 years under the alternative hypothesis, and from 4.5 years to 5.3 years under the null hypothesis.

44.3 Incorporating Non-Constant Accrual Rates

In many clinical trials, the enrollment rate is low in the beginning and reaches its maximum expected level a few months later, when all the sites enrolling patients have been recruited. Suppose that patients are expected to enroll at an average rate of 400/year for the first six months and at an average rate of 960/year thereafter. Click the icon at the bottom of your screen to go back to the input window of Des2.
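Before entering these rates, the effect of the slower start on the enrollment period can be estimated by hand. The Python sketch below computes the time needed to enroll 1660 subjects under the constant rate of 960/year and under the two-period rates just described. It is a back-of-the-envelope illustration, not East's algorithm, and the helper `accrual_duration` is a hypothetical name.

```python
def accrual_duration(n_subjects, periods):
    """Years needed to enroll n_subjects.

    periods is a list of (rate_per_year, length_in_years); use None as the
    length of the last period to mean "until enrollment is complete".
    """
    enrolled = 0.0
    elapsed = 0.0
    for rate, length in periods:
        remaining = n_subjects - enrolled
        if length is None or rate * length >= remaining:
            # Enrollment finishes inside this period.
            return elapsed + remaining / rate
        enrolled += rate * length
        elapsed += length
    raise ValueError("accrual periods end before all subjects are enrolled")


constant = accrual_duration(1660, [(960, None)])
ramp_up = accrual_duration(1660, [(400, 0.5), (960, None)])
print(f"constant 960/year: {constant:.2f} years")
print(f"400/year for 6 months, then 960/year: {ramp_up:.2f} years")
```

Under these assumptions the enrollment period grows from roughly 1.7 years to roughly 2 years, because only 200 subjects are enrolled in the first six months.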
Now, in the Accrual section, specify that there are two accrual periods and enter the accrual rate for each period in the dialog box as shown below.

Once again let the sample size be 1660 to make Des3 comparable to the other two designs. Click Compute to complete the design. Select all three designs in the Output Preview area and click the icon to see them side by side.

Notice that the enrollment period has increased from 1.7 years to 2 years. Likewise, the maximum study duration and the expected study durations under H0 and H1 have also increased relative to Designs 1 and 2. The maximum study duration is now 6.15 years.

44.4 Incorporating Piecewise Constant Hazards

Prior studies had suggested that the survival curves might not follow an exponential distribution. Suppose it is believed that the hazard rate for failure on the control arm decreases after the first 12 months from 0.38 to 0.35. We will assume that the hazard ratio is still 0.83. We can enter the appropriate piecewise hazard rates into East as follows. Click the icon at the bottom of your screen to go back to the input window and go to the Test Parameters tab.

Change the sample size to 1660 on the Accrual/Dropouts tab for comparability with the previous designs. Click Compute and see the results of the design in the Output Summary mode.

We observe that the impact of changing from a constant hazard rate to a piecewise constant hazard rate is substantial. The maximum study duration has increased from 6.15 years for Des3 to 6.56 years for Des4. Before proceeding further, save all four designs in the workbook.
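To see what the piecewise constant hazard assumption of Section 44.4 implies for the survival curves, the Python sketch below evaluates S(t) = exp(-H(t)), where H(t) is the cumulative hazard, for a control hazard of 0.38/year in the first year and 0.35/year afterwards, with the treatment hazards obtained by applying the hazard ratio of 0.83 to each piece. The function name is hypothetical and the calculation is only illustrative; it does not reproduce East's design computations.

```python
import math


def piecewise_survival(t, cut_points, hazards):
    """Survival probability S(t) = exp(-H(t)) for a piecewise constant hazard.

    cut_points are the times at which the hazard changes; hazards has one
    more entry than cut_points.
    """
    cumulative_hazard = 0.0
    start = 0.0
    for end, hazard in zip(list(cut_points) + [math.inf], hazards):
        if t <= start:
            break
        # Add the hazard accumulated over the part of this piece before t.
        cumulative_hazard += hazard * (min(t, end) - start)
        start = end
    return math.exp(-cumulative_hazard)


CONTROL = [0.38, 0.35]  # hazards per year: first 12 months, then afterwards
HAZARD_RATIO = 0.83
TREATMENT = [HAZARD_RATIO * h for h in CONTROL]

for years in (1, 2, 3):
    s_c = piecewise_survival(years, [1.0], CONTROL)
    s_t = piecewise_survival(years, [1.0], TREATMENT)
    print(f"year {years}: control {s_c:.3f}, treatment {s_t:.3f}")
```

Because the control hazard is lower after the first year, events accumulate more slowly than under a constant 0.38/year, which is in line with the longer study duration reported for Des4.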
44.5 Simulating a Trial with Proportional Hazards

It would be useful to verify the operating characteristics of the various designs created in the previous section by simulation. The new survival simulation capabilities in East permit this. Let us use these capabilities to simulate Des4. Save this design in the workbook. Right-click this design node and select the menu item Simulate. You will see the following Survival Simulation worksheet.

44.5.1 Components of the Simulation Worksheet

This simulation worksheet consists of four tabs: Test Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. The Test Parameters tab displays all the parameters of the simulation. If desired, you may modify one or more of these parameter values before carrying out the simulation. The second tab, Response Generation, will appear as shown below.

In this tab, you may modify the values of the response parameters before carrying out the simulation. The third tab, Accrual/Dropouts, displays information relating to accrual and dropouts. As with the other tabs, you may modify one or more values appearing in this tab before the simulation is carried out.

In the Simulation Controls tab, you may specify simulation parameters such as the number of simulations required and the desired simulation seed. Optionally, you may also bring up one more tab, Randomization, by clicking the Include Options button in the top right-hand corner. In the Randomization tab, you may alter the allocation ratio of the design before carrying out the simulation. The other tabs under Include Options are discussed elsewhere in this manual.
Keeping the default parameter values in all the tabs, click Simulate. You can see the progress of the simulation process summarized as shown in the following screen shot. The complete summary of the simulations will be displayed at the end of the simulations. Close this window. The simulation results appear in a row in the Output Preview as shown below.

The output summary can be seen by clicking the icon after selecting the simulation row in the Output Preview.

Now save the simulation results to the workbook by selecting the simulation results row and then clicking the icon. Right-click the newly added workbook node for the simulation and select Details. You will see the complete simulation details appearing in the output pane. The core part is shown below.

44.5.2 Simulating Under H1

Notice that in the above simulations we did not change anything on the Response Generation tab, which means that we executed 10000 simulations under the design assumptions or, in other words, under the alternative hypothesis. Let us examine these 10000 simulations more closely. The actual values may differ from the manual, depending on the starting seed used. The column labeled Events in the second table displays the number of events after which each interim look was taken. The column labeled Avg. Look Time in the first table displays the average calendar time at which each interim look was taken.
Thus, the first interim look (taken after observing 207 events) occurred after an average elapse of about 1.5 years; the second interim look (taken after observing 414 events) occurred after an average elapse of about 2.1 years; and so on. The remaining columns of the simulation output are self-explanatory. The columns labeled Stopping For show that 8966 of the 10000 simulations crossed the lower stopping boundary, thus confirming (up to Monte Carlo accuracy) that this design has 90% power. The detailed output tables also show the events, drop-outs, accruals, and average follow-up times observed at each interim analysis.

44.5.3 Simulating Under H0

To simulate under the null hypothesis we must go back to the input window of Sim1 and then to the Response Generation tab. In this pane, change the hazard rate for the treatment arm to 0.38 for the first piece and to 0.35 for the second piece of the hazard function. This change implies that we will be simulating under the null hypothesis. Click the Simulate button. A new row will be added in Output Preview. Select this row and add it to the library node. By double-clicking this node, you will see the detailed simulation output as shown below.

Out of 10000 simulated trials only 27 crossed the upper stopping boundary and 25 crossed the lower stopping boundary, thus confirming (up to Monte Carlo accuracy) that the type-1 error is preserved for this design.

44.6 Simulating a Trial with Non-Proportional Hazards

A new agent is to be tested against placebo in a large cardiovascular study with the endpoint being time to stroke, MI or death.
The control arm has a 12-month event-free rate of 97%. We wish to design the study to detect a hazard ratio of 0.75 with 90% power, using a two-sided test conducted at the 0.05 level. An important design consideration is that treatment differences are expected to emerge only after one year of therapy. Subjects will enroll at the rate of 1000/month and be followed to the end of the study. The dropout rate is expected to be 10% per year for both treatment arms. Finally, the study should be designed for a maximum study duration of 50 months.

The usual design options in East are not directly applicable to this trial because they require the hazard ratio to be constant under the alternative hypothesis. Here, however, we are required to power the trial to detect a hazard ratio of 0.75 that only emerges after patients have been on the study for 12 months. The simulation capabilities of East can help us with the design.

44.6.1 Single-Look Design with Proportional Hazards

We begin by creating a single-look design powered to detect a hazard ratio of 0.75, ignoring the fact that the two survival curves separate only after 12 months. Open a new survival design worksheet by clicking Design>Survival>Logrank Test Given Accrual Duration and Accrual Rates. In the resulting Test Parameters tab, enter the parameter values as shown below.

Click the Accrual/Dropouts tab and enter the values as shown below, excluding the Accrual tab.

East informs you in the Accrual tab that any sample size in the range 2524 to 22260 will suffice to attain the desired 90% power. However, the study will end sooner if we enroll more patients. Recall that we wish the trial to last no more than 50 months, inclusive of accrual and follow-up. The Accrual-Duration chart can provide guidance on sample size selection.
This chart reveals that if 6400 subjects are enrolled, the expected maximum duration of the trial is close to 50 months.

Now change the Comtd. number of subjects to 6400 and click Compute to complete the design. A new row is added for this design in the Output Preview. Select this row and add it to a library node under a workbook. If you now double-click this node, you will see the detailed output. A section of it is shown below.

We can verify the operating characteristics of Des1 by simulation. With the cursor on the Des1 node, click the Simulation icon on the library menu bar. You will be taken to the survival simulation worksheet. In the Simulation Control tab, specify the number of simulations to be 1000. Now click the Simulate button. This will generate 1000 simulations from the survival curves specified in the design. Each simulation will consist of survival data on 6400 subjects entering the trial uniformly at the rate of 1000/month. Events (failures) will be tracked and the simulated trial will be terminated when the total number of events equals 508. Subjects surviving past this termination time point will have their survival times censored. The resulting survival data will be summarized in terms of the logrank test statistic. Each simulation records two important quantities: the calendar time at which the last of the specified 508 events arrived, and whether or not the logrank test statistic rejected the null hypothesis. We would expect that, on average, the 508 events will occur in about 48.7 months and that about 90% of the simulations will reject the null hypothesis. The simulation summary is shown in the following screen shot.
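As a side check on the design inputs, the termination count of 508 events is consistent with the standard Schoenfeld approximation for a two-sided logrank test with 1:1 allocation, and a 12-month event-free rate of 97% corresponds to a constant hazard of about 0.0025 per month. The Python sketch below shows both calculations. Whether East uses exactly this formula internally is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.90):
    """Events for a two-sided logrank test with 1:1 allocation (Schoenfeld)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


def monthly_hazard(event_free_prob, months):
    """Constant hazard implied by an event-free probability at a given time."""
    return -math.log(event_free_prob) / months


events = schoenfeld_events(0.75)
print(f"events needed: {events:.1f} (rounded up: {math.ceil(events)})")
print(f"control hazard: {monthly_hazard(0.97, 12):.4f} per month")
```

The approximation depends only on the hazard ratio, the significance level and the power; the sample size and accrual pattern determine how long it takes to observe that many events.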
Indeed, we observe that the average study duration for this set of 1000 simulations was 48.691 months, and that 913 of the 1000 simulated trials crossed the critical value and rejected H0, so the power attained is 0.913. This serves as an independent verification of the operating characteristics of Des1, up to Monte Carlo accuracy.

44.6.2 Single-Look Design with Non-Proportional Hazards

Were it not for the fact that the hazard ratio of 0.75 only emerges after 12 months of therapy, Des1 would meet the goals of this study. However, the impact of the late separation of the survival curves must be taken into consideration. This is accomplished, once again, by simulation. Click the Edit Simulation icon while the cursor is on the last simulation node. In the resulting simulation sheet, click the Response Generation tab. In this tab, specify that the hazard rates for the control and treatment arms are identical and equal to 0.0025 for the first 12 months and that the hazard ratio is 0.75 thereafter. This is done by making the appropriate entries in this tab as shown below.

Click the Simulate button. This will generate 1000 simulations from the survival curves specified in the Survival Parameters Pane. As before, each simulation will consist of survival data on 6400 subjects entering the trial uniformly at the rate of 1000/month. Events (failures) will be tracked and the simulated trial will be terminated when the total number of events equals 508. The summary output of this simulation run is shown below.

This time only 522 of the 1000 trials were able to reject H0. The drop in power is of course due to the fact that the two survival curves do not separate until 12 months have elapsed.
Thus events that arise within the first 12 months arrive at the same rate for both arms and are not very informative about treatment differences. We need to increase the power of the study to 90%. This can be accomplished in one of two ways:

1. Prolonging the study duration until a sufficient number of events are obtained to achieve 90% power.
2. Increasing the sample size.

The first approach cannot be used because the study duration is not permitted to exceed 50 months. The simulations have shown that the study duration is already almost 50 months, and the design still falls well short of 90% power. Thus we must resort to increasing the sample size.

Now if we increase the sample size while keeping the total number of events fixed at 508, the average study duration will drop. The power, however, may not increase. In fact it might even decrease, since a larger fraction of the 508 events will arise in the first 12 months, before the two survival curves have separated. To see this, increase the sample size from 6400 to 10000 in the Accrual/Dropouts tab. Then click the Simulate button. From this simulation run, you will get the output summary as shown below.

Notice that the average study duration has dropped to 29.7 months. But the power has dropped also. This time only 261 of the 1000 simulations could reject the null hypothesis. To increase power we must increase the sample size while keeping the study duration fixed at about 50 months. This is accomplished by selecting the Look Time option from the drop-down box in the Fix at Each Look section of the Survival Parameters Pane and choosing a 50 month Total Study Durn., while keeping the increased sample size of 10000.
We will now run 1000 simulations, in each of which 10000 subjects are enrolled at the rate of 1000/month. Each simulated trial will be terminated at the end of 50 months of calendar time and a logrank test statistic will be derived from the data. Click the Simulate button. Add the simulation run output to a library node and see the following output summary.

For more details, you can click the icon after selecting the saved simulation node.

You can now see that the power of the study has increased to 73.5%. On average, 811 events occurred during the 50 months that the study remained open. Since we require 90% power, the sample size must be increased even further. This can be done by trial and error over several simulation experiments. Eventually we discover that a sample size of 18000 patients will provide about 90% power with an average of 1358 events.

It is evident from these simulations that the proportional hazards assumption is simply not appropriate if the survival curves separate late. In the present example the proportional hazards assumption would have led to a sample size of 6400, whereas the sample size actually needed was 18000.

44.6.3 Group Sequential Design with Non-Proportional Hazards

The single-look design discussed in the previous section required a sample size of 18000 subjects. A group sequential design, monitored by an independent data monitoring committee, is usually more efficient for large studies of this type.
Such a trial can be designed with efficacy stopping boundaries or with efficacy and futility stopping boundaries. Consider first a design with five equally spaced efficacy boundaries. Go back to the library, click the Des1 node, and then click the icon to edit the design. In the resulting design input dialog window, change the entry in the Number of Looks cell from 1 to 5. Click the Compute button and save the plan as Des2 in the library. Select the Des1 and Des2 nodes and then click the icon to see the following details for both designs.

Des2 reveals that a group sequential design with five equally spaced looks, taken after observing 104, 208, 312, 416 and 520 events, respectively, and utilizing the default Lan-DeMets-O'Brien-Fleming (LD(OF)) spending function, achieves 90% power with a maximum sample size of 12555 and a maximum study duration of 27.232 months. The expected study duration under H1 is 21.451 months. However, these operating characteristics are based on the assumption that the hazard ratio is constant and equals 0.75. Since in fact the hazard ratio is 0.75 only after 12 months of treatment, the actual power of this design is unlikely to be 90%. We can use simulation to determine the actual power. With the cursor in any cell of the Des2 node, select the simulation icon from the menu bar. You will be taken to the simulation worksheet. In the Response Generation tab, make the changes in the hazard rates as shown below.

After changing the number of simulations to 1000 in the Simulation Control tab, click the Simulate button to run 1000 simulations of Des2 with data being generated from the survival distributions that were specified in the Response Generation tab.
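To make the response-generation assumption concrete, the Python sketch below draws survival times from a two-piece exponential distribution in which both arms share a hazard of 0.0025 per month for the first 12 months and the treatment hazard is multiplied by 0.75 thereafter. It uses inverse-transform sampling on the cumulative hazard. This is an illustrative sketch only: the function names and seeds are hypothetical, and it is not the algorithm East uses.

```python
import random


def time_from_cumulative_hazard(h, base_hazard, hazard_ratio, delay):
    """Invert the cumulative hazard of a two-piece exponential.

    The hazard is base_hazard up to `delay` and base_hazard * hazard_ratio
    afterwards; h is the cumulative hazard reached at the event time.
    """
    h_at_delay = base_hazard * delay
    if h <= h_at_delay:
        return h / base_hazard
    return delay + (h - h_at_delay) / (base_hazard * hazard_ratio)


def simulate_arm(n, base_hazard, hazard_ratio, delay, seed):
    """Inverse-transform sampling: a unit exponential draw is a cumulative hazard."""
    rng = random.Random(seed)
    return [
        time_from_cumulative_hazard(rng.expovariate(1.0), base_hazard, hazard_ratio, delay)
        for _ in range(n)
    ]


control = simulate_arm(5000, 0.0025, 1.0, 12.0, seed=1)
treatment = simulate_arm(5000, 0.0025, 0.75, 12.0, seed=2)

for name, times in (("control", control), ("treatment", treatment)):
    early = sum(t <= 12.0 for t in times) / len(times)
    print(f"{name}: fraction of events within 12 months = {early:.3f}")
```

In both arms the fraction of events in the first 12 months should be close to 3%, which shows why early events carry little information about the treatment difference.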
The results of this simulation run are as shown below. Only 187 of the 1000 simulated trials were able to reject the null hypothesis, indicating that the study is grossly underpowered. We can improve on this performance by extending the total study duration so that additional events may be observed. To increase the study duration, go to the Simulation Parameters tab and select the Look Time option under Fix at Each Look. We had specified at the outset that the total study duration should not exceed 50 months. Let us therefore fix the total study duration at 50 months and space each interim look 10 months apart by editing the Study Duration.

We are now ready to simulate a 5-look group sequential trial in which the LD(OF) stopping boundaries are applied and the looks are spaced 10 months apart. Each simulated trial will enroll 12555 subjects at the rate of 1000/month. The simulation data will be generated from survival distributions in which the hazard rates of both arms are 0.0025 for the first 12 months and the hazard ratio is 0.75 thereafter. To generate 1000 simulations of this design click on the Simulate button. These simulations do indeed show a substantial increase in power, from 18.7% previously to 79.9%.

The design specifications stated, however, that the trial should have 90% power. In order to achieve this amount of power we will have to increase the sample size. By trial and error, upon increasing the sample size to 18200 on the Simulation Parameters tab, we observe that the power has increased to 90% (up to Monte Carlo accuracy).
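The single-look non-proportional hazards simulation described in this section can be sketched outside East as follows. This is a minimal illustration, not East's simulation engine: it assumes uniform arrival times over the accrual period, 1:1 allocation by alternation, a two-sided test with critical value 1.96, and administrative censoring at a fixed calendar time. All function and parameter names are hypothetical.

```python
import math
import random

def delayed_effect_time(rng, base_hazard, hazard_ratio, delay):
    """Survival time with hazard base_hazard up to `delay` and
    base_hazard * hazard_ratio afterwards (inverse cumulative hazard)."""
    e = rng.expovariate(1.0)        # unit-exponential cumulative hazard
    early = base_hazard * delay     # cumulative hazard reached at `delay`
    if e <= early:
        return e / base_hazard
    return delay + (e - early) / (base_hazard * hazard_ratio)

def logrank_z(records):
    """Logrank Z for (time, event, group) records; group 1 = treatment.
    Negative values favour treatment. Tied times are processed one by one."""
    records = sorted(records)
    n_total = len(records)
    n_trt = sum(g for _, _, g in records)
    o_minus_e = var = 0.0
    for _, event, g in records:
        if event:
            p = n_trt / n_total     # share of the risk set on treatment
            o_minus_e += g - p
            var += p * (1.0 - p)
        n_total -= 1
        n_trt -= g
    return o_minus_e / math.sqrt(var) if var > 0 else 0.0

def simulate_power(n_sims, n_subjects, accrual_rate=1000.0, study_end=50.0,
                   base_hazard=0.0025, hazard_ratio=0.75, delay=12.0,
                   z_crit=1.96, seed=1):
    """Return (estimated power, mean number of events) over n_sims trials.
    Assumes the accrual period n_subjects / accrual_rate ends before study_end."""
    rng = random.Random(seed)
    accrual_duration = n_subjects / accrual_rate
    rejections = total_events = 0
    for _ in range(n_sims):
        records = []
        for i in range(n_subjects):
            group = i % 2
            arrival = rng.uniform(0.0, accrual_duration)
            hr = hazard_ratio if group else 1.0
            t = delayed_effect_time(rng, base_hazard, hr, delay)
            follow_up = study_end - arrival   # censored when the study closes
            records.append((min(t, follow_up), int(t <= follow_up), group))
        total_events += sum(ev for _, ev, _ in records)
        rejections += int(abs(logrank_z(records)) >= z_crit)
    return rejections / n_sims, total_events / n_sims
```

A call such as `simulate_power(n_sims=200, n_subjects=10000)` gives a Monte Carlo estimate of the power and the average event count for the 10000-subject, 50-month scenario. Pure Python is slow at this scale, so the number of simulations should be kept small when experimenting.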
44.7 Simulating a Trial with Stratification variables

The data presented in Appendix I of Kalbfleisch and Prentice (1980) on lung cancer patients were used as a basis for this example. We will design a trial to compare two treatments (Standard and Test) in a target patient group where patients had some prior therapy. The response variable is the survival time in days of lung cancer patients. First, we will create a design for 3 looks, to compare the two treatment groups. Next, using this design, we will carry out simulation with stratification variables. Three covariates in the data are used here as stratum variables: a) type of cancer cell (small, adeno, large, squamous), b) age in years (<= 50, > 50), and c) performance status score (<= 50, > 50 and <= 70, > 70).

The input data for the base design are as follows: Trial type: superiority; test type: 2-sided; type I error: 0.05; power: 0.90; allocation ratio: 1; hazard rate (control): 0.009211; hazard rate (treatment): 0.004114; number of looks: 3; boundary family: spending functions; spending function: Lan-DeMets (OF); subjects are followed: until end of study; subjects accrual rate: 12 per day.

The input data for stratified simulation are as given below. The number of stratum variables = 3 (cell type; age group; performance status score).

Table 44.1: Input data for stratified simulation

Stratum variable                  Level             Proportion   Hazard ratio
Cell type                         small             0.28         Baseline
                                  adeno             0.13         2.127
                                  large             0.25         0.528
                                  squamous          0.34         0.413
Age group                         ≤ 50 years        0.28         Baseline
                                  > 50 years        0.72         0.438
Performance status score group    ≤ 50              0.43         Baseline
                                  > 50 and ≤ 70     0.37         0.164
                                  > 70              0.20         0.159

44.7.1 Creating the design

First we will create a design using the input data. Open East, click the Design tab and then the Two Samples button in the Survival group.
Now click Logrank Test: Given Accrual Duration and Accrual Rates. In the resulting screen, enter the input data in the dialog boxes under the different tabs. Finally click on the Compute button. The dialog boxes under the different tabs will now appear as shown below. The Test Parameters tab is shown below, where you can see the computed value of No. of Events. The Boundary tab will appear as shown below, where all the input data are seen. The Accrual/Dropouts tab containing the input data will be as shown below. After the design is completed and saved in a workbook, select the design node and click on the output summary icon to see the following output display.

44.7.2 Running Stratified Simulation

After selecting the design node, click on the Simulate icon. You will see the simulation screen with the dialog boxes under different tabs. Click on Include Options and select Stratification. The dialog box under Test Parameters will be as shown below. Keep the default test statistic LogRank and the default choice of Use Stratified Statistic. After entering the stratification input information, the dialog box under Stratification will appear as shown below. After entering the response-related input information, the dialog box under Response Generation will display details as shown in the following screen shots. The Accrual/Dropout dialog box will appear as shown below. In the Simulation Control tab, specify the number of simulations as 1000 and select the choices under output options to save simulation data. The dialog box will appear as shown below. After clicking on the Simulate button, the results will appear in the Output Preview row. Click on it and save it in the workbook. Select this simulation node and click on the Output Summary icon to see the following stratification simulation output summary. The stratified simulation results show that the attained power 0.856 is slightly less than the design specified power of 0.90.

45 Superiority Trials with Fixed Follow-Up

This chapter will illustrate through a worked example how to design, monitor and simulate a two-sample superiority trial with a time-to-event endpoint in which each subject who has not dropped out or experienced the event is followed for a fixed duration only. This implies that each subject who does not drop out or experience the event within a given time interval, as measured from the time of randomization, will be administratively censored at the end of that interval. In East we refer to such designs as fixed follow-up designs.
45.1 Clinical Trial of Drug Eluting Stents

Drug-eluting coronary-artery stents were shown to decrease the risks of death from cardiac causes, myocardial infarction and target-vessel revascularization as compared to uncoated stents in patients undergoing primary percutaneous coronary intervention (PCI) in two randomized clinical trials published in the September 14, 2006 issue of the New England Journal of Medicine. In the Paclitaxel-Eluting Stent versus Conventional Stent in Myocardial Infarction with ST-Segment Elevation (PASSION) trial, Laarman et al. (2006) randomly assigned 619 patients to receive either a paclitaxel-eluting stent or an uncoated stent. The primary endpoint was the percentage of cardiac deaths, recurrent myocardial infarctions or target-lesion revascularizations at 12 months. A marginally lower 12-month failure rate was observed in the paclitaxel-stent group compared with the uncoated-stent group (8.8% versus 12.8%, p = 0.09). The Trial to Assess the Use of the Cypher Stent in Acute Myocardial Infarction Treated with Balloon Angioplasty (TYPHOON) (Spaulding et al., 2006) showed even more promising results. In this trial of 712 patients the sirolimus-eluting stents had a significantly lower target-vessel failure rate at 12 months than the uncoated stents (7.3% versus 14.3%, p = 0.004).

Based on these results an editorial by Van de Werf (2006) appeared in the same issue of the New England Journal of Medicine as the TYPHOON and PASSION trials, recommending that studies with a larger sample size and a hard clinical endpoint be conducted so that drug-eluting stents might be routinely implanted in patients undergoing PCI. In this chapter we will use East to design and monitor a possible successor to the PASSION trial using a time-to-event endpoint with one year of fixed follow-up for each subject.
45.2 Single-Look Design

The primary endpoint for the trial is the time to target-vessel failure, with a failure being defined as target-vessel related death, recurrent myocardial infarction, or target-vessel revascularization. Each subject will be followed for 12 months. Based on the PASSION data we expect that 87.2% of subjects randomized to the uncoated stents will be event-free at 12 months. We will design the trial for 90% power to detect an increase to 91.2% in the paclitaxel-stents group, using a two-sided level-0.05 test. Enrollment is expected to be at the rate of 30 subjects per month.

45.2.1 Initial Design

We begin by opening a new East Workbook and selecting Logrank Test Given Accrual Duration and Accrual Rates. This will open the input window for the design as shown below. Select 2-Sided for Test Type, and enter 0.05 for Type I error. The right hand side panel of this input window is to be used for entering the relevant time-to-event information. The default values in the above dialog box must be changed to reflect the time-to-event parameters specified for the design. Select % Cumulative Survival for the Input Method and enter the relevant 12-month event-free percentages. Change the Input Method to Hazard Rates. You will see the information you entered converted as shown below. Note that you may need to change the decimal display options for hazard rates, using the corresponding icon, to see these numbers with more decimal places.

Another parameter to be decided is the Variance, which specifies whether the calculation of the required number of events is to be based on the variance estimate of the log hazard ratio under the null hypothesis or the alternative hypothesis. The default choice in East is Null.
Most textbooks recommend this choice as well (see, for example, Collett, 1994, equation (2.21) specialized to no ties). It will usually not be necessary to change this default. For a technical discussion of this issue refer to Appendix B, Section B.5.3.

The second tab, labeled Accrual/Dropouts, is used to enter the patient accrual rate and, for fixed follow-up designs, the duration of patient follow-up and the dropout information. In this example the clinical endpoint is progression-free survival for 12 months. Patients who are still on study at month 12 and who have not experienced the endpoint will be treated as censored. Therefore, in the first of the two panels, we select the entry from the dropdown that indicates that subjects are followed For Fixed Period and enter the number 12 in the corresponding edit box. Suppose that the anticipated rate of enrollment is 30 patients per month. This number is also entered into the dialog box as shown below. Let the committed accrual of subjects be 2474. The second panel, labeled Piecewise Constant Dropout Rates, is used to enter the rate at which we expect patients to drop out of the study. For the present we will assume that there are no drop-outs.

An initial design, titled Des1, is created in the Output Preview pane upon clicking the Compute button. Click on the save icon to save the design in a workbook, or on the output summary icon to see the output summary of this design. East reveals that 268 events are required in order to obtain 90% power. If each patient can only be followed for a maximum of 12 months, we must commit to enrolling a total of 2474 patients over a period of 82.5 months. With this commitment we expect to see the required 268 events within 12 months of the last patient being enrolled.
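The conversion from 12-month event-free percentages to hazard rates, and the 268-event requirement, can be checked by hand. The sketch below assumes exponential survival in each arm, 1:1 allocation, and the standard approximation in which the variance of the log hazard ratio under the null hypothesis is 4 divided by the number of events. The function names are hypothetical, and the formula is a textbook approximation rather than a statement of East's internal algorithm.

```python
import math
from statistics import NormalDist

def hazard_from_survival(surv_prob, time):
    """Constant hazard rate implied by an exponential survival probability at `time`."""
    return -math.log(surv_prob) / time

def required_events(hazard_ratio, alpha=0.05, power=0.90):
    """Events for a two-sided level-alpha logrank test with 1:1 allocation,
    using variance 4 / events for the log hazard ratio (null variance)."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = norm.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

lambda_control = hazard_from_survival(0.872, 12)    # uncoated stents
lambda_treatment = hazard_from_survival(0.912, 12)  # paclitaxel stents
hazard_ratio = lambda_treatment / lambda_control
events = math.ceil(required_events(hazard_ratio))

print(f"control hazard   = {lambda_control:.4f} per month")
print(f"treatment hazard = {lambda_treatment:.4f} per month")
print(f"hazard ratio     = {hazard_ratio:.3f}")
print(f"required events  = {events}")
```

Under these assumptions the hazard rates come out at about 0.0114 and 0.0077 per month, the hazard ratio at about 0.673, and the event count rounds up to 268, in line with the Des1 output.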
The total study duration is therefore expected to be 82.5 + 12 = 94.5 months. To see how the events are expected to arrive over time, invoke a plot of Sample Size/Events vs. Time by clicking the Plots icon on the toolbar. Uncheck the Sample Size box to see the events graphs on a larger scale, as shown below.

45.3 Shortening the Study Duration

Under Des1 the trial will last for 94.5 months, with 82.5 months of patient enrollment (i.e., a sample size of 2474 subjects). This is not considered satisfactory by the trial sponsor. There are three possible ways in which the study duration might be shortened: by increasing the sample size, by increasing the duration of patient follow-up, or by increasing the rate of patient enrollment.

45.3.1 Increasing the Sample Size

Unlike trials with variable patient follow-up, in a fixed follow-up design the gain from increasing the sample size is limited. This is evident from the relatively narrow range between the minimum accrual duration (82.5 months) and the suggested maximum accrual duration (88.3 months). Notice that if we were to increase the enrollment duration to, say, 88.3 months, the total study duration would only decrease by 5.9 months, from 94.5 months to 88.6 months. To see this, edit Des1 to create Des2 and enter the number 88.267 into the cell for Committed Accrual (Duration) as shown below. Des2 is created in the Output Preview pane upon clicking the Compute button. Click on the save icon to save the design in a workbook.
Select Des1 and Des2 in the workbook and click on the comparison icon to see the side-by-side comparison of the two designs.

The calculation of the minimum and maximum of the range of accrual durations is discussed on page 2308 of Appendix B, section B.5.2. East has determined that if the enrollment (at the pre-specified rate of 30 patients per month) is stopped before 82.467 months have elapsed, and every patient still on study is followed for precisely 12 months, we will obtain fewer than 268 events on average, and the trial will be underpowered. Therefore East specifies that the minimum duration of enrollment must be 82.467 months. The user has the option to increase the enrollment duration beyond 82.467 months. In that case, however, if all patients still on study are followed for 12 months, more than 268 events will accumulate, on average, by the time the trial is terminated. Therefore it will not be necessary to follow the later enrollees for the entire 12 month period. In the extreme case, if we extend the enrollment duration to 88.267 months, the required 268 events will arrive, on average, by the end of the enrollment period itself, thus making it unnecessary to have any follow-up after the last patient has been enrolled. The study duration cannot be shortened any further by increasing the enrollment beyond 88.267 months (i.e., extending the sample size beyond 2648). For these reasons, for fixed follow-up designs, East selects the minimum of the range of enrollment durations (in this case 82.467) as the default enrollment duration. We note that in contrast, for variable follow-up designs, East selects the mid-point of the range of suggested enrollment durations as the default.
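The range of accrual durations discussed above can be approximated with a short calculation. The sketch below is an illustration only: it assumes uniform enrollment at a constant rate, exponential survival, no drop-outs, 1:1 allocation, and an unrounded target of about 267.09 events (an assumed value from the usual null-variance approximation; East reports 268 after rounding up). The function names are hypothetical and the exact algorithm East uses is described in its Appendix B, not here.

```python
import math

def mean_event_prob(hazards, duration):
    """Average over arms of P(event within `duration`) under exponential survival."""
    return sum(1 - math.exp(-h * duration) for h in hazards) / len(hazards)

def mean_integrated_event_prob(hazards, duration):
    """Average over arms of the integral of P(event by s) for s in [0, duration]."""
    return sum(duration - (1 - math.exp(-h * duration)) / h for h in hazards) / len(hazards)

def accrual_duration_range(events, rate, hazards, follow_up):
    """Approximate (minimum, maximum) accrual duration for a fixed follow-up design."""
    p_full = mean_event_prob(hazards, follow_up)
    # Minimum: every subject completes the full follow-up after enrollment stops.
    min_duration = events / (rate * p_full)
    # Maximum: the events arrive just as enrollment stops. Subjects enrolled more than
    # `follow_up` months earlier contribute p_full; recent enrollees are partly followed.
    partial = mean_integrated_event_prob(hazards, follow_up)
    max_duration = follow_up + (events / rate - partial) / p_full
    return min_duration, max_duration

hazards = [-math.log(0.872) / 12, -math.log(0.912) / 12]  # control, treatment
target_events = 267.09  # assumed unrounded event requirement
min_duration, max_duration = accrual_duration_range(target_events, 30.0, hazards, 12.0)
print(f"minimum accrual duration ~ {min_duration:.1f} months")
print(f"maximum accrual duration ~ {max_duration:.1f} months")
```

Under these assumptions the two durations fall close to the 82.467 and 88.267 months reported by East. Small differences are to be expected because East's rounding and event-arrival calculations are not reproduced exactly.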
The user is, of course, free to change the default enrollment duration for both types of designs.

45.3.2 Increasing the Length of Patient Follow-Up

Since this is a fixed follow-up design we might consider increasing the length of patient follow-up, at present equal to 12 months. Edit Des1 by clicking the edit icon to create Des3. Increase the length of patient follow-up from 12 months to 18 months, and commit to the minimum sample size of 1698. Click on Compute to get Des3 as shown below. By increasing the duration of patient follow-up to 18 months, the required 268 events can be obtained with fewer patients. It is now only necessary to have an enrollment duration of 56.6 months. The study is expected to terminate 18 months after the last patient has enrolled, for a total study duration of 74.6 months. Increasing the length of patient follow-up has indeed shortened the total study duration. We note, however, that it might not always be feasible to increase patient follow-up in this manner, particularly if the clinical endpoint of interest determines how long one should wait for the endpoint to occur.

45.3.3 Increasing the Rate of Enrollment

In cases where the primary endpoint determines the duration of the fixed follow-up, the option to shorten the study duration by increasing the follow-up duration is not available. In that case the only possibility is to increase the rate of enrollment by opening up more sites. Edit Des1 to create Des4 and increase the rate of enrollment from 30 patients/month to 45 patients/month, while committing to accruing the minimum number of subjects, 2474.
With this enrollment rate East calculates that an enrollment duration of 55 months (sample size of 2474) and 12 additional months of follow-up will produce the desired 268 events on average. Thus the total study duration is expected to be 67 months. Now try an enrollment rate of 51.5 patients/month, and remember to maintain the committed accrual of 2474. At this enrollment rate the study is fully powered with a sample size of 2474 subjects, enrolled over a period of 48 months. The required 268 events will arrive on average 12 months after the last patient has enrolled, so that the trial is expected to terminate at month 60. Click on Compute to see the design as shown below.

45.4 Group Sequential Design

Edit Des5 and change the number of looks from 1 to 5, equally spaced, with the default LD(OF) spending function. This will create Des6. Click on the Boundary tab to choose the boundary family and alpha spending function shown below. Change the committed accrual to the minimum: 2531. Click on the Compute button to get the design shown below. We note that the 5-look design requires an up-front commitment of 274 events compared to the 268 events for the single-look design.
At an enrollment rate of 51.5 subjects/month we need to enroll 2531 subjects over 49.1 months. The maximum study duration is expected to be 61.1 months, only 1.1 months longer than the single-look design. However, because of the possibility of early stopping, the expected study duration under the alternative hypothesis that the log hazard ratio is ln(0.673) = −0.397 is only 43.9 months, a savings of more than 16 months.

45.4.1 Incorporating Drop-Outs

The sample size will have to be increased appropriately if we expect drop-outs. Suppose we expect a drop-out rate of 0.05 by 12 months for each treatment arm. Edit Des6 and enter the drop-out rates in the appropriate design dialog box as shown below. Change the committed accrual to the minimum: 2595. Click on Compute. Now you will get Des7 as shown below. The 5% drop-out rate has resulted in a sample size increase from 2531 subjects to 2595 subjects. However, the impact on maximum study duration and expected study duration is small. Under the alternative hypothesis the study is expected to last for 44.7 months in Des7 as compared to 43.9 months in Des6.

45.4.2 Incorporating Non-Constant Accrual Rates

Des7 was designed with the assumption that patients would be enrolled at the rate of 51.5/month. Suppose that this enrollment rate cannot be achieved from the outset. Instead, assume that for the first 12 months patients are enrolled at an average rate of 25/month and thereafter the average enrollment rate is 51.5/month. To see the impact of this change on the study design, edit Des7, enter the two enrollment rates into the appropriate dialog box as shown below, and click on the Compute button to create Des8. The total sample size has not changed between Des7 and Des8. However, the total duration of the enrollment phase has increased by about six months. Moreover, because of the slower enrollment rate for the first 12 months, the maximum total study duration has increased from 62.4 months to 68.6 months and the expected study duration under the alternative hypothesis has increased from 44.7 months to 50.9 months.

45.4.3 Incorporating Piece-Wise Exponential Survival

Suppose that the mechanism of action of the stents is such that the hazard rate for failure decreases after the first six months. We will assume that the average hazard rate for the uncoated stents arm is 0.0114 for the first six months and decreases thereafter to an average rate of 0.0075. We will continue to assume that the hazard ratio is unchanged, at 0.673. Therefore the hazard rate for the coated stents arm decreases from 0.673 ∗ 0.0114 = 0.0077 to 0.673 ∗ 0.0075 = 0.005. Edit Des8 as shown below to create Des9. Change the committed accrual to the minimum, 3095, and click Compute. Now you will get the edited Des9 as shown below.

Since the hazard ratio is unchanged, Des8 and Des9 require the same number of events, 274, in order to achieve the desired 90% power. Observe, however, that in order for these 274 events to arrive on average 12 months after the last patient has enrolled, Des9 requires a sample size of 3095 subjects, 500 patients more than were required under Des8. The study duration is likewise prolonged. This is because the hazard rate decreases after the first six months on study.
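The effect of drop-outs and of a piecewise-constant hazard on the minimum sample size can be checked with a competing-risks calculation. The sketch below assumes 1:1 allocation, a constant drop-out hazard of -ln(0.95)/12 per month in both arms, full 12-month follow-up of every subject, and a fixed target for the expected number of events that is back-calculated from Des6 as 2531 times the average 12-month event probability without drop-outs. The function names and this back-calculation are assumptions for illustration, not East's algorithm.

```python
import math

def observed_event_prob(hazards, breaks, dropout_hazard, follow_up):
    """P(event is observed within follow_up) for a piecewise-constant event hazard
    competing with a constant dropout hazard.
    hazards[k] applies from breaks[k] until the next break; breaks[0] must be 0."""
    prob = 0.0
    surviving = 1.0  # P(neither event nor dropout before the current piece)
    edges = list(breaks) + [follow_up]
    for k, h in enumerate(hazards):
        width = min(edges[k + 1], follow_up) - edges[k]
        if width <= 0:
            break
        total = h + dropout_hazard
        leave = 1 - math.exp(-total * width)  # event or dropout during this piece
        prob += surviving * leave * h / total
        surviving *= 1 - leave
    return prob

def sample_size(events, control, treatment, breaks, dropout_hazard, follow_up=12.0):
    """Subjects needed so that `events` are expected with full follow-up of everyone."""
    p = 0.5 * (observed_event_prob(control, breaks, dropout_hazard, follow_up)
               + observed_event_prob(treatment, breaks, dropout_hazard, follow_up))
    return events / p

# Assumed target: expected events implied by Des6 (2531 subjects, no drop-outs).
target_events = 2531 * 0.5 * (0.128 + 0.088)
dropout = -math.log(0.95) / 12
lam_c = -math.log(0.872) / 12
lam_t = -math.log(0.912) / 12

n_constant = sample_size(target_events, [lam_c], [lam_t], [0.0], dropout)
n_piecewise = sample_size(target_events, [0.0114, 0.0075],
                          [0.673 * 0.0114, 0.673 * 0.0075], [0.0, 6.0], dropout)
print(f"constant hazards with drop-outs : about {n_constant:.0f} subjects")
print(f"hazard drops after month 6      : about {n_piecewise:.0f} subjects")
```

Under these assumptions the two results land close to the 2595 and 3095 subjects reported for Des7 and Des9. The accrual pattern does not enter the calculation, which is consistent with Des7 and Des8 sharing the same sample size.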
If, for example, the change in hazard rate were to occur after 12 months instead of after 6 months, the change would have no impact on sample size or study duration. To verify this, make the corresponding change in Des9. Change the committed accrual to the minimum, 2595, and click on Compute. You will see the Des10 details as shown below. Notice that Des8 and Des10 are now identical.

45.5 Verification by Simulation

Click on the Des10 node in the Library and then click on the simulation icon on the toolbar. A simulation input window comprising four tabs (Test Parameters, Response Generation, Accrual/Dropouts and Simulation Control) is now invoked. We will run simulations under different assumptions about the manner in which the data are generated.

45.5.1 Simulation Under the Alternative Hypothesis

We first run the simulations without making any changes to the default settings of the simulation input tabs. To see the default inputs for the simulations, click the tabs mentioned above. Change the number of simulations to 1000 on the last tab, Simulation Control, and click the Simulate button to run the simulations. An entry for the simulation output gets added in the Output Preview pane. Save it in the workbook. Since we simulated Des10, a node named Sim1 will be associated with Des10. Double click on this node to see the detailed simulation output.

Let us examine the results. Notice first that we have selected the option to fix the number of events for each look at their pre-planned values in the Look Information section. This can be seen in the Simulation Parameters tab.
Upon examining the simulation results in detail, however, we observe that the actual number of events at the final look is slightly lower than the pre-planned number of 274. This is observed consistently. If you edit this simulation node and simulate this scenario again and again with different starting seeds, you will notice that the actual numbers of events at which the first four looks are taken match the corresponding pre-planned values, whereas there appears to be a systematic bias towards taking the fifth and final look with slightly fewer events than was pre-planned. As a result the trial is slightly underpowered. In practice, the slight loss of power due to the systematic decrease in the number of events at the final look relative to the pre-planned number is of very little consequence. It is instructive, however, to understand why it arises at all.

The reason for the small amount of systematic bias is that the maximum follow-up time for each patient is 12 months. Thus, no further follow-up is possible once 12 months have elapsed after the last subject has enrolled. Since the duration of the enrollment period has been fixed at 56.5 months, the trial must be terminated at the latest at 56.5 + 12 = 68.5 months. Observe that in the Simulation Parameters tab, in the row titled Fix at Each Look, the choice Total No. of Events has been selected from the drop down list. This means that East has been instructed to perform simulations in which each look is taken after a fixed number of events has been observed, as pre-specified in the design. Specifically, the looks should be taken after 55, 110, 164, 219 and 274 events have been observed. The trial should be terminated early if a boundary is crossed at one of the first four looks; otherwise it should continue until all 274 events have been obtained.
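The direction of this bias can be illustrated with a small Monte Carlo experiment. The sketch below is deliberately simplified and is not East's simulation: it assumes that a trial reaching the final look has all 2595 subjects followed for their full 12 months by month 68.5, that the final look is taken at whichever comes first (the 274th event or month 68.5), and it ignores the interim looks altogether. Hazards and the drop-out rate are taken from the design inputs; the function name is hypothetical.

```python
import math
import random

def final_look_events(n_sims=1000, n_subjects=2595, planned_events=274,
                      follow_up=12.0, seed=7):
    """Return (mean events available by the end of follow-up,
               mean events at a final look capped at planned_events)."""
    lam = {0: -math.log(0.872) / 12, 1: -math.log(0.912) / 12}  # control, treatment
    eta = -math.log(0.95) / 12                                   # dropout hazard
    # P(event observed within follow_up) with dropout as a competing risk
    p = {g: lam[g] / (lam[g] + eta) * (1 - math.exp(-(lam[g] + eta) * follow_up))
         for g in lam}
    rng = random.Random(seed)
    uncapped = capped = 0
    for _ in range(n_sims):
        events = sum(rng.random() < p[i % 2] for i in range(n_subjects))
        uncapped += events
        capped += min(events, planned_events)  # look cannot use more than 274 events
    return uncapped / n_sims, capped / n_sims

mean_available, mean_final = final_look_events()
print(f"mean events available by month 68.5 : {mean_available:.1f}")
print(f"mean events at the capped final look: {mean_final:.1f}")
```

Because trials with more than 274 available events are cut off at 274 while trials with fewer are not topped up, the average at the final look necessarily falls below the planned value. The size of the shortfall in this simplified setting need not match what East reports.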
We thus have two conflicting restrictions for the maximum study duration. The fixed follow-up design implies that the trial cannot proceed beyond month 68.5 whereas the Planned # of Events restriction implies that the trial cannot proceed beyond 274 events. East resolves the conflict by fixing the maximum study duration at the earlier of 68.5 months or the time at which 274 events have been observed. Thereby the number of events at the fifth and final look becomes a random variable with an upper bound of 274 and an expected value that is slightly less than 274.

To get around this bias one should specify in the Look Information section that we will Fix at Each Look the Look Time rather than the Total No. of Events. With this specification the looks will occur at fixed calendar times of 22.1, 32.3, 42.2, 52.4, and 68.5 months regardless of the number of events that have been obtained at these looks. Although the actual numbers of events obtained at each of these five looks are now random variables, the average numbers of events obtained in repeated simulations will be 55, 110, 164, 219 and 274, respectively, under the alternative hypothesis. The bias due to fixing the maximum number of events at 274 will no longer occur and the study will be fully powered. To see this, run the simulations 10,000 times after fixing the Look Time rather than the # of Events. The appropriate number of events is obtained at each look on average and the study is fully powered, up to Monte Carlo accuracy.

45.5.2 Simulation Under the Null Hypothesis

It is important to verify by simulation that the type-1 error is preserved. Accordingly, edit the node Sim2 and switch to the Response Generation tab. We may now make changes to the design by editing the entries in the cells that are white in color.
To simulate under the null hypothesis we must set the hazard rates of the Control and Treatment groups to be the same, as shown below. Then click on the Simulate button to generate 10000 simulated trials under the null hypothesis. The type-1 error has been preserved, with an overall two-sided false positive rate less than 5%. The above simulations were run with fixed look times rather than with fixed numbers of events at each look. It is interesting to note that, for the same fixed look times, the average number of events at each look under the null hypothesis greatly exceeds the corresponding average number of events at each look under the alternative hypothesis. This is so because the events arrive faster when the treatment arm is no more effective than the control arm.

46 Non-Inferiority Trials Given Accrual Duration and Accrual Rates

This chapter will illustrate through a worked example how to design, monitor and simulate a two-sample non-inferiority trial with a time-to-event trial endpoint, when the accrual duration and accrual rates are fixed.

46.1 Establishing the Non-Inferiority Margin

The first step in designing a non-inferiority trial is to establish a suitable non-inferiority margin. This is typically done by performing a meta-analysis on past clinical trials of the active control versus placebo. Regulatory agencies then require the sponsor of the clinical trial to demonstrate that a fixed percentage of the active control effect (usually 50%) is retained by the new treatment. A further complication arises because the active control effect can only be estimated with error. We illustrate below with an example provided by reviewers at the FDA. Rothman et al.
(2003) have discussed a clinical trial to establish the non-inferiority of the test drug Xeloda (treatment t) relative to the active control (treatment c) consisting of 5-fluorouracil with leucovorin (5FU+LV) for metastatic colorectal cancer. In order to establish a suitable non-inferiority margin for this trial it is necessary to first establish the effect of 5FU+LV relative to the reference therapy of 5FU alone (treatment p, here regarded as placebo). To establish this effect the FDA conducted a ten-study random effects meta-analysis (FDA Medical-Statistical review for Xeloda, NDA 20-896, April 2001) of randomized comparisons of 5FU alone versus 5FU+LV. Letting $\lambda_t$, $\lambda_c$ and $\lambda_p$ denote the constant hazard rates for the new treatment, the active control and the placebo, respectively, the FDA meta-analysis established that $\ln(\widehat{\lambda_p/\lambda_c}) = 0.234$ with standard error $\mathrm{se}[\ln(\widehat{\lambda_p/\lambda_c})] = 0.075$. Thus with $100\gamma\%$ confidence the active control effect lies inside the interval

$$\left[\,0.234 - 0.075\,\Phi^{-1}\!\left(\tfrac{1+\gamma}{2}\right),\ 0.234 + 0.075\,\Phi^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\right] \tag{46.1}$$

The new study is required to demonstrate that some fraction (usually 50%) of the active control effect is retained. Rothman et al. (2003) state that the claim of non-inferiority for the new treatment relative to the active control can be demonstrated if the upper limit of a two-sided $100(1-\alpha)\%$ confidence interval for $\ln(\lambda_t/\lambda_c)$ is less than a pre-specified fraction of the lower limit of a two-sided $100\gamma\%$ confidence interval for the active control effect established by the meta-analysis. This is known as the “two confidence intervals procedure”. Specifically, in order to claim non-inferiority in the current trial it is necessary to show that

$$\ln(\widehat{\lambda_t/\lambda_c}) + \Phi^{-1}(1-\alpha/2)\,\mathrm{se}[\ln(\widehat{\lambda_t/\lambda_c})] < (1-f_0)\left\{\ln(\widehat{\lambda_p/\lambda_c}) - \Phi^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\mathrm{se}[\ln(\widehat{\lambda_p/\lambda_c})]\right\}. \tag{46.2}$$

We may re-write the non-inferiority condition (46.2) in terms of a one-sided Wald test of the form

$$\frac{\ln(\widehat{\lambda_t/\lambda_c}) - \delta_0}{\mathrm{se}[\ln(\widehat{\lambda_t/\lambda_c})]} < -\Phi^{-1}(1-\alpha/2), \tag{46.3}$$

where

$$\delta_0 = (1-f_0)\left\{\ln(\widehat{\lambda_p/\lambda_c}) - \Phi^{-1}\!\left(\tfrac{1+\gamma}{2}\right)\mathrm{se}[\ln(\widehat{\lambda_p/\lambda_c})]\right\} \tag{46.4}$$

is the non-inferiority margin. (Moving the standard error term of (46.2) to the right-hand side and dividing by the standard error gives the negative critical value in (46.3).)

The choice $f_0 = 1$ implies that the entire active control effect must be retained in the new trial and amounts to running a superiority trial. At the other end of the spectrum, the choice $f_0 = 0$ implies that none of the active control effect need be retained; i.e., the new treatment is only required to demonstrate effectiveness relative to placebo. The usual choice is $f_0 = 0.5$, implying that the new treatment is required to retain at least 50% of the active control effect. The usual choice for $\alpha$ is $\alpha = 0.05$. A conservative choice for the coefficient $\gamma$ is $\gamma = (1-\alpha) = 0.95$. Rothman et al. (2003) refer to this method of establishing the non-inferiority margin as the “two 95 percent two-sided confidence interval procedure” or the “95-95 rule”. In general this approach leads to rather tight margins unless the active control effect is substantial. Rothman et al. (2003) have also proposed more lenient margins that vary with the amount of power desired. Fleming (2007), however, argues for the stricter 95-95 rule on the grounds that it offers greater protection against an ineffective medical compound being approved in the event that the results of the previous trials used to establish the active control effect are of questionable relevance to the current setting. Accordingly we evaluate (46.4) with $\gamma = 0.95$, $f_0 = 0.5$, $\ln(\widehat{\lambda_p/\lambda_c}) = 0.234$ and $\mathrm{se}[\ln(\widehat{\lambda_p/\lambda_c})] = 0.075$, thereby obtaining the non-inferiority margin to be $\delta_0 = 0.044$ for the log hazard ratio and $\exp(0.044) = 1.045$ for the hazard ratio.
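The evaluation of (46.4) can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the function name `ni_margin` is hypothetical and is not part of East. Note that the manual's hazard ratio margin of 1.045 is obtained by exponentiating the rounded value 0.044, and the sketch follows the same rounding convention.

```python
from math import exp
from statistics import NormalDist


def ni_margin(log_hr_pc, se_pc, gamma=0.95, f0=0.5):
    """Non-inferiority margin delta_0 on the log hazard ratio scale, as in (46.4).

    log_hr_pc : estimated active control effect, ln(lambda_p / lambda_c)
    se_pc     : its standard error
    gamma     : confidence coefficient for the active control effect
    f0        : fraction of the active control effect to be retained
    """
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    lower_limit = log_hr_pc - z * se_pc  # lower limit of interval (46.1)
    return (1 - f0) * lower_limit


delta0 = ni_margin(0.234, 0.075)
delta0_rounded = round(delta0, 3)          # rounded to three decimals, as in the text
hr_margin = round(exp(delta0_rounded), 3)  # margin on the hazard ratio scale
print(delta0_rounded, hr_margin)
```

Setting `f0=1` returns a margin of zero, which corresponds to the superiority case described above.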
46.2 Design of Metastatic Colorectal Cancer Trial

In this section we will use East to design a single-look non-inferiority trial comparing the test drug Xeloda (treatment t) to the active control 5FU+LV (treatment c) for the treatment of metastatic colorectal cancer. On the basis of a meta-analysis of ten previous studies of the active control versus placebo (Rothman et al., 2003), a non-inferiority margin of 1.045 for λt /λc has been established. Thus we are interested in testing the null hypothesis of inferiority H0 : λt /λc ≥ 1.045 versus the one-sided alternative hypothesis that λt /λc < 1.045. Subjects are expected to enroll at the rate of 60/month and the median survival time for patients randomized to the active control arm is expected to be 18 months.

46.2.1 Single-Look Design

We will use East to create an initial single-look design having 80% power to detect the alternative hypothesis H1 : λt /λc = 1 with a one-sided level 0.025 non-inferiority test. To begin click Survival: Two Samples on the Design tab and then click Parallel Design: Log Rank Test Given Accrual Duration and Accrual Rates. A new screen will appear. Enter the appropriate design parameters into the dialog box as shown below. The box labeled Variance of Log Hazard Ratio specifies whether the calculation of the required number of events is to be based on the variance estimate of the log hazard ratio under the null hypothesis or the alternative hypothesis. The default choice in East is Null. Most textbooks recommend this choice as well (see, for example, Collett, 1994, equation (2.21) specialized to no ties). It will usually not be necessary to change this default.
For a technical discussion of this issue refer to Appendix B, Section B.5.3. Next click on the Accrual/Dropouts tab. Here we will specify the accrual information and dropout rates. Enter an accrual rate of 60. Suppose that there are 5% drop-outs per year in each arm. Enter these values as shown below. The bottom of this screen is where you can specify the accrual duration or number of subjects. East automatically computes the range that is necessary to achieve the desired power of the study and selects the midpoint of the range as the committed accrual duration or number of subjects. If your study has a restriction on accrual duration or subject accrual, you may enter this value in the Comtd. column. In our example, East computes a minimum accrual duration of 300.05 months and a suggested maximum of 323.4 months. Also, if you click the icon, a chart is displayed which shows the relationship between accrual duration (or subject accrual, depending on whether you choose to specify accrual duration or subject accrual) and study duration. Looking at this chart, choosing an accrual duration longer than 315 months will not result in a substantial decrease in study duration. Thus, we commit to an accrual duration of 315 months. Close this chart, select the radio button next to Duration and enter 315 in the Comtd. column. Click on Compute to complete the design. The design is shown as a row in the Output Preview located in the lower pane of this window. You can select this design by clicking anywhere along the row in the Output Preview. With Des1 selected, click the icon to display the details of this design in the upper pane, which are shown below. You may also wish to save this design. Select Des1 in the Output Preview window and click the icon to save this design to Workbook1 in the Library.

It is immediately evident that Des1 is untenable. It requires 16,205 events to be fully powered. The problem lies with trying to power the trial to detect a hazard ratio of 1 under the alternative hypothesis. Suppose instead that the investigators actually believe that the treatment is slightly superior to the active control, but the difference is too small to be detected in a superiority trial. In that case a non-inferiority design powered at a hazard ratio less than 1 (0.95, say) would be a better option because such a trial would require fewer events. To see this create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Then edit this design by specifying a hazard ratio of 0.95 under the alternative hypothesis as shown below. Next, click on the Accrual/Dropouts tab. Notice that the minimum and suggested maximum accrual durations have changed to 64.167 and 87.45 months, respectively. Click the icon to display the study duration versus accrual chart. Suppose that after examining this chart, you decide that an accrual duration longer than 77 months is not worth the small decrease in study duration one would gain from a longer accrual duration. Close this chart. Select the radio button next to Duration and enter 77 in the Comtd. column. Click the Compute button to generate output for Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In the Library, select the rows for Des1 and Des2, by holding the Ctrl key, and then click the icon.
The upper pane will display the details of the two designs side-by-side. Des2 is clearly easier to implement than Des1. It requires only 3,457 events and 4,620 subjects to be fully powered. Also note the marked decrease in study duration under either the null or alternative hypothesis. Nevertheless, Des2 is also unsatisfactory. The maximum study duration for Des2 (accrual plus follow-up) is 90.9 months, with 77 months of that time being used to enroll 4,620 patients. It is necessary to shorten the maximum study duration further. One possible way to shorten the maximum study duration is to increase the rate of enrollment. Suppose that additional sites can be enlisted to enroll patients after the study is activated so that six months later the average rate of enrollment is increased to 110/month. To see the impact of the increased rate of enrollment select Des2 in the Library, and click the icon on the Library toolbar. Next, click on the Accrual/Dropouts tab. Change the accrual rates as shown below. Notice how East automatically updates the accrual duration and subject accrual. An accrual duration in the range of 35 to 56.664 months is sufficient to achieve the desired power. Suppose that after examining the study duration versus accrual chart, we decide on an accrual duration of 49 months. Enter 49 in the Comtd. column. Click the Compute button to generate output for Des3. With Des3 selected in the Output Preview, click the icon to save Des3 to the Library. In the Library, select the rows for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side. Des3 also requires 3,457 events.
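The event counts quoted for Des1 and Des2 can be approximated by hand. The sketch below assumes the familiar large-sample formula for a single-look log-rank test with 1:1 allocation, in which the variance of the log hazard ratio estimate is taken as 4/D. This is an assumption made for illustration, not a statement of East's internal algorithm (see Appendix B for that), and the function name `required_events` is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hr_null, hr_alt, alpha=0.025, power=0.80):
    """Approximate events for a one-sided, single-look non-inferiority log-rank test.

    Assumes equal allocation, so that Var[ln(HR-hat)] is roughly 4 / D.
    hr_null : non-inferiority margin on the hazard ratio scale
    hr_alt  : hazard ratio at which the power is required
    """
    z = NormalDist().inv_cdf
    effect = log(hr_null) - log(hr_alt)  # distance between H0 and H1 on the log scale
    return 4 * (z(1 - alpha) + z(power)) ** 2 / effect ** 2


events_des1 = ceil(required_events(1.045, 1.00))  # powered at a hazard ratio of 1
events_des2 = ceil(required_events(1.045, 0.95))  # powered at a hazard ratio of 0.95
print(events_des1, events_des2)
```

Because the required number of events is inversely proportional to the squared distance between the margin and the alternative on the log scale, moving the alternative from 1 to 0.95 reduces the event requirement by a factor of nearly five.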
However, because of the faster rate of enrollment the time that it takes to obtain these events is cut down to 58.5 months.

46.2.2 Early Stopping for Futility

Under the null hypothesis Des3, with 3457 events, has an expected study duration of 57.2 months. This is a very long time commitment for a trial that is unlikely to be successful. Therefore it would be a good idea to introduce a futility boundary for possible early stopping. Since we wish to be fairly aggressive about early stopping for futility we will generate the futility boundary from the Gamma(−1) β-spending function. On the other hand, since there is no interest in early stopping for efficacy, we will not use an efficacy boundary. Create a new design by selecting Des3 in the Library, and clicking the icon on the Library toolbar. Change the number of looks from 1 to 3. Next, click on the Boundary tab. Enter the parameters as shown below. Be sure to select the Non-Binding option. This choice gives us the flexibility to continue the trial even if a futility boundary has been crossed. Data monitoring committees usually want this flexibility; for example, to follow a secondary endpoint. Next click on the Accrual/Dropouts tab. Once again, East automatically computes the minimum and suggested maximum values for the accrual duration and subject accrual. Click the icon to display the study duration versus accrual chart. Notice that another line is added to the chart. Now we can see the maximum study duration versus accrual under the null hypothesis. Suppose that after examining this chart, you decide to set the accrual duration at 49 months. Any increase in accrual duration past 49 months will not result in a substantial decrease in study duration. Close this chart. Select the radio button for Duration and enter 49 in the Comtd. column. Click the Compute button to generate output for Des4. With Des4 selected in the Output Preview, click the icon to save Des4 to the Library. In the Library, select the rows for Des3 and Des4 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side. Observe that while the maximum study duration has been inflated by about 6 months compared to Des3, the expected study duration under H0 has been cut down by almost 18 months.

It would be useful to simulate Des4 under a variety of scenarios for the hazard ratio. Select Des4 in the Library and click the icon. You will be taken to the following simulation worksheet. We wish to simulate this trial under the null hypothesis that the hazard ratio is exp(0.044) = 1.045. To this end click on the Response Generation tab. In this tab change the hazard ratio to 1.045. Next, click the Simulate button to simulate 10000 trials. A new row labeled Sim1 will appear in the Output Preview window. Select Sim1 in the Output Preview and click the icon to save it to the Library. In the Library, double-click Sim1. A portion of the output is displayed below. (The actual values may differ, depending on the starting seed used.) Note that 234 out of the 10000 simulations rejected the null hypothesis when it was true, thus confirming (up to Monte Carlo accuracy) that this design achieves a type-1 error of 2.5%.
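The phrase "up to Monte Carlo accuracy" can be made concrete with a short calculation. The sketch below, which is not part of East and uses a hypothetical helper name `mc_check`, compares the observed rejection proportion with the nominal level using the binomial standard error. A rule of thumb, assumed here, is that a standardized difference within about two standard errors is consistent with the nominal rate.

```python
from math import sqrt


def mc_check(rejections, n_sims, nominal):
    """Compare a simulated rejection rate with its nominal value.

    Returns the estimated rate, the binomial standard error under the
    nominal rate, and the standardized difference between the two.
    """
    estimate = rejections / n_sims
    se = sqrt(nominal * (1 - nominal) / n_sims)
    return estimate, se, (estimate - nominal) / se


estimate, se, z = mc_check(rejections=234, n_sims=10_000, nominal=0.025)
print(f"estimate={estimate:.4f}  se={se:.4f}  z={z:.2f}")
```

With 10000 simulated trials the standard error at a nominal rate of 0.025 is roughly 0.0016, so the observed 234 rejections lie within about one standard error of the expected 250.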
Also, observe that 50% of these trials have crossed the futility boundary at the very first interim look after only 29 months of study duration.

46.3 Interim Monitoring

Suppose we have adopted Des4. Let us monitor the trial with the help of the Interim Monitoring Worksheet. Select Des4 in the Library, and click the icon from the Library toolbar. Alternatively, right-click on Des4 and select Interim Monitoring. The interim monitoring dashboard contains various controls for monitoring the trial, and is divided into two sections. The top section contains several columns for displaying output values based on the interim inputs. The bottom section contains four charts, each with a corresponding table to its right. These charts provide graphical and numerical descriptions of the progress of the clinical trial and are useful tools for decision making by a data monitoring committee. Suppose that the first interim look is taken after observing 1300 events. The observed hazard ratio is 1.15 and the standard error of the log hazard ratio is 0.06. Enter this information into the interim monitoring worksheet using the Test Statistic Calculator. Click on the icon and enter the data in the test statistic calculator as shown below. Next, click Recalc and then OK. East will indicate that the H1 (futility) boundary has been crossed and hence the alternative hypothesis of non-inferiority is rejected in favor of the null hypothesis of inferiority. Click the Stop button to terminate the trial. You will see the IM sheet output including Final Inference details as shown below. Observe that the upper 97.5% Naive confidence bound for δ, 0.257, is above the non-inferiority margin of 0.044 (on the log hazard ratio scale). Note: Click the icon to hide or unhide the columns of interest.
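The naive confidence bound quoted above can be reproduced from the interim inputs alone. The sketch below is a hand calculation, not East output. It computes the Wald statistic in the form of equation (46.3) and the unadjusted upper 97.5% bound for the log hazard ratio. The sign convention East uses for its displayed test statistic may differ, and the futility boundary itself is not computed here.

```python
from math import log
from statistics import NormalDist

observed_hr = 1.15  # observed hazard ratio at the first look
se_log_hr = 0.06    # standard error of the log hazard ratio
delta0 = 0.044      # non-inferiority margin on the log hazard ratio scale

log_hr = log(observed_hr)

# Wald statistic in the form of (46.3): large negative values favor non-inferiority.
wald = (log_hr - delta0) / se_log_hr

# Naive (unadjusted for interim looks) upper 97.5% confidence bound for delta.
upper_bound = log_hr + NormalDist().inv_cdf(0.975) * se_log_hr

print(f"wald={wald:.3f}  upper_bound={upper_bound:.3f}")
print("upper bound exceeds margin:", upper_bound > delta0)
```

The statistic is positive, meaning the observed log hazard ratio lies on the inferiority side of the margin, which agrees in direction with the futility stop described above.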
47 Non-Inferiority Trials with Fixed Follow-Up

This chapter will illustrate through a worked example how to design, monitor and simulate a two-sample non-inferiority trial with a time-to-event endpoint in which each subject who has not dropped out or experienced the event is followed for a fixed duration only. This implies that each subject who does not drop out or experience the event within a given time interval, as measured from the time of randomization, will be administratively censored at the end of that interval. In East we refer to such designs as fixed follow-up designs.

47.1 Type II Diabetes Trial

A randomized non-inferiority clinical trial of a new monotherapy agent (treatment ‘t’) versus an active control (treatment ‘c’) is being planned for the treatment of type II diabetes. The primary endpoint is time to treatment failure, as measured by an elevated level of the HbA1c biomarker (greater than 8%). Each patient will be followed for up to 18 months or failure, whichever comes first. It is estimated that 50% of subjects on the active control will fail within four years. A major issue for non-inferiority trials is the selection of the non-inferiority margin for the new therapy. Since this question was discussed at length in Chapter 46, we will not repeat the discussion here. (See also Rothman et al., 2003.) Instead we will assume that, on the basis of an appropriate meta-analysis, the claim of non-inferiority can be sustained by demonstrating statistically that the treatment arm is at most 10% more hazardous than the control arm. This establishes a non-inferiority margin of λt /λc = 1.1 for the hazard ratio. Patient accrual will be at the rate of 1000/month for the first six months and 1500/month thereafter. The annual drop-out rate is expected to be 8% on each treatment arm.
We will design this trial to test the null hypothesis, H0 : λt /λc ≥ 1.1, against the one-sided alternative hypothesis, H1 : λt /λc < 1.1, with 90% power when λt /λc = 1. The investigators wish to select a sample size that will enable the study to be completed within two years.

47.2 Single-Look Design

We begin by creating a single-look design for this study. To begin click Survival: Two Samples on the Design tab and then click Parallel Design: Logrank Test Given Accrual Duration and Accrual Rates. This will open the input window for the design as shown below. Select Noninferiority from the Design Type dropdown. The right-hand side panel of this input window is to be used for entering the relevant time-to-event information. It appears with a default hazard ratio and default hazard rates for the control and treatment arms. Enter the survival information as mentioned in the design description. The hazard ratio under the null hypothesis (that is, the non-inferiority margin) is 1.1. The hazard ratio under the alternative hypothesis at which 90% power is desired is 1. Before leaving this window we must enter the hazard rate for the Active Control (Baseline) arm. We know that the four-year failure rate for the active control arm is 50%. This information can be directly entered by choosing the input method as Cum % Survival as shown below. To see the conversion of this information into hazard rates, select as input method the Hazard Rates option. The cumulative % survival will be converted into hazard rates. Another parameter to be decided is the Variance, which specifies whether the null or alternative hypothesis variance will be used to convert information into sample size. Leave it at its default value.
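The conversion from cumulative % survival to a hazard rate can be sketched as follows. The calculation assumes an exponential (constant hazard) survival model and time measured in months, so that four years corresponds to 48 months. These are assumptions made for this illustration, and the function name `hazard_from_survival` is hypothetical.

```python
from math import log


def hazard_from_survival(surv_prob, time):
    """Constant hazard rate implied by S(time) = surv_prob under an exponential model."""
    return -log(surv_prob) / time


# 50% of active control subjects are expected to fail within four years (48 months).
lambda_c = hazard_from_survival(0.50, 48)

# Treatment hazard at the non-inferiority margin (hazard ratio 1.1, null hypothesis).
lambda_t_null = 1.1 * lambda_c

print(f"control hazard per month: {lambda_c:.4f}")
print(f"treatment hazard at the margin: {lambda_t_null:.4f}")
```

Under the alternative hypothesis used in this design the hazard ratio is 1, so the treatment hazard equals the control hazard.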
(If interested in the technical details of the choice of variance, refer to Appendix B, Section B.5.3.) The second tab, labeled Accrual/Dropout, is used to enter the patient accrual rate and, for fixed follow-up designs, the duration of patient follow-up and the dropout information. In this study, each subject will be followed for up to 18 months. Therefore select the For Fixed Period entry from the dropdown of Subjects are Followed and enter 18 in the edit box. Enrollment begins at the rate of 1000/month and increases to 1500/month six months later. Enter this information as shown below. The second panel, labeled Piecewise Constant Dropout Rates, is used to enter the rate at which we expect patients to drop out of the study. Set the # of Pieces to 1, change the Input Method to Dropout Rates and enter the information that the annual drop-out rate is 8%, as shown below. Also, set the Committed # of Subjects equal to the Min. Suggested value, 21446. Click on Compute to complete the design.

East reveals that any sample size between 21,446 and 34,080 will satisfy the 90% power requirement. With 21,446 patients enrolled the expected study duration is 34.3 months, consisting of 16.3 months during the enrollment phase and an additional 18 months of fixed follow-up for each patient - including the last one - to be enrolled. At the end of that 34.3 month period we expect 4627 events. This is the number of events needed to fully power the study. To see how the events arrive over time, click on the Sample Size/Events vs. Time chart.
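The enrollment phase of 16.3 months follows directly from the piecewise accrual rates. The following sketch shows the arithmetic. The function name `accrual_duration` and the representation of accrual periods as (start month, rate per month) pairs are choices made for this illustration and do not reflect how East stores accrual information.

```python
def accrual_duration(n_subjects, periods):
    """Months needed to enroll n_subjects under piecewise constant accrual.

    periods : list of (start_month, rate_per_month) pairs sorted by start_month;
              the last period is assumed to continue indefinitely.
    """
    enrolled = 0.0
    for i, (start, rate) in enumerate(periods):
        end = periods[i + 1][0] if i + 1 < len(periods) else float("inf")
        capacity = rate * (end - start)  # subjects this period can supply
        if enrolled + capacity >= n_subjects:
            return start + (n_subjects - enrolled) / rate
        enrolled += capacity
    raise ValueError("at least one accrual period is required")


# 1000/month for the first six months, 1500/month thereafter.
rates = [(0, 1000), (6, 1500)]
enrollment_months = accrual_duration(21_446, rates)
study_months = enrollment_months + 18  # the last subject is followed for 18 months

print(round(enrollment_months, 1), round(study_months, 1))
```

The same function can be applied to any committed sample size within the range reported by East to see how the length of the enrollment phase changes.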
If you increase the sample size beyond 21,446, the total study duration will be shortened. For example, consider increasing the sample to 30,000 patients by editing Des1 (using the icon) and creating Des2 with a new sample size. Now the total study duration is 25 months, of which the accrual phase alone lasts for 22 months. In this case, not every patient will have been followed for the full 18 months by the time the 4627 events needed to fully power the study have arrived and the study has been closed. Only those who were enrolled early will have been followed for 18 months. The later enrollees will have been followed for a shorter time.

47.3 Three-Look Design

Next we consider extending Des1 by permitting two equally spaced interim looks at the accruing data with a view to possible early stopping. Edit Des1 and change the number of looks from 1 to 3 as shown below. Change the Committed # of Subjects to 21701 on the Accrual/Dropouts tab. Click the Compute button. Because the default Lan-DeMets-O’Brien-Fleming spending function LD(OF) was used in this design, the maximum study duration has been inflated very slightly, from 34.3 to 34.5 months. However, if the alternative hypothesis is true we expect to terminate the trial in 26.4 months, a savings of about 8 months. This can be seen from the table Sample Size Information of the design details window.

47.4 Three-Look Design with Superiority Alternative

The preceding design required 21,701 subjects. This enormous up-front commitment might not be necessary if one actually believes that the new treatment is superior to the control treatment. Suppose that although the trial is still intended to reject the null hypothesis of inferiority at a non-inferiority margin of λt /λc = 1.1, it is believed that in fact λt /λc is less than 1; i.e., the treatment is actually superior to the active control. Ordinarily one would design a superiority trial in this situation. But now, suppose that the value λt /λc is believed to be about 0.95 under the alternative hypothesis. It would be very difficult to design a trial to prove superiority with a hazard ratio this close to 1. (An extremely large sample size would be needed.) We can, however, use East to design a non-inferiority trial having 90% power at this alternative hypothesis. Edit Des3 and create Des4 by modifying the hazard ratio under the alternative hypothesis from 1 to 0.95 as shown below and setting the Committed Sample Size equal to 9379. Click the Compute button to complete the design. Des4 can achieve 90% power with only 9379 patients and 1979 events. The maximum study duration is 26.3 months and the expected study duration is 20 months under the alternative hypothesis. When compared to Des3, the savings in sample size are enormous.

47.5 Simulating a Non-Inferiority Trial

Let us simulate Des4. Activate Des4 in the Library and click on the icon. You will be taken to the Simulation Input window. To view the default simulation inputs for this design, navigate across the four tabs.
The inputs are as follows. The hazard rates displayed in the Response Generation tab are the ones that were specified under the alternative hypothesis; i.e., λc = 0.0144 and λt = 0.0137. Hence we expect the trial to have 90% power. To verify this click the Simulate button and observe that in 10000 simulated trials the null hypothesis of inferiority was rejected 8958 times. Also note that the Average Study Duration is 19.765 months. Next let us verify that this design also preserves the type-1 error. Edit the node Sim1 by clicking the icon. We now specify the hazard rates under the null hypothesis; i.e., λc = 0.0144 and λt = 0.0144 × 1.1 = 0.0159. We enter these hazard rates into the table labeled Piecewise Hazards as shown below. (Note: To reproduce the results in this User Manual, consider entering the hazard rates with full precision rather than the rounded values.) Generate 10000 simulated trials by clicking on the Simulate button. We observe that only 225 of the 10000 trials rejected the null hypothesis, thus confirming (up to Monte Carlo accuracy) that the type-1 error of 0.025 is preserved.

48 Superiority Trials Given Accrual Duration and Study Duration

This chapter will illustrate through a worked example how to design and simulate a two-sample superiority trial with a time-to-event trial endpoint, where the accrual duration and study duration are constrained. Most trials in the pharmaceutical industry setting are designed in this manner, time being a more rigid constraint than the accrual rate of patients. The duration of a clinical trial impacts the duration of a drug development program, and thus time to market and potential revenues.
Therefore it is of interest to fix the study duration as well as the accrual duration to finish the clinical trial according to schedule. The option to design a trial in this way is available in East.

48.1 Calculating a Sample Size

For this design, East obtains the maximum number of events $D_{\max}$ from the maximum information $I_{\max}$, as described in Appendix sections B.5 and B.5.3. To calculate the sample size, we first equate the expected number of events $d(S_a + S_f)$ (calculated as in Appendix D, and depending on the accrual duration $S_a$ and the duration of follow-up $S_f$) to the maximum number of events $D_{\max}$:

$$d(S_a + S_f) = D_{\max} \tag{48.1}$$

In this type of design the accrual duration $S_a$ and the study duration $S_a + S_f$ are given as input. East iterates over sample sizes, increasing from a minimum value of $D_{\max}$ and enrolled over a duration of $S_a$, until $D_{\max}$ events are found to occur within a study duration of $S_a + S_f$. The result is the unique sample size required to obtain the proper power for the study.

48.2 The RALES Clinical Trial: Initial Design

The RALES trial (Pitt et al., 1999) was a double-blind study of the aldosterone-receptor blocker spironolactone at a daily dose of 25 mg in combination with standard doses of an ACE inhibitor (treatment arm) versus standard therapy of an ACE inhibitor (control arm) in patients who had severe heart failure as a result of systolic left ventricular dysfunction. The primary endpoint was death from any cause. Six equally-spaced looks at the data using the Lan-DeMets-O’Brien-Fleming spending function were planned. The trial was designed to detect a hazard ratio of 0.83 with 90% power at a two-sided 0.05 level of significance. The hazard rate of the control arm was estimated to be 0.38. Randomization was scheduled to begin in March 1995 and complete in December 1996 for a total of 1.8 years of enrollment.
Follow-up was planned through December 1999, so that the total study duration from first patient enrolled to last patient visit should be 4.8 years. We begin by using East to design RALES under these basic assumptions. To begin click Survival: Two Samples on the Design tab and then click Parallel Design: Logrank Test Given Accrual Duration and Study Duration. A new screen will appear. Enter the appropriate design parameters into the dialog box as shown below. The box labeled Variance of Log Hazard Ratio specifies whether the calculation of the required number of events is to be based on the variance estimate of the log hazard ratio under the null hypothesis or the alternative hypothesis. The default choice in East is Null. Most textbooks recommend this choice as well (see, for example, Collett, 1994, equation (2.21) specialized to no ties). It will usually not be necessary to change this default. For a technical discussion of this issue refer to Appendix B, Section B.5.3. Next, click on the Boundary Info tab. We will take six equally spaced looks at the data using the Lan-DeMets O’Brien-Fleming spending function. These are the default settings in East. Note that we do not select a futility boundary in this case. Next click on the Accrual/Dropout Info tab. Here we will specify the accrual information and dropout rates. The software allows a specification of piecewise constant hazards and variable accrual rates but we start by looking at an example that does not require any of these options. In the drop-down menu next to Subjects are followed: select Until End of Study. Set the Accrual Duration to 1.8 years and the Study Duration to 4.8 years.
Notice that East has changed the settings so that at 1.8 years the study should be 100% accrued. Keep the number of accrual periods equal to the default of 1. To the right of the Accrual Info box is the Piecewise Constant Dropout Rates box. This box is used to enter the rate at which we expect patients to drop out of the study. For the present we will assume that there are no drop-outs.

Click on Compute to complete the design. The design is shown as a row in the Output Preview located in the lower pane of this window. You can select this design by clicking anywhere along the row in the Output Preview. With Des1 selected, click the icon to display the details of this design in the upper pane, which are shown below. You may also wish to save this design. Select Des1 in the Output Preview window and click the icon to save this design to Workbook1 in the Library.

East notifies you that 1243 events and a sample size of 1689 are required to attain the desired 90% power in the allotted time.

East provides charts to examine the trade-offs between power and accrual duration, study duration, sample size or number of events. Select Des1 in the Library, click the icon, and select Power vs. Sample Size. Switch the X-Axis to No. of Events. The power of the study is really tied to the number of events that are observed. This chart shows the direct relationship between power and number of events. Note that 950 events give us about 81% power. You may wish to save this chart to the Library by clicking on the Save in Workbook button.

48.3 Incorporating Drop-Outs

The investigators expect 5% of the patients in the spironolactone group and the control group to drop out each year.
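Before adding drop-outs, it may help to see roughly where the Des1 figures come from. The Python sketch below is not East's algorithm; it assumes exponential survival, uniform accrual, 1:1 allocation, and that the hazard rate of 0.38 is a yearly rate. The function names are illustrative. It first computes the fixed-sample number of events from the Schoenfeld approximation (the six-look design needs somewhat more, namely the 1243 events East reports), and then searches upward from $D_{\max}$ for the sample size whose expected number of events reaches $D_{\max}$, as described in Section 48.1.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha_two_sided, power):
    """Fixed-sample events for a 1:1 logrank test (Schoenfeld approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


def event_probability(hazard, accrual, study):
    """P(event observed by study end): exponential survival, uniform accrual,
    no drop-outs, every subject followed until the end of the study."""
    follow_up = study - accrual
    # Average probability of being event-free at study end over entry times.
    event_free = (math.exp(-hazard * follow_up) - math.exp(-hazard * study)) / (
        hazard * accrual
    )
    return 1 - event_free


def sample_size(d_max, control_hazard, hazard_ratio, accrual, study):
    """Search upward from d_max until the expected events reach d_max."""
    p_control = event_probability(control_hazard, accrual, study)
    p_treatment = event_probability(control_hazard * hazard_ratio, accrual, study)
    p_event = (p_control + p_treatment) / 2  # 1:1 allocation
    n = d_max  # at least one subject per required event
    while n * p_event < d_max:
        n += 1
    return n


# RALES inputs from Section 48.2; 1243 is the event count reported by East.
fixed_events = schoenfeld_events(0.83, 0.05, 0.90)
n_des1 = sample_size(1243, 0.38, 0.83, accrual=1.8, study=4.8)
print(round(fixed_events), n_des1)
```

Under these assumptions the search arrives at the same sample size that East reports for Des1, which suggests the simple exponential model is a reasonable way to reason about the trade-off between events and enrollment.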
Create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Next, click on the Accrual/Dropout Info tab. In the Piecewise Constant Dropout Rates box, select 1 for the number of pieces and change the Input Method from Hazard Rates to Prob. of Dropout. Then enter a dropout probability of 0.05 by 1 year for the treatment and control arms as shown below.

Click the Compute button to generate output for Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library. In the Library, select the rows for Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side.

A comparison of the two plans reveals that, because of the drop-outs, we require 1,824 subjects to be enrolled under Des2 rather than 1,689 under Des1. Also, the expected study duration will not change much under the alternative and null hypotheses between Des1 and Des2.

48.4 Incorporating Non-Constant Accrual Rates

In many clinical trials the enrollment rate is low in the beginning and reaches its maximum expected level a few months later when all the sites enrolling patients are on board. Suppose that 20% of the total accrual is expected to occur during the first six months, with the rest happening during the remaining 1.3 years.

Create a new design by selecting Des2 in the Library, and clicking the icon on the Library toolbar. Next, click on the Accrual/Dropout Info tab. Specify that there are two accrual periods and enter the cumulative accrual for each period in the dialog box as shown below.

Click the Compute button to generate output for Des3. With Des3 selected in the Output Preview, click the icon to save Des3 to the Library.
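The effect of drop-outs and of slow early accrual on the required sample size can be approximated outside East. The sketch below is an illustration under stated assumptions, not East's implementation: event and drop-out times are exponential and independent (drop-out acts as a competing risk), a 5% yearly drop-out probability is converted to the hazard $-\ln(1 - 0.05)$, accrual is uniform within each accrual period, and allocation is 1:1. All function names are hypothetical.

```python
import math


def event_probability(hazard, dropout_hazard, accrual_pieces, study):
    """P(event observed by study end) with drop-out as a competing risk.

    accrual_pieces: list of (start, end, fraction_of_total_accrual); enrollment
    is assumed uniform inside each piece.
    """
    total = hazard + dropout_hazard
    still_at_risk = 0.0  # average P(no event and no drop-out by study end)
    for start, end, fraction in accrual_pieces:
        integral = (
            math.exp(-total * (study - end)) - math.exp(-total * (study - start))
        ) / total
        still_at_risk += fraction * integral / (end - start)
    # A subject who leaves the risk set does so by event with prob hazard/total.
    return hazard / total * (1 - still_at_risk)


def sample_size(d_max, control_hazard, hazard_ratio, dropout_hazard,
                accrual_pieces, study):
    """Smallest N whose expected number of events reaches d_max (1:1 allocation)."""
    p_control = event_probability(control_hazard, dropout_hazard,
                                  accrual_pieces, study)
    p_treatment = event_probability(control_hazard * hazard_ratio, dropout_hazard,
                                    accrual_pieces, study)
    return math.ceil(d_max / ((p_control + p_treatment) / 2))


uniform = [(0.0, 1.8, 1.0)]                       # Des1 and Des2
slow_start = [(0.0, 0.5, 0.2), (0.5, 1.8, 0.8)]   # Des3: 20% in first 6 months
yearly_dropout = -math.log(1 - 0.05)              # 5% drop-out per year

des1 = sample_size(1243, 0.38, 0.83, 0.0, uniform, 4.8)
des2 = sample_size(1243, 0.38, 0.83, yearly_dropout, uniform, 4.8)
des3 = sample_size(1243, 0.38, 0.83, yearly_dropout, slow_start, 4.8)
print(des1, des2, des3)
```

The three values can be compared with the sample sizes East displays for Des1, Des2 and Des3 in the side-by-side comparison. Later enrollment shortens the average follow-up, so fewer events are observed per subject and more subjects are needed for the same 1243 events.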
In the Library, select the rows for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon. The upper pane will display the details of the three designs side-by-side.

Notice that we now need 1837 subjects to be enrolled to compensate for the overall later enrollment of subjects.

48.5 Simulation

It would be useful to verify the operating characteristics of the various plans created in the previous section by simulation. Select Des3 in the Library and click the icon. You will be taken to the following simulation worksheet.

48.5.1 Simulating Under H1

We will first simulate the trial under the alternative hypothesis H1. In the Simulation Parameters tab, select Total No. of Events as the quantity to fix at each look (the default option). Select LogRank from the drop-down menu next to Test Statistic. Other options for a test statistic include the Wilcoxon-Gehan and Harrington-Fleming.

Next, click the Simulate button to simulate 10000 trials. A new row labeled Sim1 will appear in the Output Preview window. Select that row and click the icon to save it to the Library. In the Library, double-click Sim1. A portion of the output is displayed below. (The actual values may differ, depending on the starting seed used.)

We will now run another 10000 simulations, this time fixing the calendar time of each look instead of fixing the number of events. Click the icon in the bottom-left corner to go back to the input window of Sim1. In the Test Parameters tab select Look Time from the drop-down menu next to Fix at Each Look:.
When the Look Time option is selected, the locations of the interim looks at which stopping boundaries are computed are expressed in terms of the calendar time of each interim look instead of the number of events at each interim look.

Next, click the Simulate button to simulate 10000 trials. A new row labeled Sim2 will appear in the Output Preview window. Select that row and click the icon to save it to the Library. In the Library, double-click Sim2. A portion of the output is displayed below. (The actual values may differ, depending on the starting seed used.)

48.5.2 Simulating Under H0

To simulate under the null hypothesis we must go to the Response Generation Info tab in the simulation worksheet. In this tab change the hazard rate for the treatment arm to 0.38, the same as the control arm. This change implies that we will be simulating under the null hypothesis. Next, click on the Test Parameters tab and make sure that the Total No. of Events is fixed at each look.

Next, click the Simulate button to simulate 10000 trials. A portion of the results is displayed below.

Out of 10000 simulated trials, 245 crossed the upper stopping boundary and 258 crossed the lower stopping boundary, thus confirming (up to Monte Carlo accuracy) that the type-1 error is preserved for this design.

48.6 User Defined R Function

East allows you to customize simulations by inserting user-defined R functions for one or more of the following tasks: generate response, compute test statistic, randomize subjects, generate arrival times, and generate dropout information. The R functionality for arrivals and dropouts will be available only if you have entered such information at the design stage.
Although the R functions are also available for all normal and binomial endpoints, we will illustrate this functionality for a time-to-event endpoint. Specifically, we will use an R function to generate Weibull survival responses.

Start East afresh. On the Design tab, click Survival: Two Samples and then Logrank Test Given Accrual Duration and Study Duration. Choose the design parameters as shown below. In particular, select a one-sided test with type-1 error of α = 0.025.

Click Compute and save this design (Des1) to the Library. Right-click Des1 in the Library and click Simulate. In the Simulation Control Info tab, check the box for Suppress All Intermediate Output. Type 10000 for Number of Simulations and select Clock for Random Number Seed.

In the top right-hand corner of the input window, click Include Options, and then click User Defined R Function.

For now, leave the row Initialize R Environment blank. This optional task can be useful for loading required libraries, setting seeds for simulations, and initializing global variables. Select the row for Generate Response, click Browse..., and navigate to the folder containing your R file. Select the file and click Open. The path should now be displayed under File Name.

Click on the first row and then click View to open a notepad application to view your R file. In this example, we generate survival responses for both control and treatment arms from a Weibull distribution with shape parameter = 2 (i.e., increasing hazards), with the same hazard rate in both arms.

Copy the function name (in this case GenWeibull).
Close the R file and paste the function name in the cell for Function Name. Click Simulate.

Return to the tab for User Defined R Function, select the Generate Response row, and click View. In the R function, change the shape parameter to 1, to generate responses from an exponential distribution (a Weibull with constant hazard). Save and close the R file. You may not be able to save the file on the C: drive due to administrative privileges; in that case save the updated file somewhere else, say the Desktop.

Browse to the new file on the Desktop. The function name is the same, so there is no need to change it. Click Simulate.

Select both simulations (Sim1 and Sim2) from the Output Preview, and on the toolbar, click the icon to display them in the Output Summary.

Notice that the type-1 error appears to be controlled in both cases. When we simulated from the exponential (Sim2), the average study duration (30.7 months) was close to what was calculated at Des1 for the expected study duration under the null. However, when we simulated from the Weibull with increasing hazards (Sim1), the average study duration increased to 34.6 months.

Appendix O contains detailed specifications for the required inputs and outputs of R functions for each task and endpoint. The ability to use custom R functions for many simulation tasks allows considerable flexibility in performing sensitivity analyses and assessment of key operating characteristics.

48.7 Assurance for Survival

Assurance, or probability of success, is a Bayesian version of power, which corresponds to the (unconditional) probability that the trial will yield a statistically significant result. Specifically, it is the prior expectation of the power, averaged over a prior distribution for the unknown treatment effect (see O’Hagan et al., 2005).
For a given design, East allows you to specify a prior distribution, for which the assurance or probability of success will be computed. In this section, we will replicate and extend an example from Sabin et al. (2014).

Start East afresh. Click Survival: Two Samples on the Design tab and then click Parallel Design: Logrank Test Given Accrual Duration and Study Duration. Compute the following design: Design Type = Superiority, Test Type = 1-sided, Type-1 error = 0.025, Power = 80%, Hazard ratio = 0.75. This design requires 380 events to achieve 80% power. However, this value of power depends on the assumption that the HR is precisely 0.75, or that δ = ln(HR) = −0.288. Sabin et al. (2014) explored various prior distributions derived from Phase 2 data. In one example, they used a Normal prior distribution for ln(HR), with a mean of −0.183 and a standard deviation of 0.135.

Select the Assurance checkbox in the Input window. In the Distribution list, click Normal, and in the Input Method list, click E(δ) and SD(δ).

This replicates their reported Probability of Ph 3 success of 0.46.

East also allows you to specify an arbitrary discrete prior distribution through an R function. In the Distribution list, click User Specified-R, and then click Browse... to select the R file where you have constructed a prior. Click View... to open the R file. In this R file, we have constructed a discretized Normal distribution with the same mean and standard deviation as above, but added a lump of equal weight at the null hypothesis. Type the function name (in this case, lnHR) into the R Function field, and click Compute.

The resulting probability of success (0.241) is even lower due to the prior weight on the null hypothesis.
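The two assurance values above can be approximated with a short calculation. The Python sketch below is not East's algorithm and does not reflect the interface East expects from a user-supplied R prior. It assumes the usual normal approximation for the log hazard ratio estimate, with variance 4/D for D events under 1:1 allocation, and it interprets "a lump of equal weight" as half of the prior mass placed at ln(HR) = 0. The grid used to discretize the Normal prior is an arbitrary illustrative choice.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_at(log_hr, events, alpha_one_sided):
    """Approximate power of the one-sided logrank test at a given true ln(HR)."""
    se = math.sqrt(4 / events)
    critical = -STD_NORMAL.inv_cdf(1 - alpha_one_sided) * se  # reject below this
    return STD_NORMAL.cdf((critical - log_hr) / se)


def assurance_discrete(prior, events, alpha_one_sided):
    """Prior expectation of power for a discrete prior [(ln_hr, weight), ...]."""
    return sum(w * power_at(x, events, alpha_one_sided) for x, w in prior)


def discretized_normal(mean, sd, total_weight=1.0, points=401, width=4.0):
    """Grid approximation to a Normal prior over mean +/- width * sd."""
    dist = NormalDist(mean, sd)
    xs = [mean + sd * width * (2 * i / (points - 1) - 1) for i in range(points)]
    dens = [dist.pdf(x) for x in xs]
    scale = total_weight / sum(dens)
    return [(x, d * scale) for x, d in zip(xs, dens)]


events, alpha = 380, 0.025
design_power = power_at(math.log(0.75), events, alpha)

normal_prior = discretized_normal(-0.183, 0.135)
assurance_normal = assurance_discrete(normal_prior, events, alpha)

# Half of the prior mass on the Normal part, half as a point mass at the null.
mixed_prior = discretized_normal(-0.183, 0.135, total_weight=0.5) + [(0.0, 0.5)]
assurance_mixed = assurance_discrete(mixed_prior, events, alpha)

print(round(design_power, 2), round(assurance_normal, 2), round(assurance_mixed, 3))
```

Under these assumptions the first value is close to the design power of 80%, the second is close to the 0.46 reported by Sabin et al. (2014), and the third is close to the 0.241 obtained with the lump at the null, where the power equals the type-1 error of 0.025.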
49 Non Inferiority Trials Given Accrual Duration and Study Duration

This chapter will illustrate through a worked example how to design and simulate a two-sample non inferiority trial with a time-to-event endpoint, when the accrual duration and study duration are fixed.

49.1 Calculating a Sample Size

For this design, East obtains the maximum number of events $D_{\max}$ from the maximum information $I_{\max}$, as described in Appendix sections B.5 and B.5.3. To calculate the sample size, we first equate the expected number of events $d(S_a + S_f)$ (as calculated in Appendix D), which depends on the accrual duration ($S_a$) and the duration of follow-up ($S_f$), to the maximum number of events $D_{\max}$:

$$d(S_a + S_f) = D_{\max} \tag{49.1}$$

In this type of design the accrual duration $S_a$ and the study duration $S_a + S_f$ are given as input. East iterates over candidate sample sizes, starting from a minimum value of $D_{\max}$ and increasing, with subjects enrolled over a duration of $S_a$, until $D_{\max}$ events are found to occur within a study duration of $S_a + S_f$. The result is the unique sample size required to obtain the desired power for the study.

49.2 The Non Inferiority Margin

The first step in designing a non-inferiority trial is to establish a suitable non inferiority margin. This is typically done by performing a meta-analysis on past clinical trials of the active control versus placebo. Regulatory agencies then require the sponsor of the clinical trial to demonstrate that a fixed percentage of the active control effect (usually 50%) is retained by the new treatment. A further complication arises because the active control effect can only be estimated with error. We illustrate below with an example provided by reviewers at the FDA. Rothman et al.
(2003) have discussed a clinical trial to establish the non inferiority of the test drug Xeloda (treatment t) relative to the active control (treatment c) consisting of 5-fluorouracil with leucovorin (5-FU+LV) for metastatic colorectal cancer. In order to establish a suitable non inferiority margin for this trial it is necessary to first establish the effect of 5-FU+LV relative to the reference therapy of 5-FU alone (treatment p, here regarded as placebo). To establish this effect the FDA conducted a ten-study random effects meta-analysis (FDA Medical Statistical review for Xeloda, NDA 20-896, April 2001) of randomized comparisons of 5-FU alone versus 5-FU+LV.

Letting $\lambda_t$, $\lambda_c$ and $\lambda_p$ denote the constant hazard rates for the new treatment, the active control and the placebo, respectively, the FDA meta-analysis established that $\widehat{\ln(\lambda_p/\lambda_c)} = 0.234$ with standard error $\mathrm{se}[\widehat{\ln(\lambda_p/\lambda_c)}] = 0.075$. Thus with $100\gamma\%$ confidence the active control effect lies inside the interval

$$\left[\,0.234 - 0.075\,\Phi^{-1}\!\left(\frac{1+\gamma}{2}\right),\; 0.234 + 0.075\,\Phi^{-1}\!\left(\frac{1+\gamma}{2}\right)\right] \tag{49.2}$$

The new study is required to demonstrate that some fraction (usually 50%) of the active control effect is retained. Rothman et al. (2003) state that the claim of non inferiority for the new treatment relative to the active control can be demonstrated if the upper limit of a two-sided $100(1-\alpha)\%$ confidence interval for $\ln(\lambda_t/\lambda_c)$ is less than a pre-specified fraction of the lower limit of a two-sided $100\gamma\%$ confidence interval for the active control effect established by the meta-analysis. This is known as the “two confidence intervals procedure”. Specifically, in order to claim non inferiority in the current trial it is necessary to show that

$$\widehat{\ln(\lambda_t/\lambda_c)} + \Phi^{-1}(1-\alpha/2)\,\mathrm{se}[\widehat{\ln(\lambda_t/\lambda_c)}] < (1-f_0)\left\{\widehat{\ln(\lambda_p/\lambda_c)} - \Phi^{-1}\!\left(\frac{1+\gamma}{2}\right)\mathrm{se}[\widehat{\ln(\lambda_p/\lambda_c)}]\right\} \tag{49.3}$$

We may re-write the non inferiority condition (49.3) in terms of a one-sided Wald test of the form

$$\frac{\widehat{\ln(\lambda_t/\lambda_c)} - \delta_0}{\mathrm{se}[\widehat{\ln(\lambda_t/\lambda_c)}]} < -\Phi^{-1}(1-\alpha/2) \tag{49.4}$$

where

$$\delta_0 = (1-f_0)\left\{\widehat{\ln(\lambda_p/\lambda_c)} - \Phi^{-1}\!\left(\frac{1+\gamma}{2}\right)\mathrm{se}[\widehat{\ln(\lambda_p/\lambda_c)}]\right\} \tag{49.5}$$

is the non inferiority margin. The choice $f_0 = 1$ implies that the entire active control effect must be retained in the new trial and amounts to running a superiority trial. At the other end of the spectrum, the choice $f_0 = 0$ implies that none of the active control effect need be retained; i.e., the new treatment is only required to demonstrate effectiveness relative to placebo. The usual choice is $f_0 = 0.5$, implying that the new treatment is required to retain at least 50% of the active control effect. The usual choice for $\alpha$ is $\alpha = 0.05$. A conservative choice for the coefficient $\gamma$ is $\gamma = (1-\alpha) = 0.95$. Rothman et al. (2003) refer to this method of establishing the non inferiority margin as the “two 95 percent two sided confidence interval procedure” or the “95-95 rule”. In general this approach leads to rather tight margins unless the active control effect is substantial. Rothman et al. (2003) have also proposed more lenient margins that vary with the amount of power desired. Fleming (2007), however, argues for the stricter 95-95 rule on the grounds that it offers greater protection against an ineffective medical compound being approved in the event that the results of the previous trials used to establish the active control effect are of questionable relevance to the current setting. Accordingly we evaluate (49.5) with $\gamma = 0.95$, $f_0 = 0.5$, $\widehat{\ln(\lambda_p/\lambda_c)} = 0.234$ and $\mathrm{se}[\widehat{\ln(\lambda_p/\lambda_c)}] = 0.075$, thereby obtaining the non inferiority margin to be $\delta_0 = 0.044$ for the log hazard ratio and $\exp(0.044) = 1.045$ for the hazard ratio.
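The arithmetic behind the margin in (49.5) can be checked with a few lines of Python. This is a minimal sketch using only the quantities quoted above; the function name is illustrative and not part of East.

```python
import math
from statistics import NormalDist


def noninferiority_margin(effect, se, gamma, f0):
    """Margin on the log hazard ratio scale, following equation (49.5)."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    lower_limit = effect - z * se  # lower limit of the 100*gamma% interval
    return (1 - f0) * lower_limit


delta0 = noninferiority_margin(effect=0.234, se=0.075, gamma=0.95, f0=0.5)
delta0_rounded = round(delta0, 3)
hr_margin = round(math.exp(delta0_rounded), 3)
print(delta0_rounded, hr_margin)
```

Note that the hazard ratio margin of 1.045 is obtained by exponentiating the rounded value 0.044; exponentiating the unrounded margin (about 0.0435) gives a value closer to 1.044.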
49.3 Design of Metastatic Colorectal Cancer Trial

In this section we will use East to design a single-look non inferiority trial comparing the test drug Xeloda (treatment t) to the active control 5-FU+LV (treatment c) for the treatment of metastatic colorectal cancer. On the basis of a meta-analysis of ten previous studies of the active control versus placebo (Rothman et al., 2003), a non inferiority margin of 1.045 for $\lambda_t/\lambda_c$ has been established. Thus we are interested in testing the null hypothesis of inferiority $H_0: \lambda_t/\lambda_c \ge 1.045$ versus the one-sided alternative hypothesis that $\lambda_t/\lambda_c < 1.045$. Suppose the trial is planned to enroll for 30 months and finish within 70 months of the last patient enrolled.

49.3.1 Single-Look Design

We will use East to create an initial single-look design having 80% power to detect the alternative hypothesis $H_1: \lambda_t/\lambda_c = 1$ with a one-sided level-0.025 non-inferiority test. To begin, click Survival: Two Samples on the Design tab and then click Parallel Design: Logrank Test Given Accrual Duration and Study Duration as shown below.

A new screen will appear. Enter the appropriate design parameters into the dialog box as shown below.

The box labeled Variance of Log Hazard Ratio specifies whether the calculation of the required number of events is to be based on the variance estimate of the log hazard ratio under the null hypothesis or the alternative hypothesis. The default choice in East is Null. Most textbooks recommend this choice as well (see, for example, Collett, 1994, equation (2.21) specialized to no ties). It will usually not be necessary to change this default. For a technical discussion of this issue refer to Appendix B, Section B.5.3.

Next, click on the Accrual/Dropout tab. Here we will specify the accrual information and dropout rates.
Set the accrual duration to 30 months and the study duration to 100 months in the Accrual box. Also, suppose that there are 5% drop-outs per year in each arm. Enter these values as shown below.

Click on Compute to complete the design. The design is shown as a row in the Output Preview located in the lower pane of this window. You can select this design by clicking anywhere along the row in the Output Preview. With Des1 selected, click the icon to display the details of this design in the upper pane, which are shown below. You may also wish to save this design. Select Des1 in the Output Preview window and click the icon to save this design to Workbook1 in the Library.

It is immediately evident that Des1 is untenable. It requires 16,205 events to be fully powered and 18,527 subjects to obtain those events within the course of the study. The problem lies with trying to power the trial to detect a hazard ratio of 1 under the alternative hypothesis. Suppose instead that the investigators actually believe that the treatment is slightly superior to the active control, but the difference is too small to be detected in a superiority trial. In that case a non-inferiority design powered at a hazard ratio less than 1 (0.95, say) would be a better option because such a trial would require fewer events.

To see this, create a new design by selecting Des1 in the Library, and clicking the icon on the Library toolbar. Then edit this design by specifying a hazard ratio of 0.95 under the alternative hypothesis as shown below.

Click the Compute button to generate output for Des2. With Des2 selected in the Output Preview, click the icon to save Des2 to the Library.
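The reason a hazard ratio of 1 under the alternative is so expensive can be seen from the standard approximation for the number of events in a single-look logrank test with 1:1 allocation, in which the required events are inversely proportional to the squared distance between the margin and the alternative on the log hazard ratio scale. The sketch below applies that approximation; it is an illustration, not East's internal calculation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def ni_events(margin_hr, alternative_hr, alpha_one_sided, power):
    """Approximate events for a single-look non-inferiority logrank test (1:1)."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha_one_sided) + z.inv_cdf(power)
    # Distance between the null (margin) and the alternative on the log scale.
    distance = math.log(margin_hr) - math.log(alternative_hr)
    return math.ceil(4 * z_total ** 2 / distance ** 2)


events_hr_1 = ni_events(1.045, 1.00, alpha_one_sided=0.025, power=0.80)
events_hr_095 = ni_events(1.045, 0.95, alpha_one_sided=0.025, power=0.80)
print(events_hr_1, events_hr_095)
```

Moving the alternative from 1 to 0.95 roughly doubles the distance from the margin on the log scale, which cuts the required number of events by more than a factor of four.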
In the Library, select the rows for Des1 and Des2, by holding the Ctrl key, and then click the icon. The upper pane will display the details of the two designs side-by-side:

Des2 is clearly easier to implement than Des1. It requires only 3,457 events to be fully powered. This can be achieved with only 3,973 patients enrolled in the study.

49.3.2 Early Stopping for Futility

Under the null hypothesis, Des2, with 3,457 events, has an expected study duration of 93.2 months. This is a very long time commitment for a trial that is unlikely to be successful. Therefore it would be a good idea to introduce a futility boundary for possible early stopping. Since we wish to be fairly aggressive about early stopping for futility, we will generate the futility boundary from the Gamma(−1) β-spending function. On the other hand, since there is no interest in early stopping for efficacy, we will not use an efficacy boundary.

Create a new design by selecting Des2 in the Library, and clicking the icon on the Library toolbar. Change the number of looks from 1 to 3.

Next, click on the Boundary tab. Enter the parameters as shown below. Be sure to select the Non Binding option. This choice gives us the flexibility to continue the trial even if a futility boundary has been crossed. Data monitoring committees usually want this flexibility; for example, to follow a secondary endpoint.

Click the Compute button to generate output for Des3. With Des3 selected in the Output Preview, click the icon to save Des3 to the Library. In the Library, select the rows for Des1, Des2, and Des3 by holding the Ctrl key, and then click the icon.
The upper pane will display the details of the three designs side-by-side:

Observe that while the sample size has been inflated to 4,344 subjects compared to Des2, the expected study duration under H0 has been cut down to 39.6 months and the expected sample size under H0 is 3,965.

It would also be useful to simulate Des3 under a variety of scenarios for the hazard ratio. Select Des3 in the Library and click the icon. You will be taken to the following simulation worksheet.

We wish to simulate this trial under the null hypothesis that the hazard ratio is exp(0.044) = 1.045. To do this go to the Response Generation tab in the simulation worksheet. In this tab change the hazard ratio to 1.045 as shown below.

Next, click the Simulate button to simulate 10000 trials. A new row labeled Sim1 will appear in the Output Preview window. Select Sim1 in the Output Preview and click the icon to save it to the Library. In the Library, double-click Sim1. A portion of the output is displayed below. (The actual values may differ, depending on the starting seed used.)

Note that 238 out of the 10000 simulations rejected the null hypothesis when it was true, thus confirming (up to Monte Carlo accuracy) that this design achieves a type-1 error of 2.5%. Also, observe that 50% of these trials have crossed the futility boundary at the very first interim look after only 24.7 months of study duration.

50 A Note on Specifying Dropout Parameters in Survival Studies

This note gives details on specifying dropout parameters for survival studies in East.

Dropout in a survival study is a competing risk.
You may specify the dropout rate as a hazard rate or as the probability of a subject dropping out within a specific period after entering the study. Very often, people are in a position to estimate likely dropout rates in a future study from their past experience in the same therapeutic area. Their past experience may be that a specific percentage of subjects, such as 5% or 10%, drop out of a study. We will explain with an example how such estimates can be used in specifying input parameters for dropout rates in East.

Example 1: Logrank Test Given Accrual Duration and Study Duration

Suppose we are designing a survival study with the following parameters:

Design Type: Superiority
Number of Looks: 3
Test Type: 1-sided
Type I Error: 0.025
Power: 0.9
Allocation Ratio: 1
Hazard Rate (Control): 0.03466 (default)
Hazard Ratio: 0.7
Hazard Rate (Treatment): 0.024 (this is computed by East given the above two inputs)
Variance of Log Hazard Ratio: Null
Boundary specification: Spending Function - Lan-DeMets (OF)
Accrual Duration: 20 months
Study Duration: 40 months

Further, it is expected that about 10% of the subjects are likely to drop out by the end of the study. The problem is how to translate this estimate into either a hazard rate or a probability of dropout in a specific period, given that subjects accrue over a period of time and that the risk set for dropouts diminishes as subjects leave the study because of events.

One way to find the right specification for the dropout rate is by trial and error. We make an initial guess and compute the design. The detailed output for the design will show estimates for sample size and maximum dropouts. If the estimated number of dropouts is close to 10% of the sample size, then we can stop there.
Otherwise, we have to increase or decrease the input specification for the dropout rate and try again until the estimated maximum dropouts is about 10% of the estimated maximum sample size.

We can try to create a design with the above input parameters by entering the input values in the dialog box in the usual way. For the dropout specification, suppose we specify the probability of dropout as 0.1 by time 40, the study duration. This implies that the probability of a subject dropping out of the study within 40 months after entering the study is 0.1. The input dialog box for dropout information will be as shown below.

The equivalent specification in terms of hazard rate can be seen by choosing the item ’Hazard Rates’ in the Input Method drop-down box, which is shown below. Please apply Increase Decimal precision, available at the top right corner of the Input dialog, to get the exact results.

Now Compute this design and save it to a library node. If you double-click on this node, part of the detailed output will appear as shown below.

The above results show that the maximum dropouts is only 5.1% (31/602) of the maximum sample size and not the desired value of 10%. Since not all the subjects are accrued at the beginning of this 40-month study, specifying a subject’s probability of dropping out as 0.1 within 40 months may not be appropriate. As the accrual duration is 20 months, if we assume that the average accrual time of the subjects is 10 months, a subject may be in the study on average for a maximum of 30 months since the maximum study duration is 40 months. So let us specify the probability of dropout in a 30-month period as 0.1 as shown below.

For this design, the detailed output shows the following results.
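The relationship between the dropout specification and the resulting dropout fraction can also be approximated outside East, which may shorten the trial-and-error search. The sketch below is an illustration under stated assumptions, not East's calculation: event and dropout times are exponential and independent, accrual is uniform over 20 months, allocation is 1:1, and a dropout probability p by time t is converted to the hazard $-\ln(1-p)/t$. It ignores early stopping, so it will not match East's detailed output exactly. Function names are hypothetical.

```python
import math

CONTROL_HAZARD = 0.03466
TREATMENT_HAZARD = 0.03466 * 0.7
ACCRUAL, STUDY = 20.0, 40.0  # months


def dropout_hazard(prob, by_time):
    """Constant hazard giving dropout probability `prob` within `by_time`."""
    return -math.log(1 - prob) / by_time


def dropout_fraction(event_hazard, drop_hazard):
    """Expected fraction of one arm that drops out before study end."""
    total = event_hazard + drop_hazard
    # Average probability of still being at risk at study end (uniform accrual).
    at_risk = (
        math.exp(-total * (STUDY - ACCRUAL)) - math.exp(-total * STUDY)
    ) / (total * ACCRUAL)
    return drop_hazard / total * (1 - at_risk)


def expected_dropout_fraction(prob, by_time):
    """Average dropout fraction over the two arms for a given specification."""
    eta = dropout_hazard(prob, by_time)
    return 0.5 * (dropout_fraction(CONTROL_HAZARD, eta)
                  + dropout_fraction(TREATMENT_HAZARD, eta))


def solve_dropout_prob(target, by_time, lo=0.001, hi=0.9, tol=1e-6):
    """Bisection for the probability whose expected dropout fraction is `target`."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_dropout_fraction(mid, by_time) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


frac_40 = expected_dropout_fraction(0.1, 40)   # first specification
frac_30 = expected_dropout_fraction(0.1, 30)   # second specification
needed = solve_dropout_prob(0.10, 30)          # probability by time 30 for 10%
print(round(frac_40, 3), round(frac_30, 3), round(needed, 3))
```

The first two values are close to the percentages East reports for the two specifications tried so far, and the bisection gives a starting guess for the probability of dropout by time 30 that would yield about 10% dropouts overall.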
Now the maximum dropouts observed is 6.7% (41/609) of the maximum sample size. So we need to increase the dropout probability to a suitable value. Let us try 0.15 as the probability of dropout by time 30.

The design obtained with the above specification for the dropout rate gives the following results.

Now the percentage of maximum dropouts to maximum sample size is 10.1%, which satisfies our aim.

Note: Some users may prefer to specify dropout rates upfront in terms of dropout hazard rates instead of probability of dropout. In that case, they may want to carry out the trial and error process described above in terms of dropout hazard rates instead of using probability of dropout.

51 Multiple Comparison Procedures for Survival Data

As with both continuous and discrete data, it is often desired to address multiple objectives within a single trial for a survival analysis. Here, the outcome of interest is typically the time from entry until a specific event is observed (e.g., death, recurrence, medical event). As with other data outcomes, formal statistical hypothesis tests are used to support or disprove clinical claims for survival data. When objectives are formulated into a family of hypotheses, as is the case with multiple comparison procedures, type I error is inflated. Failure to compensate for this can have adverse consequences. For example, a drug could be approved even when it is no better than placebo. Multiple comparison (MC) procedures guard against this inflation of type I error due to multiple testing. East supports the calculation of power from simulated survival data using several different MC procedures. The user can choose the most relevant MC procedure that provides maximum power while maintaining the familywise error rate (FWER).
East maintains strong control of the FWER, which means that the probability of incorrectly rejecting at least one true null hypothesis is preserved regardless of which null hypotheses are true. Weak control, by contrast, preserves this probability only under the assumption that all null hypotheses are true. The following MC procedures are available for survival endpoints in East.

Category: P-value Based

Procedure              Reference
Bonferroni             Bonferroni CE (1935, 1936)
Sidak                  Sidak Z (1967)
Weighted Bonferroni    Benjamini Y and Hochberg Y (1997)
Holm's Step Down       Holm S (1979)
Hochberg's Step Up     Hochberg Y (1988)
Hommel's Step Up       Hommel G (1988)
Fixed Sequence         Westfall PH, Krishen A (2001)
Fallback               Wiens B, Dmitrienko A (2005)

P-value based procedures strongly control the FWER regardless of the joint distribution of the raw p-values, as long as the individual raw p-values are legitimate p-values. A thorough discussion on calculating the expected number of events d(l) in a time-to-event trial can be found in Appendix D.

51.0.3 Single step MC procedures

East provides p-value based single step MC procedures to compute power for a survival data analysis. As with continuous outcomes, these include the Bonferroni procedure, the Sidak procedure, and the weighted Bonferroni procedure.

Example: STAMPEDE study

The STAMPEDE study is an ongoing, open-label, 5-stage, 6-arm randomized controlled trial using multi-arm, multi-stage (MAMS) methodology for men with prostate cancer. Started in 2005, it was the first trial of this design to use multiple arms and stages synchronously. The study population consists of men with high-risk localized or metastatic prostate cancer who are being treated for the first time with long-term androgen deprivation therapy (ADT) or androgen suppression.
The study started with a control arm and 5 research arms:

Standard of care (SOC) = ADT
SOC + zoledronic acid (IV)
SOC + docetaxel (IV)
SOC + celecoxib, an orally administered cox-2 inhibitor
SOC + zoledronic acid + docetaxel
SOC + zoledronic acid + celecoxib

MAMS trials allow for the simultaneous assessment of a number of research treatments against a single control arm. By assessing several treatments in one trial, information can be acquired more quickly and with smaller numbers of patients. By combining multiple stages, this adaptive design allows continuing investments to be focused on treatments that show promise. Any therapy with insufficient evidence of activity is discontinued.

The Bonferroni and Sidak procedures in East are presented using relevant data from the STAMPEDE trial for a fixed-sample design. Under the Design tab in the Survival group, select Many Samples - Pairwise Comparisons to Control - Logrank Test. The following screen is displayed.

Change the default Number of Arms: to 6. Under the current tab, Test Parameters, keep the Rejection Region: assigned to Left-Tail, keep the default Type I Error (α) of 0.025, ensure that Fix: is set to the total number of events, and enter the value 1200. The Test Statistic used to calculate power can be Logrank, Wilcoxon-Gehan, or Harrington-Fleming. Keep the default value of Logrank. Select both Bonferroni and Sidak for the choice of Multiple Comparisons Procedures.

Select the Response Generation tab: This is where the user can specify the Response Distribution: to be either Exponential, Weibull, Lognormal, or R function. The Input Method can be set to either Median Survival Times, Cum. % Survival, or Hazard Rates. In addition, the Time Unit can be set to days, weeks, months or years.
Keep the Response Distribution: as Exponential, set the Input Method to Hazard Rates, and the Time Unit to years. Enter the following information into the Survival Information table:

In the next tab, Accrual/Dropouts, the user sees the following input dialog box, which allows the specification of sample size, duration of follow-up, Accrual Info, and Piecewise Dropout Information. Set the Sample Size to 3400 and ensure that the Subjects are followed: dropdown is set to "Until End of Study". The Accrual Duration Time Unit: is "Years", the number of Accrual Periods is 1, and the Input Method is "Accrual Rates". The Accrual Rate per Year is 500, starting at time 0. There is no Piecewise Dropout Information, so keep the Number of Pieces: set to 0.

In the upper right-hand corner of the Simulation Window, click the Include Options button and select "Randomization". In the new Randomization tab, the second column of the Table of Allocation displays the allocation ratio of each treatment arm to the control arm. The cell for the control arm is always 1 and is not editable. Only the cells for the treatment arms need to be entered. The default value for each treatment arm is 1, which represents a balanced design. For the STAMPEDE example, change the allocation ratio of all treatment arms to 0.5.

The last tab is Simulation Controls. For this example, all simulation defaults can be maintained. The Output Options box is where the user can choose to save summary statistics for each simulation run or to save subject level data for a specific number of runs. Click Simulate to start the simulations. Once completed, East will add an additional row to the Output Preview, labeled as Sim 1.
[Simulation output panels: MCP = Bonferroni; MCP = Sidak]

Note that two new simulations are displayed in the Output Preview window. Select the corresponding rows and save to the Library. Select the two simulations again and click the Output Summary icon:

The Bonferroni and Sidak procedures have high disjunctive and global powers of about 97% and conjunctive power of about 3%.

Weighted Bonferroni procedure

The same example based on the STAMPEDE study will be used to illustrate the weighted Bonferroni procedure. Select Sim 1 in the Library and click the [icon] icon. In the Design Parameters tab, under the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Weighted Bonferroni box. An additional table, Treatment Arms, has been added, which includes a column labeled Proportion of Alpha. This is where the user specifies the proportion of total alpha to be spent on each test. If necessary, East will normalize the column so that it adds up to 1. The default is to distribute the total alpha equally among all tests. Here we have 5 tests in total, so each test has a proportion of alpha of 1/5, or 0.2. Other proportions can be specified as well. For this example, keep the equal proportion of alpha for each test. All other values can remain the same as in the previous example. Click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview.

The weighted Bonferroni MC procedure has global and disjunctive power of 96.9% and conjunctive power of 29.8%.
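The decision rules behind the three single step procedures can be sketched in a few lines. This is an illustrative sketch only, not East's implementation: the function names and the example p-values are hypothetical, and the weights are normalized to sum to 1 in the way the text describes for the Proportion of Alpha column.

```python
def bonferroni_reject(pvalues, alpha):
    """Reject H_i when p_i <= alpha / m, with m the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def sidak_reject(pvalues, alpha):
    """Reject H_i when p_i <= 1 - (1 - alpha)^(1/m)."""
    m = len(pvalues)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in pvalues]

def weighted_bonferroni_reject(pvalues, alpha, weights):
    """Reject H_i when p_i <= alpha * w_i, after normalizing the weights."""
    total = sum(weights)
    return [p <= alpha * w / total for p, w in zip(pvalues, weights)]

# Hypothetical raw p-values for 5 treatment-versus-control comparisons.
pvals = [0.001, 0.004, 0.006, 0.02, 0.3]
alpha = 0.025

bonf = bonferroni_reject(pvals, alpha)
sidak = sidak_reject(pvals, alpha)
wbonf_equal = weighted_bonferroni_reject(pvals, alpha, [1, 1, 1, 1, 1])

print("Bonferroni         :", bonf)
print("Sidak              :", sidak)
print("Weighted (equal w) :", wbonf_equal)
```

In every single step procedure each hypothesis is compared with its own fixed threshold, so the decision on one hypothesis never depends on the decisions on the others.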
Note that the powers of the weighted Bonferroni procedure are quite close to those of the Bonferroni procedure. This is because the weighted Bonferroni procedure with equal proportions is equivalent to the simple Bonferroni procedure. The exact result of the simulations may differ slightly, depending on the seed. Select the simulation in the Output Preview and click the [icon] icon. This will save the simulation to the workbook in the Library.

51.0.4 Data-driven step-down MC procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests. There are two types of stepwise procedures. The first proceeds in data-driven order. The other follows a pre-defined fixed order. Stepwise tests in data-driven order can proceed in either a step-down or a step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues until the test for a certain hypothesis fails. The testing procedure stops at the first non-significant comparison, and all remaining hypotheses are retained.

Holm's step-down

The STAMPEDE example will be used to illustrate Holm's step-down procedure. Select Sim 1 in the Library and click the [icon] icon. In the Design Parameters tab, under the Multiple Comparison Procedures box, uncheck the Weighted Bonferroni box and check the Holm's Step-down box. All other previous inputs can stay the same. To calculate the power, click Simulate. Once completed, East will add an additional row to the Output Preview.

Holm's step-down procedure has global and disjunctive power of 97.1% and conjunctive power of 58.6%. The exact result of the simulations may differ slightly, depending on the seed.
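The step-down logic described above can be sketched as follows. This is a minimal illustration of the standard Holm procedure, not East's code; the function name and the example p-values are hypothetical.

```python
def holm_reject(pvalues, alpha):
    """Holm step-down: test p-values from smallest to largest.

    At step s (0-based) the current p-value is compared with
    alpha / (m - s); testing stops at the first non-significant
    comparison and all remaining hypotheses are retained.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # first failure: retain this and all later hypotheses
    return reject

# Hypothetical raw p-values, listed in arm order rather than sorted.
pvals = [0.02, 0.001, 0.3, 0.006, 0.004]
holm = holm_reject(pvals, alpha=0.025)
print(holm)
```

With these p-values the successive thresholds are 0.005, 0.00625, 0.00833, and so on, so Holm rejects three hypotheses where a plain Bonferroni threshold of 0.005 would reject only two. This is the sense in which the step-down procedure is at least as powerful as its single step counterpart.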
Now select the current simulation in the Output Preview and click the [icon] icon to save it to the workbook in the Library.

51.0.5 Data-driven step-up MC procedures

Step-up tests start with the least significant comparison and continue as long as the tests are not significant. At the first significant comparison, that hypothesis and all remaining hypotheses are rejected. East supports two such MC procedures for time-to-event data, the Hochberg step-up and the Hommel step-up. In the Hochberg step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \alpha/i$. In the Hommel step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i}\alpha$ for $j = 1, \dots, i$. Fixed sequence test and fallback test are the types of tests which proceed in a predetermined order.

Hochberg's and Hommel's step-up procedures

Hochberg's and Hommel's step-up procedures are described below using the STAMPEDE example from the previous sections. All other design specifications remain the same, except that we are using the Hochberg and Hommel step-up procedures in place of Holm's Step Down. Select Sim 1 in the Library and click the [icon] icon. In the Design Parameters tab, under the Multiple Comparison Procedures box, uncheck the Holm's Step Down box and check the Hochberg's step-up and Hommel's step-up boxes. Now click Simulate to obtain power. Once the simulation run has completed, East will add two additional rows to the Output Preview window.

The Hochberg and Hommel procedures both have disjunctive and global powers of about 75% and conjunctive power of about 6%. The exact result of the simulations may differ slightly, depending on the seed. Select these simulations in the Output Preview using the Ctrl key and click the [icon] icon.
This will save them to the corresponding workbook in the Library.

51.0.6 Fixed-sequence stepwise MC procedures

In data-driven stepwise procedures, we don't have any control over the order of the hypotheses to be tested. However, sometimes based on our preference or prior knowledge we might want to fix the order of tests a priori. Fixed sequence test and fallback test are the types of tests which proceed in a pre-specified order. East supports both of these procedures for survival, or time-to-event, data.

Assume that $H_1, H_2, \dots, H_{k-1}$ are ordered hypotheses and the order is pre-specified so that $H_1$ is tested first, followed by $H_2$, and so on. Let $p_1, p_2, \dots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, for $i = 1, \dots, k-1$, in the i-th step, if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \dots, H_{k-1}$ and stop.

The fixed sequence testing strategy is optimal when early tests in the sequence have the largest treatment effects, and it performs poorly when early hypotheses have small treatment effects or are nearly true (Westfall and Krishen 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.

The fallback test alleviates this undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$ such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, in the i-th step ($i = 1, \dots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it.
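The two pre-specified-order rules can be compared side by side in a short sketch. It follows the decision rules stated above, with the fallback level for a hypothesis equal to its own share of alpha plus the level carried over from the preceding hypothesis when that one was rejected. The function names and example p-values are hypothetical, and this is not East's implementation.

```python
def fixed_sequence_reject(pvalues, alpha):
    """Test hypotheses in the given order, each at full alpha.

    Stop at the first p-value that is not below alpha and retain
    that hypothesis and every later one.
    """
    reject = []
    stopped = False
    for p in pvalues:
        if not stopped and p < alpha:
            reject.append(True)
        else:
            stopped = True
            reject.append(False)
    return reject

def fallback_reject(pvalues, alpha, weights):
    """Fallback test with pre-specified order and alpha proportions."""
    total = sum(weights)          # normalize so the proportions sum to 1
    reject = []
    carried = 0.0                 # level passed on by a rejected predecessor
    for p, w in zip(pvalues, weights):
        level = carried + alpha * w / total
        if p < level:
            reject.append(True)
            carried = level       # pass the whole level to the next test
        else:
            reject.append(False)
            carried = 0.0         # next test falls back to its own share
    return reject

# Hypothetical p-values in the pre-specified testing order H1, ..., H5.
pvals = [0.004, 0.03, 0.001, 0.009, 0.2]
alpha = 0.025

fixed = fixed_sequence_reject(pvals, alpha)
fallback = fallback_reject(pvals, alpha, [1, 1, 1, 1, 1])
print("Fixed sequence:", fixed)
print("Fallback      :", fallback)
```

In this example the second hypothesis is not significant, so the fixed sequence test stops there, while the fallback test continues and can still reject later hypotheses at their own share of alpha.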
Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \dots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Fixed sequence testing procedure

The STAMPEDE example is used to illustrate the fixed sequence testing procedure. Select Sim 1 in the Library and click the [icon] icon. Under the Design Parameters tab, in the Multiple Comparison Procedures box, uncheck the Bonferroni box and check the Fixed Sequence box. Notice that in the Test Parameters window a table called Treatment Arms has been added, which includes a column labeled Test Sequence. This is where the order of the hypothesis tests is determined. Specify 1 for the test that will be tested first, 2 for the test that will be tested next, and so on. By default East assigns 1 to the first test, 2 to the second test, and so on. For optimal power in the fixed sequence procedure, the early tests in the sequence should have larger treatment effects. For now we will keep the default, which means that H1 will be tested first, followed by H2, and so on until H5 is tested. Now click Simulate to obtain power. Once the simulation run has completed, East will add an additional row to the Output Preview.

The fixed sequence procedure with the specified sequence has global and disjunctive power of 87.8% and conjunctive power of 60.1%. Select the simulation in the Output Preview and click the [icon] icon.
It is worth noting that the fixed sequence procedure is powerful provided the hypotheses are tested in a sequence of descending treatment effects. The fixed sequence procedure controls the FWER because, for each hypothesis, testing is conditional upon rejecting all hypotheses earlier in the sequence. As usual, the exact result of the simulations may differ slightly, depending on the seed.

Fallback procedure

The STAMPEDE example is used to illustrate the fallback procedure. Select Sim 1 in the Library and click the [icon] icon. Under the Design Parameters tab, in the Multiple Comparison Procedures box, uncheck the Bonferroni box and select the Fallback box. Notice that in the Test Parameters window a table called Treatment Arms has been added, which includes columns labeled Test Sequence and Proportion of Alpha. In the Test Sequence column, the user specifies the order in which the hypotheses will be tested. Specify 1 for the test that will be tested first, 2 for the test that will be tested next, and so on. By default East assigns 1 to the first test, 2 to the second test, and so on. Keep the default, which means that H1 will be tested first, followed by H2, and so on until H5 is tested. In the Proportion of Alpha column, the user specifies the proportion of total alpha to spend on each test. Ideally, the values in this column should add up to 1; if not, East will normalize them to add up to 1. By default, East distributes the total alpha equally among all the tests. There are 5 tests in total, so each test has a proportion of alpha of 1/5, or 0.2. Other proportions can be specified; however, for this example, keep the equal proportion of alpha for each test. Click Simulate to obtain power.
Once the simulation run has completed, East will add an additional row to the Output Preview.

The fallback procedure with the specified sequence has global and disjunctive power of 97.1% and conjunctive power of 45.5%. Select the simulation in the Output Preview and click the [icon] icon to save it to the workbook in the Library.

It is worth noting that the fallback test is more robust to misspecification of the test sequence, whereas the fixed sequence test is very sensitive to the test sequence. If the test order is incorrectly specified, the fixed sequence test performs very poorly.

51.1 Comparison of MC procedures

East can run all the simulations at once, which helps in choosing the most appropriate MC procedure. For the STAMPEDE example, select Sim 1 in the Library and click the [icon] icon. Under the Design Parameters tab, in the Multiple Comparison Procedures box, check all the boxes. Select Simulate and choose Continue as each simulation completes. The following output displays the powers under the different MC procedures.

Here we have used equal proportions for the weighted Bonferroni and fallback procedures. For the two procedures that test in a pre-specified order (fixed sequence and fallback), just one sequence has been used: the default (H1, ..., H5). The fixed sequence procedure results in the lowest power, at 88.2%. Therefore, the fixed sequence procedure may not be considered the most appropriate. For this example, almost all procedures result in approximately 97% global and disjunctive powers. The step-up and fixed sequence procedures produce the highest conjunctive power, at approximately 62% each.
Volume 7 Adaptive Designs

52 Introduction To Adaptive Features
53 The Motivation for Adaptive Sample Size Changes
54 The Cui, Hung and Wang Method
55 The Chen, DeMets and Lan Method
56 Müller and Schäfer Method
57 Conditional Power for Decision Making

52 Introduction To Adaptive Features

This volume describes the adaptive features that can be used in the design of late stage adaptive clinical trials. The adaptive features are fully integrated into East and are invoked through simulation and calculation tools that will be described in the chapters of this volume.

The PhRMA Adaptive Design Working Group defines an adaptive trial as any clinical trial which uses accumulating data, possibly combined with external information, to modify aspects of the design without undermining the validity and integrity of the trial (see Gallo et al., 2006). This definition is too broad for our purposes. It covers a very wide range of adaptations, including dose response strategies in phase I trials, randomized play the winner rules for dose selection in early phase II trials, combination phase II/III designs, and mid-course data-dependent alterations to the later stage phase II and phase III designs. Adaptive features in East deal mainly with the last case. They extend the group sequential methodology of East in a natural way toward data-dependent changes in sample size or in number of events (for event-driven trials). These adaptive extensions of group sequential designs are included in the list of adaptive methods discussed in the newly released FDA Guidance For Industry on Adaptive Design Clinical Trials for Drugs and Biologics (2010).

This volume contains Chapters 52 through 57. Chapter 52, the current chapter, describes the availability of adaptive features in East and the contents of the remaining chapters in this volume.
Chapter 53 provides the motivation for making adaptive changes to a late phase II or phase III trial. Three examples of actual case studies are included in this chapter, for continuous, discrete and survival endpoints, respectively. East provides two different methods for controlling type-1 error after an adaptive design change. These methods are described in Chapter 54 and Chapter 55, respectively. They may be used to make sample size modifications for trials with normal or binomial endpoints and to make sample size and event modifications for trials with time-to-event or survival endpoints. The third method, described in Chapter 56, offers considerable additional flexibility. All three methods are able to preserve the type-1 error in the face of data dependent changes to the study design. Each of these chapters is self-contained, with a discussion of the statistical methodology followed by one or more worked examples. A common feature of all these adaptive methods is their reliance on conditional power for making the adaptive modifications. We have developed special conditional power calculators for this purpose. The worked examples within each chapter illustrate the use of these calculators. Additionally, Chapter 57 is devoted entirely to describing how to invoke and use the conditional power calculators.

The first adaptive method in East is the "weighted combinations" method due to Cui, Hung and Wang (1999), and Lehmacher and Wassmer (1999). In East this is referred to as the CHW method. In this method, the test statistic used to determine statistical significance at each interim look is a weighted combination of independent Wald statistics with pre-specified weights. This method is available for designs under Continuous, Discrete and Survival endpoints. The CHW method can be implemented at any interim look in a group sequential trial and can also be implemented multiple times.
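The idea of a weighted combination with pre-specified weights can be made concrete for the simplest case of two stages. The sketch below is an illustration under stated assumptions, not East's implementation: it assumes a two-stage design, takes the weights to be the originally planned stage-wise information fractions, and uses hypothetical names and numbers throughout.

```python
import math

def chw_statistic(z_stage1, z_stage2_incremental, n1_planned, n2_planned):
    """Two-stage weighted combination of independent Wald statistics.

    The weights depend only on the planned stage sizes, so they stay
    fixed even if the actual second-stage sample size is increased
    after the interim look. `z_stage2_incremental` is the Wald
    statistic computed from second-stage data alone.
    """
    w1 = n1_planned / (n1_planned + n2_planned)
    w2 = 1.0 - w1
    return math.sqrt(w1) * z_stage1 + math.sqrt(w2) * z_stage2_incremental

# Hypothetical example: equal planned stages, so w1 = w2 = 0.5.
z_chw = chw_statistic(z_stage1=1.2, z_stage2_incremental=2.1,
                      n1_planned=100, n2_planned=100)
print(f"Weighted combination statistic: {z_chw:.4f}")
```

Because the squared weights sum to 1 and the stage-wise statistics are independent, the combined statistic is standard normal under the null hypothesis whatever the adapted second-stage size, which is the property that allows the type-1 error to be preserved.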
We provide simulation tools for evaluating the operating characteristics of the CHW design. These tools for the CHW method only permit adaptive sample size increases, not decreases. A special CHW Interim Monitoring Worksheet is provided to facilitate the interim monitoring and final analysis of such a trial.

The second adaptive method was proposed initially by Chen, DeMets and Lan (2004) and has since been extended by Gao, Ware and Mehta (2008) and Mehta and Pocock (2010). It is referred to as the CDL method. This method can be used to make sample size modifications for trials with normal or binomial endpoints and to make sample size and event modifications for trials with time-to-event or survival endpoints. The main advantage of the CDL method over the CHW method is that it permits data dependent sample size changes and event changes without the need to adjust the final test statistic with pre-specified weights. This is an attractive feature because the trial results can be presented in a conventional manner, without artificially weighting the data from the two stages in ways that are difficult to explain to investigators who might be unfamiliar with the technical details of adaptive methodology. The method is, however, only applicable to two-stage adaptive designs or to multi-stage adaptive designs in which the sample size or number of events is changed at the penultimate stage. Furthermore, the simulation tools for the CDL method only permit adaptive sample size increases, not decreases. This is in keeping with the recommendation of the FDA Guidance Document on Adaptive Design (2010).

The third adaptive method is referred to as the Müller and Schäfer method. It is based on preserving the conditional type-1 error computed at the time of the adaptation. Many authors have arrived independently at this key idea for making adaptive changes to a clinical trial.
For example, it is central to the two-stage designs of Proschan and Hunsberger (1995), and the recursive combination tests of Brannath, Posch and Bauer (2002). Jennison and Turnbull (2003) claim that any fully flexible adaptive approach must respect this principle. The most general application of this principle is due to Müller and Schäfer (2001). These authors have shown explicitly that it is permissible to make any desired data dependent change to an ongoing group sequential clinical trial, possibly more than once, by the simple process of preserving the conditional type-1 error of the remainder of the trial after each change. When the only adaptive change is a change in sample size, the Müller and Schäfer method can be shown to be equivalent to the CHW method. However, the Müller and Schäfer method is not restricted to sample size changes exclusively.

The following table displays the types of designs for which adaptive methods are available in East, with indications of their limitations.

Table 52.1: Adaptive Methods - Designs Applicable

                                                                Adaptive Method (a)
Multi-look Design                                               CHW    CDL    MS
Continuous-Two Samples-Difference of Means-Superiority          yes    yes    yes
Discrete-Two Samples-Difference of Proportions-Superiority      yes    yes    yes
Discrete-Two Samples-Ratio of Proportions-Superiority           yes    yes    yes
Survival-Two Samples-Both Designs-Superiority                   yes    yes    yes

(a) H0 only (1-sided); H0 or H1 (1-sided, binding/non-binding)

52.1 Settings

Click the [icon] icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus. All these numerical quantities are grouped in different categories depending upon their usage.
For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab allows the paths for storing workbooks, files, and temporary files to be adjusted. These paths will persist through the current and future sessions, even after East is closed. This is also where we need to specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type and test type, and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since an MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm.
In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data. If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

53 The Motivation for Adaptive Sample Size Changes

In this chapter, we will highlight, through some prototypical examples, the motivation for making adaptive changes to the sample size in an on-going clinical trial.

Sample size is a key design input for any randomized clinical trial. Unfortunately, it is often computed in the face of inadequate knowledge about σ², the inter-subject variance, and δ, the effect size. Economic pressures, possibly combined with competition for patients, then encourage trial investigators to make optimistic decisions about these two design parameters, a tendency that frequently results in underpowered studies.
An underpowered trial is extremely undesirable, for it places human subjects at risk with a low probability of reaching a positive scientific conclusion and diverts resources that could be better utilized elsewhere. Therefore, in recent years there has been a considerable amount of research on more flexible clinical trials where the sample size is re-estimated after the clinical trial is underway, on the basis of updated information about σ² and δ. The updated information may arise from external sources, from interim results of the on-going trial, or from a combination of the two. Sample size re-estimation based exclusively on updated information about σ² is covered in Chapter 59 in the Special Topics volume of the East Manual, dealing with information based design. Here we are concerned primarily with sample size re-estimation due to updated information about δ after the study is activated.

Although statistical methods are available to make data dependent mid-course changes to sample size, the appropriateness of such sample size re-estimation has generated some debate. Critics of this type of design revision argue that the same end (ensuring adequate power at the appropriate value of δ) can be achieved more efficiently through a group sequential design (Tsiatis and Mehta, 2003; Jennison and Turnbull, 2003). This is a valid argument in settings where one is prepared to pre-specify a minimum clinically meaningful value of δ, commit a large maximum sample size to the trial up-front, and forgo the option to make data driven design changes as the trial progresses. There may be situations, however, where the flexibility to learn from the interim data and adapt the future course of the trial offsets the improved efficiency of the group sequential approach. Furthermore, if the primary endpoint of the study is only measured after a lengthy follow-up, the sample size saving available through a group sequential design might be rather small.
Finally, for a two-stage design in which the sample size is only increased if the interim results fall in a promising zone, there may be no loss of efficiency whatsoever. We provide an example of this type at the end of Chapter 55.

53.1 The Benefits of Adaptive Designs

There are several reasons why it might be beneficial to allow for the possibility of a sample size increase in the middle of a group sequential clinical trial. Below we present a few real examples that we have encountered either in publications or in our consulting practice.

53.1.1 Rescuing an Underpowered On-Going Study

Cui, Hung and Wang (1999) discuss a phase III group sequential clinical trial for evaluating the effect of a new drug for prevention of myocardial infarction in patients undergoing coronary artery bypass graft surgery. The study was planned to detect a reduction in the incidence rate from 22% for placebo to 11% for the new drug with 95% power on a 1-sided level 0.025 test. The study was planned for one interim and one final look. On this basis the maximum sample size was computed to be 591 patients. There was considerable uncertainty about the incidence rates at which the study was powered because, at that time, very little data were available on the new drug. The interim analysis results were less optimistic than was hoped at the design stage. The incidence rate in the placebo group was close to the rate specified at design, but the incidence rate in the treatment group was only 16.5%. The drop in the incidence of myocardial infarction due to the treatment was only half of what was expected.
At that time, there was no valid method in the literature for increasing the sample size in mid-stream based on the observed efficacy outcome at the interim analysis. Thus the sample size was not increased and the trial eventually failed.

53.1.2 Designing a Study Given Limited Data About the Efficacy Endpoint

Consider the design of a two-arm schizophrenia trial for subjects with negative symptoms. For reasons of confidentiality, we will not reveal the names of the two drugs being tested, but will simply refer to them as the control and treatment arms, respectively. The primary clinical endpoint is the change from baseline in the Negative Symptom Assessment (NSA) at six months. There is, however, a second regulatory requirement that the new treatment must also show benefit in functional outcome as measured by a 21-item clinician rated Quality of Life Scale (QLS) measuring psychosocial functioning. The design of this trial poses inherent difficulties for the sponsor because there is very limited data from previous trials on NSA, and no data whatsoever on QLS, for patients with negative symptoms. Therefore it is not clear what values of δ one should use for calculating sample size.

A pure group sequential strategy would be powered at the smallest clinically meaningful value of δ. This option is impractical in the current situation because there exists no previous experience with QLS in negative symptoms patients, and hence no notion of what constitutes a clinically meaningful effect. A second option is to run a preliminary phase II study and then follow up with a separate phase III study using the results of the earlier study, possibly combined with newly available external information, as inputs for specifying δ and σ. This is a safe conservative choice but it does delay the time taken to reach a final conclusion about the new product.
Also, with this option, the data from the phase II study cannot be combined with the data from the phase III study. A third option is to combine the phase II and phase III designs into a single integrated trial using one of the three adaptive methods provided in EastAdapt. In this option, one would start out with an initial group sequential design, powered using a sample size that reflects a compromise between the scientific goal of detecting the smallest clinically meaningful value of δ and the pragmatic goal of staying within budgetary constraints. This compromise is justified because there is still considerable uncertainty about the precise value of δ that should be used to perform the sample size calculation. Therefore the study is activated with the understanding that the current sample size assessment is preliminary and will be re-visited at a future interim analysis time point, when reliable data on the NSA and QLS endpoints become available.

53.1.3 Availing of Data from External Sources After the Study is Activated

A long-term clinical trial was activated comparing adjuvant chemotherapy to placebo in an oncology trial where the primary endpoint was survival. A retrospective analysis of historical data conducted at the design stage suggested that the study should be powered to detect a hazard ratio of 0.7. However, two years into the trial, a publication in a peer reviewed medical journal suggested that the quality of care in this disease had greatly improved, suggesting a decline in the hazard rate for the placebo arm. The investigators were very concerned by this report because it suggested that their study might now be underpowered. Although enrollment had been completed, there remained the option to adaptively extend the study duration, to see a larger number of events than had been planned at the design stage.
53.1.4 Reducing the Sponsor’s Risk

From the sponsor’s perspective a very attractive feature of an adaptive design is the opportunity it gives to invest in the trial in stages and thereby reduce risk. Under this scenario, the initial (first stage) investment of sample size resources might be small. The second stage investment would then be contingent on seeing promising results from the first stage. The sponsor’s risk is thereby reduced since the request for additional sample size resources, if made, would imply that the trial has a good chance of success. Many small biotechnology companies rely on outside investors to finance their trials. Creative adaptive designs of this type might make the financing easier.

In the remainder of the Chapter we illustrate all the above points through three case studies of actual phase 3 adaptive trials. In Section 53.2 we discuss a normal endpoint clinical trial of schizophrenia. In Section 53.3 we discuss a binomial endpoint clinical trial of acute coronary syndromes. In Section 53.4 we discuss a time-to-event (survival) endpoint trial of lung cancer. These three examples will be carried forward to Chapters 54 and 55 where they will be used to demonstrate trial design and interim monitoring in East.

53.2 Normal Endpoint: Schizophrenia Trial

Consider a two-arm trial to determine if there is an efficacy gain for an experimental drug relative to the industry standard treatment for negative symptoms schizophrenia. The primary endpoint is the improvement from baseline to week 26 in the Negative Symptoms Assessment (NSA), a 16-item clinician-rated instrument for measuring the negative symptomatology of schizophrenia.
Let µt denote the difference between the mean NSA at baseline and the mean NSA at week 26 for the treatment arm and let µc denote the corresponding difference of means for the control arm. Denote the efficacy gain by δ = µt − µc. The trial will be designed to test the null hypothesis H0: δ = 0 versus the one-sided alternative hypothesis that δ > 0. It is expected from limited data on related studies that δ ≥ 2 and σ, the between-subject standard deviation, is believed to be about 7.5.

In the discussion that follows we shall focus our attention on adaptive sample size adjustments due to uncertainty surrounding the true value of δ. Even though the statistical methods discussed here are applicable when there is uncertainty about either δ or σ, the adaptive approach requires careful justification primarily when δ is involved. Adaptive sample size adjustments relating to uncertainty about σ are fairly routine and non-controversial.

We shall consider fixed-sample, group sequential and adaptive design options for this study. There are advantages and disadvantages to each option with no single approach dominating over the others. We are interested, however, in exploring whether the adaptive methodology can add value to the better established fixed sample and group sequential approaches to trial design. We will see that an adaptive design alleviates to some extent the problem of “overruns” encountered by group sequential designs when the primary endpoint is observed after a lengthy follow-up period as is the case here. Additionally, we will see that an adaptive design may, in certain settings, have a more favorable risk versus benefit trade-off.

53.2.1 Fixed Sample Design

Since it is believed a priori that δ ≥ 2, we first create Des 1, a single-look design with 80% power to detect δ = 2 using a one-sided level 0.025 test, given σ = 7.5. With these design parameters we can show that Des 1 will be fully powered if a total of 442 subjects are enrolled (221/arm).
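The Des 1 sample size can be checked by hand with the usual normal-approximation formula for comparing two means with equal allocation, n per arm = 2σ²(z_α + z_β)²/δ². The sketch below is an illustration under that assumption and is not East's own computation; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def per_arm_sample_size(delta, sigma, alpha=0.025, power=0.80):
    """Subjects per arm for a one-sided level-alpha test of two means
    (normal approximation, equal allocation, known sigma)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


def fixed_sample_power(delta, sigma, n_per_arm, alpha=0.025):
    """Power of the single-look test when the true effect is delta."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    std_error = sigma * sqrt(2 / n_per_arm)  # SE of the difference in means
    return 1 - STD_NORMAL.cdf(z_alpha - delta / std_error)


n_arm = per_arm_sample_size(delta=2.0, sigma=7.5)
print(n_arm, 2 * n_arm)  # 221 442
print(fixed_sample_power(2.0, 7.5, n_arm))  # just above 0.80
```

Calling the same two functions with other values of δ shows how sensitive the required sample size and the achieved power are to the assumed effect size.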
There is, however, considerable uncertainty about the true value of δ, and to a lesser extent about σ. Nevertheless it is believed that even if the true value of δ were as low as 1.6 on the NSA scale, that would constitute a clinically meaningful effect. We therefore also create Des 2, having 80% power to detect δ = 1.6 using a one-sided level-0.025 test, given σ = 7.5. Des 2 requires a total sample size of 690 subjects.

We have now proposed two design options. Under Des 1 we would enroll 442 subjects and hope that the study is adequately powered, which it will be if δ = 2 and σ = 7.5. If, however, δ = 1.6 the power drops from 80% to 61%. There is thus a risk of launching an underpowered study for an effective drug under Des 1. Under Des 2 we will enroll 690 subjects, thereby ensuring 80% power at the smallest clinically meaningful value, δ = 1.6, and rising to 94% power at δ = 2. The operating characteristics of Des 1 and Des 2 are displayed side by side in Table 53.1 for values of δ between 1.6 and 2.0.

Table 53.1: Operating Characteristics of Des 1 and Des 2

             Des 1                   Des 2
  δ     Sample Size   Power     Sample Size   Power
 1.6        442        61%          690        80%
 1.7        442        66%          690        84%
 1.8        442        71%          690        88%
 1.9        442        76%          690        91%
 2.0        442        80%          690        94%

If resources were plentiful, Des 2 would clearly be the preferred option. The sponsor must, however, allocate scarce resources over a number of studies and in any case is not in favor of designing an overpowered trial. This leads naturally to considering a design that might be more flexible with respect to sample size than either of the above two single-look fixed sample designs. We will consider two types of flexible designs: group sequential and adaptive.

53.2.2 Group Sequential Design

When sample size flexibility is desired for late-stage trials, it is often appropriate to first explore the group sequential option.
Let us then construct a group sequential design with one interim look and 80% power to detect δ = 1.6 such that if in fact δ = 2, the trial will stop early. While this would appear to be an attractive option, it is important to consider not just the saving in study duration but also the saving in the actual number of subjects randomized to the study. Since the efficacy endpoint for this trial will only be observed at week 26, the actual saving in sample size will be affected by the enrollment rate. In the current study it is anticipated that subjects will enroll at an average rate of 8 per week. The number of subjects enrolled and the number of completers over time are displayed graphically in Figure 53.1.

Figure 53.1: Impact of Enrollment Rate and Length of Follow-Up on Trial Completion

Observe that there is a 26-week horizontal separation between the two parallel lines depicting, respectively, the graph for enrollment and the graph for study completion. This 26-week gap must be taken into consideration when evaluating the savings achieved by utilizing a group sequential design.

The two major design parameters to be specified for a two-look group sequential design are the timing of the interim analysis and the amount of type-1 error to be spent. We will assume that data must be available for at least 200 completers before the trial can be terminated for efficacy so that an adequate safety profile may be developed for the study drugs. Therefore a suitable time point for the interim analysis is week 52, when we will have enrolled 416 subjects with data on 208 completers. Next we must decide on the amount of type-1 error to spend (see Lan and DeMets, 1983) for the early stopping boundary.
It is generally held that the type-1 error should be spent conservatively in the early stages of a trial so as to ensure that results based on premature termination will be compelling and have the capacity to change medical practice (see Pocock, 2005). Suppose then that we use the γ(−4) error spending function proposed by Hwang, Shih and DeCani (1990) to obtain the early stopping boundary. The boundary thus produced resembles the conservative O’Brien-Fleming (1979) boundary. The corresponding group sequential design, having a sample size of 694, is displayed in Figure 53.2 as Des 3.

Figure 53.2: Group Sequential Design Denoted as Des 3

In Des 3 the nominal critical point for early stopping is 3.067 standard deviations. The one-sided p-value corresponding to this early stopping boundary is 1 − Φ(3.067) = 0.0011 which, if met, would indeed be compelling enough to justify premature termination. Both Des 2 and Des 3 have 80% power to detect δ = 1.6 with a one-sided level-0.025 test. Their sample size commitments too are almost the same. However, under Des 2 there is no possibility of early stopping whereas under Des 3, it is possible to stop early and thereby save on sample size. Figure 53.2 shows that the expected number of completers, if in truth δ = 1.6, is 663 subjects, a saving of 31 subjects compared to the maximum sample size of 694. The saving will be even greater if the true value of δ is greater than 1.6. These expected savings in sample size are discussed next along with the problem of “overruns”.

53.2.3 The Problem of Overruns

Care must be taken when estimating the actual sample size savings of a group sequential design.
Even if the early stopping boundary is crossed at week 52 on the basis of the data from the 208 completers, we must still take into account the additional 208 randomized subjects who enrolled between week 26 and week 52 for whom the week 26 endpoint will not yet have been attained. These additional 208 subjects are referred to as the “overruns”. When the overruns are accounted for, the saving in sample size due to early stopping is only 694 − 416 = 278 subjects, rather than 694 − 208 = 486 subjects. The power and expected sample size values of the group sequential Des 3 for different choices of δ are displayed in Table 53.2. The table shows the impact of overruns on the expected sample size. For comparison we have also included corresponding power and sample size values for the fixed sample Des 2 in Table 53.2.

Table 53.2: Operating Characteristics of Des 3 (Group Sequential) and Des 2 (Fixed Sample)

                           Des 3 (Group Sequential)               Des 2 (Fixed Sample)
        Probability of     Expected Sample Size
  δ     Early Stopping   No Overruns   With Overruns   Power     Sample Size   Power
 1.6         6.6%            662            676         80%          690        80%
 1.7         7.9%            656            672         84%          690        85%
 1.8         9.3%            649            668         88%          690        88%
 1.9        11.0%            640            663         91%          690        91%
 2.0        13.0%            631            658         94%          690        94%

It is seen from Table 53.2 that Des 3 offers a modest benefit relative to Des 2. After accounting for the overruns, the expected sample sizes under Des 3 range between 658 and 676 for corresponding values of δ between 2 and 1.6, as compared to a fixed sample size of 690 under Des 2. In terms of power, Des 2 and Des 3 are practically identical. For the current trial a group sequential design with conservative error spending offers no substantial advantage over a conventional single look design with a fixed sample size.
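The overrun arithmetic and the expected sample sizes of Des 3 follow from two simple relations: with a constant enrollment rate, the completers at any week lag the enrolled subjects by the follow-up period, and the expected sample size of a two-look design is the stopping probability times the sample size at the interim plus the complement times the maximum. The sketch below illustrates this under those assumptions; the function names are hypothetical and the stopping probabilities are copied from Table 53.2.

```python
def interim_snapshot(week, rate_per_week, followup_weeks):
    """Enrolled subjects, completers and overruns at a given calendar week,
    assuming a constant enrollment rate and no dropouts."""
    enrolled = rate_per_week * week
    completers = rate_per_week * max(week - followup_weeks, 0)
    return enrolled, completers, enrolled - completers


def expected_sample_size(p_stop, n_at_stop, n_max):
    """Expected sample size of a design with a single interim look."""
    return p_stop * n_at_stop + (1 - p_stop) * n_max


enrolled, completers, overruns = interim_snapshot(52, 8, 26)
print(enrolled, completers, overruns)  # 416 208 208

# Early-stopping probabilities of Des 3 as tabulated in Table 53.2
P_STOP = {1.6: 0.066, 1.7: 0.079, 1.8: 0.093, 1.9: 0.110, 2.0: 0.130}
N_MAX = 694
for delta, p in P_STOP.items():
    without = expected_sample_size(p, completers, N_MAX)  # ignores overruns
    with_overruns = expected_sample_size(p, enrolled, N_MAX)
    print(delta, round(without), round(with_overruns))
```

The printed values agree with Table 53.2 to within the rounding of the tabulated stopping probabilities.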
One is still faced with the dilemma of committing excessive sample size resources up front in order to ensure adequate power at δ = 1.6, with limited prospects of saving on sample size in the event that δ = 2.

Although in general group sequential designs do offer savings in expected sample size, their actual benefit may be diminished if a study enrolls subjects very rapidly but the primary endpoint can only be observed after a lengthy follow-up. In the current example we assumed that subjects are enrolled at the rate of 8 per week and the endpoint is observed after 26 weeks of follow-up for each subject. This resulted in 208 additional subjects being on-study who were not yet followed for 26 weeks at the time of the interim analysis. The efficiency loss due to an overrun of this magnitude was difficult to overcome. If instead the enrollment rate were to be halved to 4 subjects per week, and the endpoint were to be observed after only 12 weeks instead of 26 weeks, there would only be an overrun of 48 subjects, and the resulting operating characteristics of the two group sequential designs would be more favorable relative to the corresponding fixed sample design. The accrual rate and the duration of follow-up are thus two extremely important design parameters for a group sequential trial.

We next consider adopting an adaptive design for this study. This is a radically different approach to trial design in which the difficulties encountered by group sequential designs (rapid accrual, delayed endpoint, and large up-front commitment of patient resources) can to some extent be mitigated.

53.2.4 Adaptive Design

To motivate the adaptive design let us recall that although the actual value of δ is unknown, the investigators believe that δ ≥ 2. For this reason Des 1 was constructed to have 80% power to detect δ = 2.
Des 2, on the other hand, was constructed to have 80% power to detect δ = 1.6, the smallest clinically meaningful treatment effect. If there were no resource constraints one would of course prefer to design the study for 80% power at δ = 1.6 since that would imply even more power at δ = 2. However, as we saw in Table 53.1, this conservative strategy carries as its price a substantially larger up-front sample size commitment which is, moreover, unnecessary if in truth δ = 2. Des 3 was therefore constructed as a group sequential alternative to Des 2. Des 3 also has 80% power to detect δ = 1.6 but there is a possibility of early stopping. We have seen, however, that due to the overruns problem, the expected sample size savings realized by Des 3 are small while the up-front sample size commitment is large.

The above difficulties lead us to consider whether Des 1, which was intended to detect δ = 2 with 80% power and hence does not have such a large up-front sample size commitment, might be improved so as to provide some insurance against substantial power loss in the event that δ = 1.6. The adaptive approach is suited to this purpose. In this approach we start out with a sample size of 442 subjects as in Des 1, but take an interim look after data are available on 208 completers. The purpose of the interim look is not to stop the trial early but rather to examine the interim data and continue enrolling past the planned 442 subjects if the interim results are promising enough to warrant the additional investment of sample size. This strategy has the advantage that the sample size is finalized only after a thorough examination of data from the actual study rather than through making a large up-front sample size commitment before any data are available. Furthermore, if the sample size may only be increased but never decreased from the originally planned 442 subjects, there is no loss of efficiency due to overruns.
The technical problem of avoiding inflating the type-1 error despite increasing the sample size in a data dependent manner has been solved by, among others, Cui, Hung and Wang (1999).

Selecting the Criteria for an Adaptive Sample Size Increase

The operating characteristics of an adaptive design depend in a complicated way on the criteria for increasing the sample size after observing the interim data. These criteria may combine objective information such as the current estimate of δ or the current conditional power with assessments of safety and with information available from other clinical trials that was not available at the start of the study. The adaptive approach provides complete flexibility to modify the sample size without having to pre-specify a precise mathematical formula for computing the new sample size based on the interim data. Therefore the full benefit of the flexibility offered by an adaptive design cannot be quantified ahead of time. Nevertheless it is instructive to investigate power and expected sample size by simulating the trial under different values of δ and applying precise pre-specified rules for increasing the sample size on the basis of the observed interim results. This will provide at least some idea, at the design stage, of the trade-off between the fixed sample or group sequential approaches and the adaptive approach.

To this end we create Des 4, a design with 80% power to detect δ = 2 with a one-sided level-0.025 test, based on a planned enrollment of 442 subjects. Des 4 specifies, in addition, that there will be one interim analysis after 26 weeks of follow-up data are available on the first 208 subjects enrolled. The purpose of the interim analysis is not to stop the trial early but rather to examine the interim data and decide whether a sample size increase is warranted.
If no action were taken at the interim look, Des 4 would be identical to Des 1. The timing of the interim look reflects a preference for performing the interim analysis as late as possible but nevertheless while the trial is still enrolling subjects since, once the enrollment sites have closed down, it will be difficult to start them up again. Under the assumption that subjects enroll at the rate of 8 per week we will have enrolled 416 subjects by week 52; 208 of them will have completed the required 26 weeks of follow-up for the primary endpoint, and an additional 208 subjects will comprise the overruns. Only the data from the 208 completers will be used in making the decision to increase the sample size. After this decision is taken, enrollment will continue until the desired sample size is attained. The primary efficacy analysis will be based on the full 26 weeks of follow-up data from all enrolled subjects. It should be noted that unlike the group sequential setting, where the 208 overruns played no role in the early stopping decision but were still added to the final sample size, here the data from the 208 overruns will be fully utilized in the primary efficacy analysis which will only occur when all enrolled subjects have completed 26 weeks of follow-up. This is one of the advantages of the adaptive approach relative to the group sequential approach for trials with lengthy follow-up.

It remains to specify the criteria for increasing the sample size at the interim look. A well planned trial should pre-specify as far as possible the decision rules to be adopted for increasing the sample size once the interim data are available. Thereby the operating characteristics of the trial can be studied through simulation and if they are unsatisfactory, the rules for sample size adaptation can be modified.
It should be stressed, however, that in practice there is flexibility to overrule these pre-specified rules should unexpected results, either internal or external to the trial, be encountered at the time of the interim analysis. Nevertheless a precise formula for increasing the sample size must be pre-specified for purposes of simulation. While there are an infinite number of ways to construct such a formula it must address the following three questions: For what range of interim outcomes should a sample size increase be contemplated? How should the magnitude of the new sample size be calculated? What should be the upper limit to the sample size increase? The answers to these questions might be driven by both clinical and business concerns, and will depend on the importance the investigators place on avoiding a false negative outcome for the current trial.

Range of Interim Outcomes for a Sample Size Increase

It is convenient to partition the sample space of possible interim outcomes into three zones: unfavorable, promising and favorable. An adaptive strategy is built on the premise that if the interim outcome lies in either the unfavorable or favorable zones, it is unnecessary to alter the sample size. In one case it would be risky to invest further in what appears to be a failed trial, while in the other case the trial appears slated to succeed anyway, without an additional sample size investment. Thus an adaptive sample size increase is only intended to help studies whose interim results fall in a promising zone, between these two extremes.

How might these three zones be identified? One could use the interim estimate δ̂ or its standardized version z = δ̂/se(δ̂) to partition the sample space into the three zones. Alternatively one could rely on the conditional power or probability of obtaining a positive outcome at the end of the trial, given the data already observed.
The conditional power approach is favored by most practitioners because it has a meaningful interpretation that is independent of the type of endpoint being measured, and incorporates both the current estimate of treatment effect as well as its standard error. Accordingly for the present trial we pre-specify that a sample size increase will only be contemplated if the conditional power at the interim look lies between 30% and 80%. That is, the unfavorable zone is characterized by conditional power values at most equal to 30%, the promising zone by conditional power values between 30% and 80% and the favorable zone by conditional power values at least equal to 80%.

Computing the Required Sample Size Increase

Just as at the design stage of a trial the sample size is determined by the desired power (80%, say) to detect an anticipated value of δ, so also at the time of the interim analysis the new sample size may be determined by the desired conditional power (also 80%, say) to detect an anticipated value of δ. Now, however, data from the actual trial are available, and may be used to update the anticipated value of δ at which to power the trial. One could, if desired, incorporate prior beliefs, external information and current data into a value of δ at which to power the study. For simplicity however, we shall use the estimate of δ obtained at the interim analysis to recompute the sample size needed to hit the target of 80% conditional power. It is possible that this calculation could result in a reduction in the total sample size. This is permitted by the statistical methodology of adaptive designs. For the current example, however, we do not wish to decrease the sample size. Therefore if the recomputed sample size constitutes a decrease, the original sample size of 442 subjects will be used.
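The zone classification and the sample size recomputation described above can be sketched as follows. This is an illustration under stated assumptions, not East's own algorithm: it assumes equal allocation, a final test based on the weighted statistic of Cui, Hung and Wang (1999) with weights fixed by the originally planned 208 and 442 subjects, and conditional power evaluated as though the interim estimate δ̂ were the true δ (so that σ cancels and only the interim z-statistic `z1` is needed). All function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_ALPHA = STD_NORMAL.inv_cdf(1 - 0.025)  # one-sided level 0.025

N1 = 208         # completers at the interim look
N_PLANNED = 442  # originally planned total sample size
W1 = N1 / N_PLANNED  # pre-specified stage weights (assumption: CHW test)
W2 = 1 - W1


def conditional_power(z1, n_new=N_PLANNED):
    """Conditional power if the total sample size becomes n_new,
    treating the interim estimate of delta as the truth."""
    # value the second-stage z-statistic must reach for final significance
    needed = (Z_ALPHA - sqrt(W1) * z1) / sqrt(W2)
    # mean of the second-stage z-statistic when delta equals its estimate
    drift = z1 * sqrt((n_new - N1) / N1)
    return 1 - STD_NORMAL.cdf(needed - drift)


def zone(cp, lower=0.30, upper=0.80):
    """Classify the interim outcome by its conditional power."""
    if cp <= lower:
        return "unfavorable"
    if cp >= upper:
        return "favorable"
    return "promising"


def new_total_sample_size(z1, target=0.80):
    """Total sample size giving the target conditional power in the
    promising zone, never less than the planned 442 subjects."""
    if zone(conditional_power(z1)) != "promising":
        return N_PLANNED
    needed = (Z_ALPHA - sqrt(W1) * z1) / sqrt(W2)
    z_beta = STD_NORMAL.inv_cdf(target)
    n2 = (N1 / z1 ** 2) * (needed + z_beta) ** 2  # second-stage subjects
    return max(N_PLANNED, N1 + ceil(n2))


for z1 in (0.5, 1.5, 2.5):
    cp = conditional_power(z1)
    print(z1, round(cp, 3), zone(cp), new_total_sample_size(z1))
```

For instance, an interim z-statistic of 1.5 gives a conditional power of roughly 62%, which falls in the promising zone and leads to an increased sample size, while values of 0.5 and 2.5 fall in the unfavorable and favorable zones and leave the sample size at 442.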
Specifying an Upper Limit to the Sample Size Increase

Since resources are limited, there must be an upper limit to the sample size increase, no matter what sample size is required to attain 80% conditional power. This upper limit is usually restricted to between 150% and 200% of the original sample size and is pre-specified at the start of the trial. Larger sample size increases are undesirable since they could yield statistically significant outcomes that are clinically non-significant. For the current trial we pre-specify an upper limit of 884 subjects. That is, we are prepared to double our investment in the trial, but only if the interim estimate of conditional power falls in the promising zone.

Finally, the design specifications of the adaptive Des 4 are as follows:

1. The initial sample size is 442 subjects, and has 80% power to detect δ = 2 with a one-sided level-0.025 test.
2. An interim analysis is performed after data are available on 208 completers with 26 weeks of follow-up data.
3. At the interim analysis the conditional power is computed using the estimated value δ̂ as though it were the true value of δ. If the conditional power lies between 30% and 80%, the interim outcome is deemed to be promising.
4. If the interim outcome is promising, the sample size is re-computed so as to achieve 80% conditional power at the estimated value, δ̂. The original sample size is then updated to the re-computed sample size, subject to the constraint in item 5 shown below.
5. If the re-computed sample size is less than 442, the original sample size of 442 subjects is used. If the re-computed sample size exceeds 884, the sample size is curtailed to 884 subjects.

53.2.5 Operating Characteristics of Adaptive Design

Due to the complex adaptive scheme for re-computing sample size, the operating characteristics of Des 4 can best be evaluated by simulation.
Table 53.3 displays power and expected sample sizes for selected values of δ between 1.6 and 2.0, based on 100,000 simulations of Des 4. For comparative purposes, corresponding power and sample size values for Des 1 are also displayed.

Table 53.3: Operating Characteristics of Des 1 (Fixed Sample) and Des 4 (Adaptive)

               Des 1 (Fixed Sample)           Des 4 (Adaptive)
 Value of δ   Power   Expected Sample Size   Power   Expected Sample Size
    1.6        61%            442             67%            507
    1.7        66%            442             72%            503
    1.8        71%            442             76%            501
    1.9        76%            442             81%            498
    2.0        80%            442             84%            495

All Des 4 results are based on 100,000 simulated trials.

The power of the adaptive Des 4 has increased by 6% at δ = 1.6 and by 4% at δ = 2 compared to Des 1. These power gains were obtained at the cost of corresponding average sample size increases of 67 subjects at δ = 1.6 and 57 subjects at δ = 2. The gains in power appear to be fairly modest, especially as they are offset by corresponding sample size increases. However, Des 4 offers a significant benefit in terms of risk reduction, not reflected in Table 53.3. To see this it is important to note that the sample size under Des 4 is only increased when the interim results are promising; i.e., when the conditional power at the interim analysis is greater than 30% but less than 80%. This is the very situation in which it is advantageous to increase the sample size and thereby avoid an underpowered trial. When the interim results are unfavorable (conditional power ≤ 30%) or favorable (conditional power ≥ 80%), a sample size increase is not warranted and hence the sample size is unchanged at 442 subjects for both Des 1 and Des 4. But when the interim results are promising (conditional power between 30% and 80%) the sample size is increased under Des 4 in an attempt to boost the conditional power back to 80%. It is this feature of the adaptive design that makes it more attractive than the simpler fixed sample design.
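A rough Monte Carlo sketch of the five Des 4 rules listed in Section 53.2.4 is given below. It is not the East simulation engine and is not expected to reproduce Table 53.3 exactly. It assumes equal allocation, known σ = 7.5, no early stopping for efficacy or futility, and a final test based on the weighted statistic of Cui, Hung and Wang (1999) with weights fixed by the planned 208 and 442 subjects; it works directly with the stage-wise z-statistics rather than subject-level data. All names are hypothetical.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_ALPHA = STD_NORMAL.inv_cdf(0.975)  # one-sided level 0.025
Z_BETA = STD_NORMAL.inv_cdf(0.80)    # target conditional power 80%
N1, N_PLANNED, N_MAX, SIGMA = 208, 442, 884, 7.5
W1 = N1 / N_PLANNED  # pre-specified stage weights (assumption: CHW test)
W2 = 1 - W1


def adapted_total(z1):
    """Rules 3 to 5: total sample size chosen after seeing interim z1."""
    needed = (Z_ALPHA - sqrt(W1) * z1) / sqrt(W2)
    cp = 1 - STD_NORMAL.cdf(needed - z1 * sqrt((N_PLANNED - N1) / N1))
    if not 0.30 < cp < 0.80:  # unfavorable or favorable zone
        return N_PLANNED
    n2 = (N1 / z1 ** 2) * (needed + Z_BETA) ** 2
    return min(N_MAX, max(N_PLANNED, N1 + ceil(n2)))


def simulate(delta, n_sims=100_000, seed=1):
    """Return (estimated power, average total sample size) at true delta."""
    rng = random.Random(seed)
    rejections = 0
    total_n = 0
    for _ in range(n_sims):
        # stage 1: z-statistic from the 208 completers
        z1 = rng.gauss(delta * sqrt(N1) / (2 * SIGMA), 1.0)
        n_total = adapted_total(z1)
        # stage 2: z-statistic from the subjects not in the interim data
        n2 = n_total - N1
        z2 = rng.gauss(delta * sqrt(n2) / (2 * SIGMA), 1.0)
        # weighted final statistic keeps the type-1 error at 0.025
        z_final = sqrt(W1) * z1 + sqrt(W2) * z2
        rejections += z_final >= Z_ALPHA
        total_n += n_total
    return rejections / n_sims, total_n / n_sims


for delta in (0.0, 1.6, 2.0):
    power, avg_n = simulate(delta, n_sims=20_000)
    print(delta, round(power, 3), round(avg_n))
```

Under δ = 0 the rejection rate should stay close to 0.025, which illustrates that the data dependent sample size increase does not inflate the type-1 error when the weighted statistic is used.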
Table 53.4 displays the probability of falling into each of the following zones at the interim look: unfavorable + futility (Unfav + Fut), promising, and favorable + efficacy (Fav + Eff), along with the power and expected sample size, conditional on falling into each zone, under both Des 1 and Des 4. The table highlights the key advantage of the adaptive Des 4 compared to the fixed sample Des 1; i.e., the ability to invest in the trial in stages, with the second stage of the investment being required only if promising results are obtained at the first stage. This feature of Des 4 makes it far more attractive as an investment strategy than Des 1, which has no provision for increasing the sample size if a promising interim outcome is obtained.

Table 53.4: Operating Characteristics of Des 1 and Des 4 Conditional on Interim Outcome

                       Probability of     Power Conditional on    Expected
                       Interim Outcome    Interim Outcome         Sample Size
δ      Interim Outcome                    Des 1     Des 4         Des 1    Des 4
1.6    Unfav + Fut     33%                28%       28%           442      442
       Promising       27%                61%       83%           442      696
       Fav + Eff       40%                87%       88%           442      435
1.7    Unfav + Fut     30%                32%       32%           442      442
       Promising       26%                65%       86%           442      693
       Fav + Eff       45%                89%       90%           442      435
1.8    Unfav + Fut     26%                36%       35%           442      442
       Promising       25%                69%       89%           442      691
       Fav + Eff       48%                91%       92%           442      434
1.9    Unfav + Fut     23%                41%       39%           442      442
       Promising       25%                72%       91%           442      688
       Fav + Eff       52%                93%       93%           442      434
2.0    Unfav + Fut     21%                45%       46%           442      442
       Promising       24%                76%       92%           442      685
       Fav + Eff       56%                94%       95%           442      433

All results are based on 100,000 simulated trials

Suppose, for example, that δ = 1.6, the smallest clinically meaningful treatment effect. The trial sponsor only commits the resources needed for 442 subjects at the start of the trial, at which point the chance of success is 61%, as shown in Table 53.3.
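The conditional entries of Table 53.4 and the unconditional entries of Table 53.3 are linked by the law of total probability: an unconditional quantity is the zone-probability-weighted average of its conditional values. The following sketch (function name hypothetical) checks this for the δ = 1.6 rows. Because the table entries are rounded, agreement is only approximate.

```python
def unconditional(zone_probs, conditional_values):
    """Weighted average of conditional values over the interim zones."""
    return sum(p * v for p, v in zip(zone_probs, conditional_values))


# delta = 1.6 rows of Table 53.4: Unfav + Fut, Promising, Fav + Eff
zone_probs = [0.33, 0.27, 0.40]

power_des1 = unconditional(zone_probs, [0.28, 0.61, 0.87])
power_des4 = unconditional(zone_probs, [0.28, 0.83, 0.88])
n_des4 = unconditional(zone_probs, [442, 696, 435])
```

With these inputs the weighted averages come out close to the 61% and 67% power and the expected sample size of 507 reported for δ = 1.6 in Table 53.3.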
The additional sample size commitment is forthcoming only if promising results are obtained at the interim analysis, and in that case the sponsor’s risk is substantially reduced because the chance of success jumps to 83%, as shown in Table 53.4. Similar results are observed for the other values of δ.

The probabilities of entering the unfavorable, promising and favorable zones at the interim analysis, displayed in Table 53.4, are instructive. Consider again the case δ = 1.6. At this value of δ there is a 27% chance of landing in the promising zone and thereby obtaining a substantial power boost under Des 4 as compared to Des 1. That is, 27% of the time the adaptive strategy can rescue a trial that is underpowered at the interim look. The chance of entering the favorable zone is 40%. That is, 40% of the time the sponsor will be lucky and have a well powered trial at the interim look without the need to increase the sample size. The remaining 33% of the time the sponsor will be unlucky and will enter the unfavorable zone, from which there is also no sample size increase, and the chance of success is only 28%. These odds improve with larger values of δ.

53.3 Binomial Endpoint: Acute Coronary Syndromes Trial

Consider a two-arm, placebo controlled randomized clinical trial for subjects with acute cardiovascular disease undergoing percutaneous coronary intervention (PCI). The primary endpoint is a composite of death, myocardial infarction or ischemia-driven revascularization during the first 48 hours after randomization. We assume on the basis of prior knowledge that the event rate for the placebo arm is 8.7%. The investigational drug is expected to reduce the event rate by at least 20%.
The investigators are planning to randomize a total of 8000 subjects in equal proportions to the two arms of the study. It is easy to show that a conventional fixed sample design enrolling a total of 8000 subjects will have 83% power to detect a 20% risk reduction with a one-sided level-0.025 test of significance. The actual risk reduction is expected to be larger, but could also be as low as 15%, a treatment effect that would still be of clinical interest given the severity and importance of the outcomes. In addition, there is some uncertainty about the magnitude of the placebo event rate. For these reasons the investigators wish to build into the trial design some flexibility for adjusting the sample size. Two options are under consideration: a group sequential design with the possibility of early stopping in case the risk reduction is large, and an adaptive design with the possibility of increasing the sample size in case the risk reduction is small. In the remainder of this section we shall discuss these two options and show how they may be combined into a single design that captures the benefits of both.

53.3.1 Group Sequential Design

We first transform the fixed sample design into an 8000-person group sequential design with two interim looks, one after 4000 subjects are enrolled (50% of total information) and the second after 5600 subjects are enrolled (70% of total information). Early stopping efficacy boundaries are derived from the Lan and DeMets (1983) O’Brien-Fleming type error spending function. Let us denote this group sequential design as GSD1. The operating characteristics of GSD1 are displayed in Table 53.5. The first column of Table 53.5 is a list of potential risk reductions, defined as 100 × (1 − ρ)% where ρ = πt/πc, πt is the event rate for the treatment arm, and πc is the event rate for the control arm. The remaining columns display early stopping probabilities, power and expected sample size.
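The 83% power quoted for the fixed sample design with 8000 subjects can be approximated with a standard two-proportion normal test. The sketch below is an illustration under stated assumptions rather than East's own calculation: it uses the unpooled variance estimate, equal allocation, and a hypothetical function name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(pi_c, risk_reduction, n_total, alpha=0.025):
    """Approximate power of a one-sided test comparing two event rates.

    pi_c is the control event rate; the treatment rate is
    pi_t = pi_c * (1 - risk_reduction), i.e. rho = pi_t / pi_c.
    Assumptions: equal allocation, unpooled variance, normal approximation.
    """
    pi_t = pi_c * (1 - risk_reduction)
    n_arm = n_total / 2
    se = sqrt(pi_c * (1 - pi_c) / n_arm + pi_t * (1 - pi_t) / n_arm)
    nd = NormalDist()
    return nd.cdf((pi_c - pi_t) / se - nd.inv_cdf(1 - alpha))


power_20 = two_proportion_power(0.087, 0.20, 8000)  # 20% risk reduction
power_15 = two_proportion_power(0.087, 0.15, 8000)  # 15% risk reduction
```

Under these assumptions the power at a 20% risk reduction comes out close to the 83% stated in the text, and the power at a 15% risk reduction is markedly lower, which is the motivation for building flexibility into the design.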
Since the endpoint is observed within 48 hours, the problem of overruns that we encountered in the schizophrenia trial is negligible and may be ignored.

Table 53.5: Operating Characteristics of GSD1, a Three-Look 8000-Person Group Sequential Design

Risk Reduction    Probability of Crossing Efficacy Boundary    Overall    Expected
100 × (1 − ρ)     At Look 1     At Look 2     At Final Look    Power      Sample Size
                  (N = 4000)    (N = 5600)    (N = 8000)
15%               0.074         0.183         0.309            57%        7264
17%               0.109         0.235         0.335            68%        7002
20%               0.181         0.310         0.330            82%        6535
23%               0.279         0.362         0.275            92%        6017
25%               0.357         0.376         0.222            96%        5671

Table 53.5 shows that GSD1 is well powered, with large savings of expected sample size for risk reductions of 20% or more. It is thus a satisfactory design if, as is initially believed, the magnitude of the risk reduction is in the range 20% to 25%. This design does not, however, offer as good protection against a false negative conclusion for smaller risk reductions. In particular, even though 15% is still a clinically meaningful risk reduction, GSD1 offers only 57% power to detect this treatment effect.

One possibility then is to increase the up-front sample size commitment of the group sequential design so that it has 80% power if the risk reduction is 15%. This leads to GSD2, a three-look group sequential design with a maximum sample size commitment of 13,853 subjects, one interim look after 6926 subjects (50% of total information) and a second interim look after 9697 subjects (70% of total information). GSD2 has 80% power to detect a risk reduction of 15% with a one-sided level-0.025 test. Table 53.6 displays operating characteristics of GSD2 for risk reductions between 15% and 25%.
Table 53.6: Operating Characteristics of GSD2, a Three-Look 13,853-Person Group Sequential Design

Risk Reduction    Probability of Crossing Efficacy Boundary    Overall    Expected
100 × (1 − ρ)     At Look 1     At Look 2     At Final Look    Power      Sample Size
                  (N = 6926)    (N = 9697)    (N = 13,853)
15%               0.167         0.298         0.335            80%        11,456
17%               0.246         0.349         0.296            89%        10,699
20%               0.395         0.375         0.196            97%        9558
23%               0.565         0.329         0.099            99.3%      8574
25%               0.675         0.269         0.054            99.8%      8061

Notice that by attempting to provide adequate power at 15% risk reduction, the low end of clinically meaningful treatment effects, we have significantly over-powered the trial for values of risk reduction in the expected range, 20% to 25%. If, as expected, the risk reduction exceeds 20%, the large up-front sample size commitment of 13,853 subjects under GSD2 is unnecessary. GSD1 with an up-front commitment of only 8000 subjects will provide sufficient power in this setting. From this point of view, GSD2 is not a very satisfactory design. It commits the investigators to a very large and expensive trial in order to provide adequate power in the pessimistic range of risk reductions, without any evidence that the true risk reduction does indeed lie in the pessimistic range.

Evidently a single group sequential design cannot provide adequate power for the “worst-case” scenario, and at the same time avoid overpowering the more optimistic range of scenarios. This leads us to consider building an adaptive sample size re-estimation option into the group sequential design GSD1, such that the adaptive component will provide the necessary insurance for the worst-case scenario, and thereby free the group sequential component to provide adequate power for the expected scenario, without a large and unnecessary up-front sample size commitment.
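The power and expected sample size columns of Tables 53.5 and 53.6 follow from the boundary crossing probabilities: overall power is the sum of the crossing probabilities, and a trial that does not stop early runs to the final look. The sketch below (function name hypothetical) assumes there is no futility stopping and no overrun, and reproduces the 15% risk reduction rows up to rounding.

```python
def summarize_gsd(look_sizes, crossing_probs):
    """Overall power and expected sample size of an efficacy-only group sequential design.

    look_sizes[i] is the cumulative sample size at look i;
    crossing_probs[i] is the probability of first crossing the efficacy
    boundary at look i (the last entry refers to the final look).
    """
    power = sum(crossing_probs)
    early = crossing_probs[:-1]
    # Trials that do not stop early are carried to the final look,
    # whether or not they cross the boundary there.
    expected_n = sum(n * p for n, p in zip(look_sizes, early)) \
        + look_sizes[-1] * (1 - sum(early))
    return power, expected_n


# 15% risk reduction rows of Table 53.5 (GSD1) and Table 53.6 (GSD2)
power_gsd1, n_gsd1 = summarize_gsd([4000, 5600, 8000], [0.074, 0.183, 0.309])
power_gsd2, n_gsd2 = summarize_gsd([6926, 9697, 13853], [0.167, 0.298, 0.335])
```

For GSD1 this gives roughly 57% power and an expected sample size near 7264; for GSD2, roughly 80% power and an expected sample size near 11,456. The small differences from the tabulated values come from rounding of the crossing probabilities.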
53.3.2 Adaptive Group Sequential Design

We convert the three-look group sequential design GSD1 into an adaptive group sequential design by inserting into it the option to increase the sample size at look 2, when 5600 subjects have been enrolled. Denote the modified design by A-GSD1. The rules governing the sample size increase for A-GSD1 are similar to the rules specified in Section 53.2.4 for the schizophrenia trial, but tailored to the needs of the current trial. The idea is to identify unfavorable, promising and favorable zones for the interim results at look 2, based on the attained conditional power. Sample size should only be increased if the interim results fall in the promising zone. Subject to an upper limit, the sample size should be increased by just the right amount to boost the current conditional power to some desired level (say 80%). The following are the design specifications for A-GSD1:

1. The starting design is GSD1 with a sample size of 8000 subjects, one interim look after enrolling 4000 subjects and a second interim look after enrolling 5600 subjects. The efficacy stopping boundaries at these two interim looks are derived from the Lan and DeMets (1983) error spending function of the O’Brien-Fleming type.
2. At the second interim analysis, with data available on 5600 subjects, the conditional power is computed using the estimated value ρ̂ as though it were the true relative risk ρ. If the conditional power is no greater than 30% the outcome is deemed to be unfavorable. If the conditional power is between 30% and 80%, the outcome is deemed to be promising. If the conditional power is at least 80%, the outcome is deemed to be favorable.
3. If the interim outcome is promising, the sample size is re-computed so as to achieve 80% conditional power at the estimated value ρ̂.
   The original sample size is then updated to the re-computed sample size, subject to the constraint in item 4 shown below.
4. If the re-computed sample size is less than 8000, the original sample size of 8000 subjects is used. If the re-computed sample size exceeds 16,000, the sample size is curtailed at 16,000 subjects.

Some features of this adaptive strategy are worth pointing out. First, the sample size is re-computed on the basis of data from 5600 subjects from the trial itself. Therefore the estimate of ρ available at the interim analysis is substantially more reliable than the estimate that was used at the start of the trial to compute an initial sample size of 8000 subjects. The latter estimate is typically derived from smaller pilot studies or from other phase 3 studies in which the patient population might not be exactly the same as that of the current trial. Second, a sample size increase is only requested if the interim results are promising, in which case the trial sponsor should be willing to invest the additional resources needed to power the trial adequately. In contrast, GSD2 increases the sample size substantially at the very beginning of the trial, before any data are available to determine if the large sample size is justified.

53.3.3 Operating Characteristics of Adaptive Group Sequential Design

Table 53.7 displays the power and expected sample size of the adaptive group sequential design A-GSD1. For comparative purposes corresponding power and sample size values of GSD1 are also provided. If there is a 15% risk reduction, A-GSD1 has 6% more power than GSD1 but utilizes an additional 1093 subjects on average. It is seen that as the risk reduction parameter increases, the power advantage and additional sample size requirement of A-GSD1 are reduced relative to GSD1. The power and sample size entries in Table 53.7 were computed unconditionally, and for that reason do not reveal the real benefit that design A-GSD1 offers compared to design GSD1.
As discussed previously in the schizophrenia example, the real benefit of an adaptive design is the opportunity it provides to invest in the trial in stages with the second stage investment forthcoming only if promising results are obtained at the first stage. To explain this better it is necessary to display power and expected sample size results conditional on the zone (unfavorable, promising or favorable) into which the results of the trial fall at the second interim analysis.

Table 53.7: Operating Characteristics of GSD1 (Group Sequential) and A-GSD1 (Adaptive Group Sequential) Designs

Risk Reduction    GSD1 (Group Sequential)          A-GSD1 (Adaptive Group Sequential)
100 × (1 − ρ)     Power   Expected Sample Size     Power   Expected Sample Size
15%               57%     7264                     62%     8253
17%               68%     7002                     73%     7945
20%               82%     6535                     86%     7294
23%               92%     6017                     94%     6531
25%               96%     5671                     97%     6036

All results for A-GSD1 are based on 100,000 simulated trials

Accordingly Table 53.8 displays the operating characteristics of both GSD1 and A-GSD1 conditional on the zone into which the conditional power falls at the second interim analysis. The table reveals substantial gains in power for A-GSD1 compared to GSD1 at all values of risk reduction if the second interim outcome falls in the promising zone, thereby leading to an increase in the sample size. Outside this zone the two designs have the same operating characteristics since the sample size does not change. If the second interim outcome falls in the unfavorable zone, the trial appears to be headed for failure and an additional sample size investment would be risky. If the second interim outcome falls in the favorable zone, the trial is headed for success without the need to increase the sample size.
Thus the adaptive design provides the opportunity to increase the sample size only when the results of the second interim analysis fall in the promising zone. This is precisely when the trial can most benefit from a sample size increase.

Table 53.8: Operating Characteristics of GSD1 (Group Sequential) and A-GSD1 (Adaptive Group Sequential) Designs Conditional on Second Interim Outcome

Risk             Second           Probability of     Power Conditional on       Expected
Reduction        Interim          Interim Outcome    Second Interim Outcome     Sample Size
100 × (1 − ρ)    Outcome                             GSD1      A-GSD1           GSD1     A-GSD1
15%              Unfav + Fut      36%                15%       15%              8000     8000
                 Promising        24%                57%       81%              8000     12099
                 Fav + Eff        40%                94%       94%              6152     6152
17%              Unfav + Fut      27%                19%       20%              8000     8000
                 Promising        24%                64%       87%              8000     11956
                 Fav + Eff        49%                96%       96%              5992     5992
20%              Unfav + Fut      16%                29%       30%              8000     8000
                 Promising        20%                73%       93%              8000     11780
                 Fav + Eff        64%                98%       98%              5721     5726
23%              Unfav + Fut      9%                 40%       40%              8000     8000
                 Promising        14%                81%       96%              8000     11606
                 Fav + Eff        77%                99%       99%              5440     5440
25%              Unfav + Fut      5%                 48%       48%              8000     8000
                 Promising        11%                85%       98%              8000     11449
                 Fav + Eff        85%                99.6%     99.5%            5250     5247

All results are based on 100,000 simulated trials

53.3.4 Adding a Futility Boundary

One concern with design A-GSD1 is that it lacks a futility boundary. There is thus the risk of proceeding to the end, possibly with a sample size increase, when the magnitude of the risk reduction is small and unlikely to result in a successful trial. In particular, suppose that the null hypothesis is true. In that case we can show that the power (i.e., the type-1 error) is 2.5% and the expected sample size under A-GSD1 is 8253 subjects. It might thus be desirable to include some type of futility stopping rule for the trial. In this trial the investigators proposed the following futility stopping rules at the two interim analysis time points:

1. Stop for futility at the first interim analysis (N = 4000) if the estimated event rate for the experimental arm is at least 1% higher than the estimated event rate for the control arm.
2. Stop for futility at the second interim analysis (N = 5600) if the conditional power, based on the estimated risk ratio ρ̂, is no greater than 20%.

The impact of the futility boundary on the unconditional operating characteristics of the A-GSD1 design is displayed in Table 53.9. The inclusion of the futility boundary has resulted in a dramatic saving of nearly 3000 subjects, on average, at the null hypothesis of no risk reduction. Furthermore, notwithstanding a small power loss of 2-3%, the trial continues to have well over 80% power for risk reductions of 20% or more. The trial suffers a power loss of 4% if the magnitude of the risk reduction is 15%, the low end of the range of clinical interest. In this situation, however, the unconditional power is inadequate (only 63%) even without a futility boundary.

Table 53.9: Operating Characteristics of the A-GSD1 Design with and without a Futility Boundary

Risk Reduction    A-GSD1 with No Futility Boundary    A-GSD1 with Futility Boundary
100 × (1 − ρ)     Power   Expected Sample Size        Power   Expected Sample Size
0%                2.4%    8260                        2.1%    4866
15%               63%     8253                        57%     7063
20%               86%     7294                        81%     6726
25%               97%     6036                        94%     5846

All results are based on 100,000 simulated trials

To fully appreciate the impact of the futility boundary on power and expected sample size, it is necessary to study the operating characteristics of the trial conditional on the results of the second interim analysis. These results are displayed in Table 53.10.
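The two futility stopping rules proposed by the investigators can be written as simple predicates. This is a minimal sketch with hypothetical function names. It assumes that "at least 1% higher" means an absolute difference of one percentage point in the estimated event rates, which the text does not spell out.

```python
def stop_at_look1(rate_treatment, rate_control, margin=0.01):
    """Rule 1 (N = 4000): stop if the experimental arm's estimated event rate
    exceeds the control arm's by at least `margin`.

    Assumption: the 1% margin is an absolute difference in event rates.
    """
    return rate_treatment - rate_control >= margin


def stop_at_look2(conditional_power, threshold=0.20):
    """Rule 2 (N = 5600): stop if conditional power, evaluated at the
    estimated risk ratio, is no greater than 20%."""
    return conditional_power <= threshold
```

For example, estimated event rates of 10.0% on the experimental arm and 8.7% on the control arm would trigger the first rule, while a conditional power of 25% at the second look would allow the trial to continue.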
Table 53.10: Operating Characteristics of A-GSD1 Design with and without a Futility Boundary, Conditional on the Second Interim Outcome

Risk             Second           Probability of     Power Conditional on       Expected
Reduction        Interim          Interim Outcome    Second Interim Outcome     Sample Size
100 × (1 − ρ)    Outcome                             No Fut    With Fut         No Fut   With Fut
0%               Unfav + Fut      93%                0.5%      0.1%             8000     4370
                 Promising        5%                 15%       15%              13030    12916
                 Fav + Eff        2%                 65%       64%              7017     6928
15%              Unfav + Fut      38%                15%       5%               8000     5093
                 Promising        23%                81%       81%              12099    11950
                 Fav + Eff        39%                94%       94%              5992     6152
20%              Unfav + Fut      18%                30%       9%               8000     5264
                 Promising        18%                93%       92%              11780    11670
                 Fav + Eff        64%                98%       98%              5726     5711
25%              Unfav + Fut      6%                 48%       14%              8000     5354
                 Promising        10%                98%       97%              11449    11370
                 Fav + Eff        84%                99.5%     99.5%            5247     5274

All results are based on 100,000 simulated trials

It is seen that the presence of the futility boundary does not cause any loss of power for trials that enter the promising or favorable zones at the second interim analysis. Additionally the presence of the futility boundary causes the average sample size to be reduced substantially in the unfavorable zone while remaining the same in the other two zones. In effect the futility boundary terminates a proportion of trials that enter the unfavorable zone thereby preventing them from proceeding to conclusion. It has no impact on trials that enter the promising or favorable zones.

53.4 Survival Endpoint: Lung Cancer Trial

A two-arm multi-center randomized clinical trial is planned for subjects with advanced metastatic non-small cell lung cancer with the goal of comparing the current standard second line therapy (docetaxel+cisplatin) to a new docetaxel containing combination regimen. The primary endpoint is overall survival (OS).
The study is required to have one-sided α = 0.025, and 90% power to detect an improvement in median survival, from 8 months on the control arm to 11.4 months on the experimental arm, which corresponds to a hazard ratio of 0.7. A group sequential design is adopted with an efficacy boundary derived from the Lan and DeMets (1983) O’Brien-Fleming type spending function and a futility boundary derived from the γ-spending function of Hwang, Shih and DeCani (1990) with parameter γ = −5. It is decided, with the help of the East software, to keep the study open for a maximum of 334 OS events, with one interim analysis after 167 events (50% of the total information), whereby a 1-sided level-0.025 group sequential logrank test will have 90% power to detect a hazard ratio of 0.7. As this is an event-driven trial, sample size does not play a direct role in the above power calculation. Nevertheless the rate of accrual, duration of accrual and duration of follow-up will affect the total study duration or time needed to obtain 334 events. Again, with the help of East, it is determined that by enrolling 483 subjects over a two year period and following them for an additional 6 months, the required 334 OS events can be expected to arrive by the end of the follow-up period. Now the assumption of 8 months for median survival on the control arm is based on published results from a previously completed large, well-controlled trial. There is less data available on the experimental arm. It is thus possible, either because the new treatment is somewhat less effective than anticipated or because of improved standard of care for patients on the control arm, that the underlying hazard ratio could be larger than 0.7. If this were the case, the study would be underpowered. For example, if the true hazard ratio was 0.77, an effect that is still considered clinically meaningful, the power of a 483-subject study would drop from 90% to 67.2%. 
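As a rough cross-check of the event counts, the Schoenfeld approximation gives the number of events needed by a fixed sample logrank test. The sketch below is an illustration only and is not East's calculation: the function names are hypothetical, equal allocation and exponential survival are assumed, and the group sequential boundaries used in the actual design are ignored, so the result is expected to fall slightly below the 334 events quoted in the text.

```python
from math import log
from statistics import NormalDist


def hazard_ratio_from_medians(median_control, median_treatment):
    """Hazard ratio (treatment vs control), assuming exponential survival,
    where the hazard equals ln(2) / median."""
    return median_control / median_treatment


def schoenfeld_events(hazard_ratio, alpha=0.025, power=0.90):
    """Approximate events for a one-sided fixed sample logrank test
    with 1:1 allocation (Schoenfeld approximation)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)
    return 4 * z ** 2 / log(hazard_ratio) ** 2


hr = hazard_ratio_from_medians(8.0, 11.4)   # medians in months
events_070 = schoenfeld_events(0.70)
events_077 = schoenfeld_events(0.77)
```

The median survival times of 8 and 11.4 months correspond to a hazard ratio of about 0.70 under the exponential assumption. The number of events required grows quickly as the hazard ratio moves toward 1, which is why a trial powered for a hazard ratio of 0.77 requires a much larger commitment.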
Thus one possibility would be to design the trial from the very beginning to have 90% power to detect a hazard ratio of 0.77. East shows that such a trial would require 621 events. In order to complete the trial in 30 months it would be necessary to enroll 878 subjects over 24 months with an additional 6 months of follow-up. The sponsor is either unable or unwilling to make such a large sample size commitment up-front purely on the basis of the limited prior data available on the new compound. However, since an independent data monitoring committee (DMC) will be reviewing the interim efficacy data in an unblinded fashion at 50% of the total information, the sponsor might be prepared to authorize the investment of additional resources on the recommendation of this committee. In a manner analogous to the pre-specification of group sequential boundaries for early stopping, the sponsor must pre-specify to the DMC the precise data dependent rules for increasing the number of events and sample size at the time of the interim analysis.

These rules follow the same basic structure as was adopted in Section 53.2 for the schizophrenia trial and in Section 53.3 for the acute coronary syndromes trial. The sample space of possible interim outcomes is partitioned into three zones: unfavorable/futility, promising, and favorable/efficacy. The partitioning utilizes conditional power (CP) evaluated at the current estimate of hazard ratio with the initial specification of 334 events. The promising zone is of the form CPmin ≤ CP < CPmax. To the left of the promising zone lies the unfavorable/futility zone while to the right of the promising zone lies the favorable/efficacy zone. If the data fall in the promising zone the number of events and sample size are increased by a pre-specified formula that is written into the DMC charter.
If the interim data fall in the unfavorable/futility zone there is either no change in the initial design or an early termination because the futility boundary is crossed. Similarly if the interim data fall in the favorable/efficacy zone, there is either no change in the initial design or an early termination because the efficacy boundary is crossed. The choice of CPmin, CPmax and the rules for increasing resources in the promising zone are best determined with the help of the simulation tools available in East. In Chapter 54, Section 54.5.3 we demonstrate how the EastSurvAdapt module of East may be used to simulate different criteria for increasing event and sample size resources and thereby obtain an adaptive design that best satisfies the goals of the trial within the resource constraints imposed on the sponsor. Based on these simulation results it has been decided to implement an adaptive increase in the number of events by 50% (from 334 to 501) if the interim results fall in the promising zone, here defined as conditional power between CPmin = 30% and CPmax = 90%. It has further been decided that the sample size will be increased in the same ratio as the increase in events.

The operating characteristics of the lung cancer trial are displayed in Tables 53.11, 53.12 and 53.13 for underlying hazard ratios of 0.77, 0.75 and 0.70 respectively. In each table the classical group sequential design and the adaptive group sequential design are compared with respect to power, average study duration and average number of subjects.

Table 53.11: Operating Characteristics of Optimistic Design (powered to Detect HR=0.70) under the Pessimistic Scenario (true HR=0.77)

10,000 Simulations Under the Pessimistic Scenario that HR = 0.77

                       Power              Duration (months)    # of Subjects
Zone        P(Zone)    NonAdpt   Adapt    NonAdpt   Adapt      NonAdpt   Adapt
Unf+Fut     30%        29%       29%      27.8      27.82      469       468
Prom        34%        69%       85%      29.2      31.03      483       712
Fav+Effic   36%        92%       94%      26.2      26.18      450       451
Total       -          66%       71%      27.7      28.713     467       548

Table 53.12: Operating Characteristics of Optimistic Design (powered to Detect HR=0.70) under the Semi-Pessimistic Scenario (true HR=0.75)

10,000 Simulations Under the Semi-Pessimistic Scenario that HR = 0.75

                       Power              Duration (months)    # of Subjects
Zone        P(Zone)    NonAdpt   Adapt    NonAdpt   Adapt      NonAdpt   Adapt
Unf+Fut     24%        35%       36%      28.2      28.3       471       471
Prom        34%        73%       89%      29.4      32.1       483       712
Fav+Effic   42%        96%       95%      25.8      25.9       446       446
Total       -          74%       79%      27.6      28.6       465       542

The results follow a similar pattern to what was observed in the previous two examples. Let us focus first on the simulation results when the underlying hazard ratio is 0.77. This is the setting where the adaptive design can play an important role since a hazard ratio of 0.77 is still clinically meaningful, and yet the sponsor is unable to command the resources that would be required to guarantee 90% power with a non-adaptive design. Row 4 of Table 53.11 shows that the adaptive design produces about a 6% gain in overall power, from 66% to 71%, at an average cost of about 1 additional month of study duration and 81 additional subjects. The real appeal of the adaptive design, however, is more evident when the overall simulation results are partitioned into the three zones. It is then seen from Table 53.11 that the interim outcome will fall in the unfavorable/futility zone 30% of the time, in which case the prospects for a successful trial are equally bleak for both the classical and adaptive designs, but no additional resources are committed to the adaptive trial.
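The adaptation rule chosen for the lung cancer trial (a 50% increase in events, from 334 to 501, when CPmin ≤ CP < CPmax, with the sample size increased in the same ratio) can be sketched as follows. The function name is hypothetical, rounding the increased sample size up to a whole subject is an assumption, and early stopping at the efficacy or futility boundary is not modeled.

```python
from math import ceil


def adapt_resources(cp, events=334, subjects=483,
                    cp_min=0.30, cp_max=0.90, factor=1.5):
    """Return the (events, subjects) targets after the interim analysis.

    Resources are increased by `factor` only in the promising zone
    cp_min <= cp < cp_max; otherwise the initial design is kept.
    Assumption: fractional subjects are rounded up.
    """
    if cp_min <= cp < cp_max:
        return ceil(events * factor), ceil(subjects * factor)
    return events, subjects
```

A conditional power of 50% at the interim analysis would raise the event target to 501, whereas a conditional power of 95% or of 10% would leave the initial design unchanged. The average numbers of subjects reported in the simulation tables need not equal this planned maximum.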
The interim outcome will fall in the favorable/efficacy zone 36% of the time, in which case the prospects are excellent for both the classical and adaptive designs, and again, no additional resources are committed to the adaptive trial. The remaining 34% of the time the interim outcome will fall in the promising zone and this is where the adaptive design will help by boosting the power from 69% to 85%. To be sure, the study duration and sample size will also be increased in the promising zone. Presumably, however, the power gain justifies the use of these additional resources. In summary, the additional resources will be called up to boost power only if they can make a difference to the chance of a successful outcome for the trial.

Table 53.13: Operating Characteristics of Optimistic Design (powered to Detect HR=0.70) under the Optimistic Scenario (true HR=0.70)

10,000 Simulations Under the Optimistic Scenario that HR = 0.70

                       Power              Duration (months)    # of Subjects
Zone        P(Zone)    NonAdpt   Adapt    NonAdpt   Adapt      NonAdpt   Adapt
Unf+Fut     12%        54%       56%      28.9      29.1       475       476
Prom        27%        87%       97%      29.8      32.4       483       712
Fav+Effic   61%        98%       98%      25.1      25.1       436       436
Total       -          90%       93%      26.9      27.6       454       516

Table 53.12 demonstrates that these results are similarly compelling if the true hazard ratio is 0.75. If, however, the true hazard ratio is 0.7, Table 53.13 shows that the trial as initially designed has adequate power without the need for any adaptation of events or sample size. There is now a 27% chance of landing in the promising zone and adding resources in order to boost power from 87% to 97%. In this setting the trial would be overpowered and some of the additional resources might not have been needed.
The sponsor cannot of course know what the true hazard ratio is, and must weigh the likelihood of incurring these additional costs against the possibility of a loss to the patient population, and also a financial loss to the sponsor, if the study should fail despite the treatment difference being clinically meaningful.

53.5 Concluding Remarks

Many small companies with new molecules or technologies under development often rely on outside investors or large pharmaceutical companies for financing their phase 3 trials. The two-stage nature of the investment, with the second installment being obligated only if the interim results have significantly increased the odds of success, might make the adaptive design more attractive to outside investors than a conventional design requiring a fixed investment up-front, even when the two designs have equivalent unconditional risk profiles. Simulations, performed prior to starting the trial, are necessary to quantify the risks and benefits involved in selecting an adaptive design in preference to a conventional fixed sample or group sequential design, and to enable the sponsor to make an informed decision.

A major additional benefit of the adaptive approach is flexibility. The adaptive methodology controls the type-1 error even if the pre-specified criteria for increasing the sample size are overruled at the interim analysis. This might be desirable for a variety of reasons both internal and external to the current trial. For example, in addition to observing a promising outcome at the interim analysis, the safety profile for the test drug might turn out to be far superior to what was originally anticipated, and this might make the new drug more competitive in the marketplace.
One could therefore justify increasing the sample size by a larger amount than that determined by the pre-specified rules, and thereby further reduce the chances of a false negative outcome. Another possible situation in which one might overrule the pre-specified criteria for sample size change would be if compelling results from other clinical trials on comparable populations, treated with the same class of drugs became available and caused the sponsor to revise the value of δ at which to power the current study. Ideally one would wish to adhere strictly to the pre-specified criteria for sample size change since the operating characteristics of the design would change if they were overruled. This would certainly be the preference of regulatory authorities. As a practical matter, however, it is not possible to anticipate every contingency under which a sample size change is desirable. It is a strength of the adaptive approach that the validity of the statistical test at the end of the trial is not affected by unanticipated developments arising over the course of the clinical trial that necessitate making changes to the pre-specified criteria for sample size adaptation. Adaptive trials require very careful up-front planning. An independent interim analysis review committee (IARC) must be appointed with the responsibility to actually implement the adaptive decision rules. A charter listing the members of the IARC, describing their roles and responsibilities, and providing the details of the proposed adaptations must be created. The charter should also discuss the steps that will be taken to ensure that the interim results remain confidential, as premature disclosure of interim results to the trial investigators could compromise the trial. Finally, regulatory approval must be secured in advance through a special protocol assessment (SPA). 
For this purpose the sponsor is required to submit the protocol, the charter and the simulations backing up the statistical validity of the proposed adaptive approach in good time.

Logistical and operational issues must also be considered. In a fixed sample study the total sample size is determined in advance. In a traditional group sequential study, the maximum sample size is determined in advance. In an adaptive study, however, the maximum sample size might be increased at an interim look, thereby further complicating the management of the trial, especially as it relates to patient recruitment and drug supply.

Because of all these complexities an adaptive design might not always be the right choice. The more established fixed sample and group sequential designs should always be evaluated alongside an adaptive design. Simulations play a crucial role in understanding the operating characteristics of an adaptive design and deciding whether it is an appropriate choice for the trial under consideration. There should be a tangible, quantifiable benefit arising from the decision to take the adaptive route.

54 The Cui, Hung and Wang Method

This chapter discusses the Cui, Hung and Wang (1999) (CHW) method for adaptive sample size modification of an on-going two-arm, K-look group sequential clinical trial. The method is based on making a sample size modification, if required, each time that an interim analysis is performed. The interim monitoring continues in this way until either a boundary is crossed or the K looks are exhausted. Since the changes to the sample size may be based on unblinded analyses of the accruing data, the test statistic is not the usual Wald statistic utilized for monitoring a conventional group sequential design.
Instead the test statistic consists of a weighted sum of incremental Wald statistics, with weights that are pre-specified and chosen so as to preserve the type-1 error. This test statistic was proposed independently by Cui, Hung and Wang (1999) and by Lehmacher and Wassmer (1999). We shall refer to this test statistic as the CHW statistic and to this method of making adaptive sample size modifications as the CHW method. The CHW method is only valid for adaptive designs involving data dependent alterations in the sample size. The operating characteristics of any CHW design are obtained through simulation using a special Sample Size Re-estimation tab. Interim monitoring is performed through a special CHW Interim Monitoring dashboard.

In Section 54.1, we provide a quick review of the underlying theory for normal and binomial endpoints. In Section 54.2 we show how these same results can be extended to trials with survival or time-to-event endpoints. (Hereafter we shall use the terms "survival" and "time-to-event" synonymously.) In Section 54.3, we illustrate the method for a normal endpoint adaptive design. In Section 54.4 we illustrate the method for a binomial endpoint adaptive design. In Section 54.5 we illustrate the method for a time-to-event adaptive design. These three designs were discussed at length in Chapter 53. Here we illustrate how to use the adaptive modules of East to simulate and monitor them.

As already stated in the introductory chapter to this volume, we provide two such adaptive packages, EastAdapt and EastSurvAdapt. The EastAdapt package is required for studies with normal or binomial endpoints, while the EastSurvAdapt package is required for studies with time-to-event endpoints. Since these packages do not function independently of East, we will refer to the software as "East" rather than "EastAdapt" or "EastSurvAdapt" throughout this volume.
The context will clarify which adaptive module must be available to the core East program in order to run a specific example.

54.1 Statistical Method: Normal and Binomial

54.1.1 Hypothesis Testing
54.1.2 RCI’s and RPV’s
54.1.3 Conditional Power
54.1.4 East Defaults

In this section, we discuss hypothesis testing, confidence interval and p-value estimation, and computation of conditional power for a group sequential trial that permits adaptive sample size changes at the interim looks.

54.1.1 Hypothesis Testing

Consider a level-α test of the null hypothesis H0: δ = 0 versus the two-sided alternative hypothesis H1: δ ≠ 0 for a two-arm randomized clinical trial. Before any data are obtained we pre-specify that this hypothesis will be tested by a group sequential trial designed for up to K looks, at cumulative sample sizes $n_1, n_2, \ldots, n_K$, with corresponding level-α stopping boundaries $b_1, b_2, \ldots, b_K$ derived from some spending function. Although we have pre-specified the initial sample sizes look by look, there is full flexibility to adapt based on either external information, information from the trial itself, or a combination of the two. Accordingly let $n^*_1, n^*_2, \ldots, n^*_K$ denote the altered cumulative sample sizes at the K looks after sample size adaptation.

For adaptive designs it is convenient to express sample sizes, and parameter estimates that depend on sample size, in terms of incremental quantities as well as cumulative ones. Thus, for j = 1, 2, ..., K we define $n^{(j)} = n_j - n_{j-1}$ and $n^{*(j)} = n^*_j - n^*_{j-1}$ to be the incremental sample sizes for the pre-specified and altered designs, respectively, with $n_0 = n^*_0 = 0$. In keeping with this notation, we will hereafter index all statistics computed from cumulative sample sizes with subscripts and all statistics computed from incremental sample sizes with superscripts.
Additionally we will assign a superscript "∗" to all statistics that are computed with the altered sample sizes $n^*_j$, j = 1, 2, ..., K, rather than the pre-specified sample sizes $n_j$, j = 1, 2, ..., K.

Suppose we are at look j. Denote the j incremental Wald statistics by

\[ Z^{*(l)} = \frac{\hat\delta^{*(l)}}{\mathrm{se}(\hat\delta^{*(l)})} = \hat\delta^{*(l)} \sqrt{I^{*(l)}}, \quad l = 1, 2, \ldots, j, \tag{54.1} \]

where $\hat\delta^{*(l)}$ and $I^{*(l)} = [\mathrm{se}(\hat\delta^{*(l)})]^{-2}$ are, respectively, the point estimate and Fisher information about δ based only on data from the incremental $n^{*(l)}$ observations obtained between look (l − 1) and look l. The CHW statistic at look j, sometimes referred to as the weighted statistic, is constructed by combining these incremental Wald statistics with the pre-specified weights

\[ w^{(l)} = \frac{n^{(l)}}{n_K}, \quad l = 1, 2, \ldots, j, \]

as shown below:

\[ Z^*_{j,\mathrm{chw}} = \frac{\sqrt{w^{(1)}}\, Z^{*(1)} + \sqrt{w^{(2)}}\, Z^{*(2)} + \cdots + \sqrt{w^{(j)}}\, Z^{*(j)}}{\sqrt{w^{(1)} + w^{(2)} + \cdots + w^{(j)}}}. \tag{54.2} \]

This statistic is asymptotically normally distributed with mean

\[ E(Z^*_{j,\mathrm{chw}}) = \frac{\delta \sum_{l=1}^{j} \sqrt{w^{(l)} I^{*(l)}}}{\sqrt{\sum_{l=1}^{j} w^{(l)}}} \]

and unit variance. Interim monitoring proceeds just as it would in a conventional group sequential trial, and with the same stopping boundaries. The null hypothesis is rejected at the first look j at which $|Z^*_{j,\mathrm{chw}}| \ge b_j$. Both Cui, Hung and Wang (1999) and Lehmacher and Wassmer (1999) have shown that the CHW statistic preserves the type-1 error despite the data dependent changes in the sample sizes at the interim looks. That is,

\[ P_0\Big( \bigcup_{j=1}^{K} \{ |Z^*_{j,\mathrm{chw}}| \ge b_j \} \Big) = \alpha. \]

Now consider the conventional Wald statistic

\[ Z^*_{j,\mathrm{wald}} = \frac{\hat\delta^*_j}{\mathrm{se}(\hat\delta^*_j)} = \hat\delta^*_j \sqrt{I^*_j}, \tag{54.3} \]

where $\hat\delta^*_j$ and $I^*_j = [\mathrm{se}(\hat\delta^*_j)]^{-2}$ are, respectively, the point estimate and Fisher information about δ based on data from all the $n^*_j$ observations obtained up to and including look j.
Because of the data dependent changes in sample size at each stage of the trial, the type-1 error may not be preserved; in general,

\[ P_0\Big( \bigcup_{j=1}^{K} \{ |Z^*_{j,\mathrm{wald}}| \ge b_j \} \Big) \ne \alpha. \]

The conventional Wald statistic (54.3) is sometimes referred to as the unweighted statistic. This is really a misnomer because we can represent (54.3) at any look j as a weighted sum of j incremental Wald statistics (54.1) using weights

\[ w^{*(l)} = \frac{n^{*(l)}}{n^*_K}, \quad l = 1, 2, \ldots, j, \]

that depend on the actual rather than the pre-specified sample sizes, as shown below:

\[ Z^*_{j,\mathrm{wald}} = \frac{\sqrt{w^{*(1)}}\, Z^{*(1)} + \sqrt{w^{*(2)}}\, Z^{*(2)} + \cdots + \sqrt{w^{*(j)}}\, Z^{*(j)}}{\sqrt{w^{*(1)} + w^{*(2)} + \cdots + w^{*(j)}}}. \tag{54.4} \]

The statistics (54.3) and (54.4) are functionally equivalent for normally distributed data with known variance. In all other settings the two statistics are asymptotically equivalent. It follows that if there is no sample size change one may use either the unweighted or the weighted statistic for the interim monitoring since, in that case, $w^{(j)} = w^{*(j)}$ for all j, and hence $Z^*_{j,\mathrm{chw}} = Z^*_{j,\mathrm{wald}}$ for all j, and

\[ P_0\Big( \bigcup_{j=1}^{K} \{ |Z^*_{j,\mathrm{chw}}| \ge b_j \} \Big) = P_0\Big( \bigcup_{j=1}^{K} \{ |Z^*_{j,\mathrm{wald}}| \ge b_j \} \Big) = \alpha. \]

Although the above hypothesis testing procedure was described for two-sided tests with symmetric boundaries, the modifications needed to accommodate two-sided tests with asymmetric boundaries and one-sided tests with or without futility boundaries are straightforward.

54.1.2 Repeated Confidence Intervals and Repeated P-Values

The confidence intervals and p-values described in this section are generalizations of the repeated confidence intervals (RCI’s) and repeated p-values (RPV’s) discussed by Jennison and Turnbull (2000, Chapter 9) for classical group sequential designs. The extension to the adaptive setting is discussed in Lehmacher and Wassmer (1999) and more generally in Mehta, Bauer, Posch and Brannath (2007).
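Before moving on to interval estimation, the two test statistics of Section 54.1.1 can be compared numerically. The following is a minimal Python sketch, not East code, of the weighted combination used in equations (54.2) and (54.4). The function name `weighted_z` and the example numbers are illustrative assumptions, not values from this manual. Passing the pre-specified increments as weights gives the CHW statistic; passing the actual increments gives the weighted form (54.4) of the conventional Wald statistic. Any common divisor of the weights (n_K or n*_K) cancels between numerator and denominator, so the raw incremental sample sizes are used directly.

```python
from math import sqrt


def weighted_z(z_incr, weights):
    """Combine incremental Wald statistics Z*(1..j) as in (54.2)/(54.4).

    z_incr  -- incremental Wald statistics observed up to look j
    weights -- one non-negative weight per increment (same length)
    """
    if len(z_incr) != len(weights):
        raise ValueError("need exactly one weight per incremental statistic")
    numerator = sum(sqrt(w) * z for w, z in zip(weights, z_incr))
    return numerator / sqrt(sum(weights))


# Illustrative two-look example (hypothetical numbers).
planned_incr = [100, 100]   # pre-specified increments n^(1), n^(2)
actual_incr = [100, 300]    # altered increments n*^(1), n*^(2)
z_incr = [1.0, 2.0]         # incremental Wald statistics Z*^(1), Z*^(2)

z_chw = weighted_z(z_incr, planned_incr)    # weights fixed in advance
z_wald = weighted_z(z_incr, actual_incr)    # weights follow the data
z_same = weighted_z(z_incr, planned_incr)   # no adaptation: CHW == Wald
```

In this example the second increment was tripled, so the Wald form gives the second-stage data more weight than the CHW form does. When the increments are left unchanged the two calls receive identical weights and therefore return the same value, which is the equivalence noted in Section 54.1.1.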
All the RCI’s and RPV’s in this chapter utilize the method of Lehmacher and Wassmer (1999). Like the CHW method with which they are associated, these RCI’s and RPV’s are only valid for adaptive changes in the sample size. They are not applicable if additional adaptive changes are made to the initial design, such as data dependent changes to the number and spacing of the interim looks, or changes to the error spending function.

Lehmacher and Wassmer (1999) have shown that the K intervals

\[ \frac{(Z^*_{j,\mathrm{chw}} \pm b_j)\sqrt{s_j}}{\sum_{l=1}^{j} \sqrt{w^{(l)} I^{*(l)}}}, \quad j = 1, 2, \ldots, K, \tag{54.5} \]

where $s_j = n_j/n_K$ is the information fraction at look j based on the pre-specified sample sizes, are repeated confidence intervals (RCI’s) for δ. Thus, if $\delta_0$ is the true value of δ then, for all j = 1, 2, ..., K,

\[ P_{\delta_0}\left\{ \bigcap_{i=1}^{j} \left( \frac{(Z^*_{i,\mathrm{chw}} - b_i)\sqrt{s_i}}{\sum_{l=1}^{i} \sqrt{w^{(l)} I^{*(l)}}} \le \delta_0 \le \frac{(Z^*_{i,\mathrm{chw}} + b_i)\sqrt{s_i}}{\sum_{l=1}^{i} \sqrt{w^{(l)} I^{*(l)}}} \right) \right\} \ge 1 - \alpha. \tag{54.6} \]

Following the development in Jennison and Turnbull (2000, page 202) we can use (54.6) to obtain a repeated p-value at any look j. This is accomplished by iteratively altering the significance level of the hypothesis test for δ until a level $\tilde p_j$, say, is obtained such that one of the two extremes of the corresponding RCI (54.5), with confidence coefficient $1 - \tilde p_j$, just excludes zero. To be specific, let $b_j(q)$, j = 1, 2, ..., K, represent any level-q two-sided stopping boundaries derived from some spending function. That is,

\[ P_0\Big( \bigcup_{j=1}^{K} \{ |Z^*_{j,\mathrm{chw}}| \ge b_j(q) \} \Big) = q. \]

Let $z^*_{j,\mathrm{chw}}$ be the observed value of $Z^*_{j,\mathrm{chw}}$ at look j. Then the two-sided repeated p-value at look j is the probability $\tilde p_j$ that satisfies the relationship

\[ z^*_{j,\mathrm{chw}} - b_j(\tilde p_j) = 0 \quad \text{if } \hat\delta^*_j \ge 0, \tag{54.7} \]
\[ z^*_{j,\mathrm{chw}} + b_j(\tilde p_j) = 0 \quad \text{if } \hat\delta^*_j < 0. \tag{54.8} \]

These results can be readily modified to accommodate two-sided asymmetric tests and one-sided tests with or without futility boundaries. Suppose, for example, that we have obtained the asymmetric two-sided boundaries $(a_j, b_j)$, j = 1, 2, ..., K, such that

\[ P_0\left( \bigcup_{j=1}^{K} \Big[ \bigcap_{i=1}^{j-1} \{ a_i < Z^*_{i,\mathrm{chw}} < b_i \} \cap \{ Z^*_{j,\mathrm{chw}} \le a_j \} \Big] \right) = \alpha_l \]

and

\[ P_0\left( \bigcup_{j=1}^{K} \Big[ \bigcap_{i=1}^{j-1} \{ a_i < Z^*_{i,\mathrm{chw}} < b_i \} \cap \{ Z^*_{j,\mathrm{chw}} \ge b_j \} \Big] \right) = \alpha_u, \]

where $\alpha_l + \alpha_u = \alpha$. Then the K repeated confidence intervals for δ are given by

\[ \left[ \frac{(Z^*_{j,\mathrm{chw}} - b_j)\sqrt{s_j}}{\sum_{l=1}^{j} \sqrt{w^{(l)} I^{*(l)}}},\; \frac{(Z^*_{j,\mathrm{chw}} - a_j)\sqrt{s_j}}{\sum_{l=1}^{j} \sqrt{w^{(l)} I^{*(l)}}} \right], \quad j = 1, 2, \ldots, K. \]

To compute the two-sided repeated p-values let q be any probability and let $(a_j(q), b_j(q))$, j = 1, 2, ..., K, be any level-q asymmetric two-sided stopping boundaries derived from a pair of level-q asymmetric spending functions $(\alpha_l(q), \alpha_u(q))$ which have the same functional form as the spending functions used in the original asymmetric level-α trial design. That is,

\[ P_0\left( \bigcup_{j=1}^{K} \Big[ \bigcap_{i=1}^{j-1} \{ a_i(q) < Z^*_{i,\mathrm{chw}} < b_i(q) \} \cap \{ Z^*_{j,\mathrm{chw}} \le a_j(q) \} \Big] \right) = \alpha_l(q) \]

and

\[ P_0\left( \bigcup_{j=1}^{K} \Big[ \bigcap_{i=1}^{j-1} \{ a_i(q) < Z^*_{i,\mathrm{chw}} < b_i(q) \} \cap \{ Z^*_{j,\mathrm{chw}} \ge b_j(q) \} \Big] \right) = \alpha_u(q). \]

Note: In this notation, if q = α, then $\alpha_l(q) = \alpha_l$, $\alpha_u(q) = \alpha_u$, $\alpha_l(q) + \alpha_u(q) = \alpha_l + \alpha_u = \alpha$, and $(a_j(q), b_j(q)) = (a_j, b_j)$, j = 1, 2, ..., K.

Let $z^*_{j,\mathrm{chw}}$ be the observed value of $Z^*_{j,\mathrm{chw}}$ at look j. The two-sided repeated p-value at look j is the probability $\tilde p_j$ at which the relevant limit of the repeated confidence interval above just excludes zero, that is, the probability that satisfies the relationship

\[ z^*_{j,\mathrm{chw}} - b_j(\tilde p_j) = 0 \quad \text{if } \hat\delta^*_j \ge 0, \tag{54.9} \]
\[ z^*_{j,\mathrm{chw}} - a_j(\tilde p_j) = 0 \quad \text{if } \hat\delta^*_j < 0. \tag{54.10} \]

54.1.3 Conditional Power

Suppose that an on-going trial has reached some interim look L < K, and the observed value of the CHW test statistic is $Z^*_{L,\mathrm{chw}} = z_L$.
Having examined the data so far obtained, suppose it is planned to proceed through the remaining stages of the trial with cumulative sample sizes $n^*_{L+1}, n^*_{L+2}, \ldots, n^*_K$ that are possibly different from the cumulative sample sizes $n_{L+1}, n_{L+2}, \ldots, n_K$ pre-specified at the start of the trial. We define the conditional power at look L as the probability of attaining statistical significance in the direction of the alternative hypothesis at any future look, given $z_L$. Thus, if we are testing the null hypothesis that δ = 0 against the alternative that δ > 0, the conditional power is defined as

\[ \mathrm{CP}_\delta(z_L) = P_\delta\Big\{ \bigcup_{j=L+1}^{K} ( Z^*_{j,\mathrm{chw}} \ge b_j ) \,\Big|\, z_L \Big\}, \tag{54.11} \]

whereas if the alternative hypothesis is that δ < 0, then the conditional power is defined as

\[ \mathrm{CP}_\delta(z_L) = P_\delta\Big\{ \bigcup_{j=L+1}^{K} ( Z^*_{j,\mathrm{chw}} \le b_j ) \,\Big|\, z_L \Big\}. \tag{54.12} \]

For two-sided tests the conditional power is given by

\[ \mathrm{CP}_\delta(z_L) = P_\delta\Big\{ \bigcup_{j=L+1}^{K} ( |Z^*_{j,\mathrm{chw}}| \ge b_j ) \,\Big|\, z_L \Big\}. \tag{54.13} \]

These probabilities are obtained by recursive integration in East. Special East calculators are available from within the CHW interim monitoring dashboard and the CHW adaptive simulations to compute the conditional power for any specified value of δ. Use of these calculators will be demonstrated in the worked examples that form part of this chapter as well as in a separate chapter of the current user manual.

We conclude this section with some additional remarks about conditional power:

Although equations (54.11) through (54.13) are expressed in terms of δ, their dependence on σ for normally distributed data is implicit through the expression (54.2) for $Z^*_{j,\mathrm{chw}}$. In fact for the normal case one can show that conditional power depends only on the ratio δ/σ.

By increasing the sample size of the remainder of the trial after look L one increases the conditional power.
The calculators in East can be used to determine the magnitude of the sample size increase that is needed to achieve any desired conditional power, for any assumed value of δ (and σ).

Each simulation performed in the CHW adaptive simulations implements a one-time adaptive increase in sample size at a specified look L of the K-look group sequential design. The magnitude of the sample size increase is determined by a pre-specified conditional power, say 1 − β, that the user desires to achieve. In order to speed up the simulations, this sample size is computed by an approximation to equation (54.11) (or (54.12)) that assumes that the next time the data are monitored will be at look K, and that all the intermediate looks L + 1, L + 2, ..., K − 1 will be skipped. Specifically the approximate conditional power calculation is given by

\[ \mathrm{CP}_\delta(z_L) = 1 - \Phi\left\{ b_K \sqrt{1 + \frac{n_L}{n_K - n_L}} - z_L \sqrt{\frac{n_L}{n_K - n_L}} - \frac{\delta}{\sigma} \sqrt{r(1-r)} \sqrt{n^*_K - n_L} \right\}, \tag{54.14} \]

where r is the fraction randomized to the experimental arm. The approximate sample size needed to achieve conditional power 1 − β is then obtained by finding the value of $n^*_K$ that satisfies

\[ 1 - \Phi\left\{ b_K \sqrt{1 + \frac{n_L}{n_K - n_L}} - z_L \sqrt{\frac{n_L}{n_K - n_L}} - \frac{\delta}{\sigma} \sqrt{r(1-r)} \sqrt{n^*_K - n_L} \right\} = 1 - \beta. \tag{54.15} \]

The operating characteristics of the adaptive design under this approximate way of computing conditional power are almost the same as the operating characteristics that would be obtained by using, say, equation (54.11) to evaluate conditional power at each simulation. The simulations are, however, speeded up substantially thereby.

54.1.4 Adaptive Simulations Defaults

It will be useful to discuss now the default settings used for adaptive simulation procedures in East, and the options available to you if you want to change them.
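Before turning to these settings, the approximation in equations (54.14) and (54.15) of Section 54.1.3 can be made concrete with a short Python sketch. This is an illustration only and is not how East is implemented; the function names and the example inputs (z_L = 1.5, b_K = 1.96, n_L = 200, n_K = 400, δ = 2, σ = 7.5) are hypothetical. Equation (54.15) is solved in closed form: writing c for the first two terms inside Φ and θ = (δ/σ)√(r(1 − r)), the condition becomes √(n*_K − n_L) = (c + Φ⁻¹(1 − β))/θ, which assumes θ > 0.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi is .cdf, its inverse is .inv_cdf


def _offset(z_L, b_K, n_L, n_K):
    """First two terms inside Phi in (54.14)."""
    ratio = n_L / (n_K - n_L)
    return b_K * sqrt(1 + ratio) - z_L * sqrt(ratio)


def approx_cp(z_L, b_K, n_L, n_K, n_K_star, delta, sigma, r=0.5):
    """Approximate conditional power (54.14): look K is the only remaining look."""
    drift = (delta / sigma) * sqrt(r * (1 - r)) * sqrt(n_K_star - n_L)
    return 1 - STD_NORMAL.cdf(_offset(z_L, b_K, n_L, n_K) - drift)


def n_star_for_cp(z_L, b_K, n_L, n_K, delta, sigma, target_cp, r=0.5):
    """Value of n*_K solving (54.15) for conditional power target_cp = 1 - beta."""
    theta = (delta / sigma) * sqrt(r * (1 - r))
    if theta <= 0:
        raise ValueError("closed form assumes delta/sigma > 0")
    root = (_offset(z_L, b_K, n_L, n_K) + STD_NORMAL.inv_cdf(target_cp)) / theta
    # A non-positive root means the target is already met with no further data.
    return n_L + max(root, 0.0) ** 2


# Hypothetical interim situation, for illustration only.
interim = dict(z_L=1.5, b_K=1.96, n_L=200, n_K=400, delta=2.0, sigma=7.5)
n_star = n_star_for_cp(target_cp=0.80, **interim)
cp_at_n_star = approx_cp(n_K_star=n_star, **interim)
```

Substituting the returned `n_star` back into `approx_cp` recovers the target conditional power, which is a convenient consistency check. In practice the result would be rounded up to a whole number of subjects and capped at whatever maximum sample size the design allows.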
If you click on the button Include Options and check Sample Size Re-estimation, you will see an additional tab called Sample Size Re-estimation on the screen. All the parameters on this tab will be explained in detail in the subsequent sections. Here, we explain the default settings on this tab.

Three adaptation methods are implemented in this version of East: the Cui-Hung-Wang, Chen-DeMets-Lan, and Müller and Schäfer methods. The default method is Cui-Hung-Wang.

By default, the adaptation happens at a specified look number. One can also perform the adaptation after a specified sample size or information fraction.

By default, the promising zone is defined on the Conditional Power scale. One can also define it on the Test Statistic scale or the Estimated δ scale.

The default settings for CP Computations are:

Estimated δ/σ for Normal Endpoint
Estimated (πc, πt) for Binomial Endpoint
Estimated HR for Survival Endpoint

We recommend that you do not alter these settings except for research purposes. The choices you make in this dialog box will determine how the adaptive simulations are conducted. Because changes to these settings can substantially alter the operating characteristics of the adaptive simulations, East will revert to the default values each time the East session is terminated.

For Binomial Endpoint designs, there are only two choices: Estimated (πc, πt) and Design (πc, πt). Depending on the selection you make, the conditional power computation at the specified interim look (or sample size, or information fraction) will be based either on the estimated values of πc and πt or on the values that were used for creating the study design. An example of a binomial endpoint design is shown below. The design parameters in the above plan are πc = 0.25 and πt = 0.40.
If you have chosen the setting Design (πc, πt) for conditional power computation in the Sample Size Re-estimation tab, then in any adaptive simulations the conditional power at the interim analysis will be computed using the design values, πc = 0.25 and πt = 0.40, regardless of the estimated values obtained for these parameters at the time of the interim analysis.

For Normal Endpoint designs, the conditional power depends on δ and σ only through the ratio δ/σ. You may choose either Design δ/σ or Estimated δ/σ in the Sample Size Re-estimation tab, and the conditional power at the interim analysis in any adaptive simulation will be computed accordingly. The σ used for computing the test statistic is determined by the choice of test statistic made in the Simulation Parameters tab. If this choice is Z, then the design σ is used; otherwise the estimated σ is used in the test statistic computation.

Survival Endpoint designs are discussed in Section 54.2. For such designs the treatment effect δ is defined to be the log hazard ratio. You may choose either the Design HR or the Estimated HR for purposes of computing conditional power at the interim look.

We end this section with an example that shows how the choice of Design or Estimated parameter values for conditional power computation at an interim look can alter the operating characteristics of an adaptive design. Consider the normal endpoint design shown below. The design parameters for the above plan are δ = 15 and σ = 30.
Suppose you have chosen Design δ/σ for the conditional power computation on the Sample Size Re-estimation tab, and Z as the test statistic on the Simulation Parameters tab. Then, in every adaptive simulation you carry out for this design, the conditional power at the interim analyses will be based on the ratio δ/σ = 15/30 = 0.5 and the test statistic will be computed under the assumption that the design σ = 30 holds, rather than by estimating these quantities from the actual data generated at the interim look.

In order to explore the impact of changes to these simulation settings, change the choice of the test statistic to t in the Simulation Parameters tab. Change the value of δ1 under Response Generation Info to 10 and keep σ1 at 30. Also change the settings on the Sample Size Re-estimation tab and set the simulation control parameters as shown below.

Click on the button Simulate to run the simulations. An entry will be added in the Output Preview pane. Save it in the Library and click the icon to see the output summary of these simulations.

Remember that these results were obtained when the values of δ/σ and σ were both set to Estimated values. Now you can change your settings for these two parameters to Estimated and Design, respectively. This can be done by editing the current simulation node. Select the simulation node in the Library and click the icon. You will be taken to the Simulation Parameters tab.
Here, select Z as the Test statistic from its dropdown. Run the simulation and get new results. Similarly you can carry out the other two simulations, with the combinations Design-Estimated and Design-Design for the two parameters δ/σ and σ, and obtain the results. You will now have four sets of results for the four assumptions on the two parameters. These results are all obtained by simulating under the values δ1 = 10 and σ1 = 30. Carry out two more sets of similar analyses using the combinations δ1 = 8 & σ1 = 40 and δ1 = 12 & σ1 = 24. You may compare the various resulting values from the simulations, such as power, average combined sample size, average adapted sample size, etc. As an example, let us compare the values of power from the different simulations carried out, as tabulated below.

Table 54.1: Results for different assumptions of East Settings for Adaptive Simulation (Design parameters: δ = 15, σ = 30)

Settings
  Conditional Power δ/σ:   Estimated   Estimated   Design      Design
  Test Statistic σ:        Estimated   Design      Estimated   Design

Estimates of power under different simulation parameters
  δ1 = 10, σ1 = 30         66.25%      65.93%      67.12%      67.27%
  δ1 = 8,  σ1 = 40         30.8%       48.96%      30.66%      49.21%
  δ1 = 12, σ1 = 24         93.24%      85.81%      94.42%      87.78%
  δ1 = 0,  σ1 = 30         2.65%       2.5%        2.5%        2.44%

We can also compare multiple simulation scenarios in East itself. This can be done by selecting from the Library the scenarios to be compared and clicking the icon to see the comparison. Let us compare the first four scenarios from the above table.

By comparing the power estimates across rows or columns in the above table, you will be able to gauge the effect of the different parameter and computation assumptions.
Our recommendation is to leave these parameters at their default values except for exploratory purposes.

54.2 Statistical Method: Survival

For studies involving survival (time-to-event) endpoints the parameter δ denoting the treatment effect is defined to be the logarithm of the hazard ratio of the treatment arm to the control arm; δ = ln(HR). Under proportional hazards, δ < 0 implies longer survival times for the treatment arm than for the control arm. In order to test the null hypothesis H0: δ = 0 versus one- and two-sided alternatives we exploit the independent increment structure of the sequentially computed logrank statistic (Tsiatis, 1981; Jennison and Turnbull, 1997). Before any data are obtained we pre-specify that the null hypothesis will be tested by a K-look group sequential design at potential stopping times $D_1, D_2, \ldots, D_K$, where $D_j$ denotes the cumulative number of events obtained at look j and $b_j$ is the corresponding level-α stopping boundary derived from some spending function. The CHW method permits data dependent alterations to the cumulative events at which these looks occur. Accordingly let $D^*_1, D^*_2, \ldots, D^*_K$ denote the altered cumulative events at the K looks resulting from an adaptation of the original design. Analogous to the notation developed for normal and binomial endpoints let $D^{(j)} = D_j - D_{j-1}$ and $D^{*(j)} = D^*_j - D^*_{j-1}$ be the incremental increase in the number of events between looks j − 1 and j.

Let $Z^*_{j,\mathrm{cum}}$ denote the Z-score based on either a logrank statistic or the treatment effect estimate obtained by fitting the Cox proportional hazard model to the cumulative data available at look j. Then the results of Tsiatis (1981) and Jennison and Turnbull (1997) show that the incremental statistics

\[ Z^{*(j)} = \frac{\sqrt{I^*_j}\, Z^*_{j,\mathrm{cum}} - \sqrt{I^*_{j-1}}\, Z^*_{j-1,\mathrm{cum}}}{\sqrt{I^*_j - I^*_{j-1}}}, \quad j = 1, \ldots, K, \tag{54.16} \]

are asymptotically independent and normally distributed with mean

\[ E\big[ Z^{*(j)} \big] \approx \delta \sqrt{I^*_j - I^*_{j-1}} \tag{54.17} \]

and unit variance. In the simulation module $Z^*_{j,\mathrm{cum}}$ comes from the log-rank test and we assume that

\[ I^*_j \approx r(1-r) D^*_j. \tag{54.18} \]

Here r is the proportion of subjects randomized to the active treatment group. In the interim monitoring module we use the approximation (54.18) as the default, and use the slightly different approximation

\[ I^*_j = \frac{1}{[\widehat{\mathrm{se}}(\hat\delta^*_j)]^2} \tag{54.19} \]

if the monitoring relies on the estimates of treatment effect $\hat\delta^*_j$ and $\widehat{\mathrm{se}}(\hat\delta^*_j)$ provided by fitting the Cox proportional hazard model to the cumulative data at look j.

Let

\[ w^{(j)} = \frac{D^{(j)}}{D_K}, \quad j = 1, 2, \ldots, K, \]

be pre-specified weights. Following Wassmer (2006), the CHW statistic for survival designs is constructed by combining the independent incremental statistics (54.16) with these weights:

\[ Z^*_{j,\mathrm{chw}} = \frac{\sqrt{w^{(1)}}\, Z^{*(1)} + \sqrt{w^{(2)}}\, Z^{*(2)} + \cdots + \sqrt{w^{(j)}}\, Z^{*(j)}}{\sqrt{w^{(1)} + w^{(2)} + \cdots + w^{(j)}}}. \tag{54.20} \]

The CHW statistic (54.20) for survival endpoints has the same asymptotic distribution as the CHW statistic (54.2) for normal and binomial endpoints. Thus all the distributional results, repeated confidence intervals, p-values, and conditional power calculations derived in Section 54.1.1, Section 54.1.2, and Section 54.1.3 for normal and binomial endpoints also hold for time-to-event endpoints with δ = ln(HR), σ = 1, and $D^{*(j)}$ substituting for $n^{*(j)}$.

In particular equation (54.14), depicting the conditional power if $Z^*_{L,\mathrm{chw}} = z_L$ at look L and $D^*_K$ cumulative events are required at the Kth look, can be re-expressed in the form

\[ \mathrm{CP}_\delta(z_L) = 1 - \Phi\left\{ b_K \sqrt{1 + \frac{D_L}{D_K - D_L}} - z_L \sqrt{\frac{D_L}{D_K - D_L}} - \delta \sqrt{r(1-r)} \sqrt{D^*_K - D_L} \right\}. \tag{54.21} \]

Since the true value of δ is unknown it is customary to substitute, in the above expression for conditional power, either its look L estimate

\[ \hat\delta^*_L = \frac{Z^*_{L,\mathrm{cum}}}{\sqrt{r(1-r) D^*_L}} \tag{54.22} \]

or else the value δ1 specified under the alternative hypothesis at the design stage. East provides the user with both options. Note that like equation (54.14), equation (54.21) also involves the simplifying assumption that the next look following look L will be the last look, and all intermediate looks L + 1, L + 2, ..., K − 1 will be skipped. This assumption yields an approximate conditional power that can be computed rapidly and is sufficiently accurate for use in simulation experiments such as those discussed in Section 54.5.3. However, special calculators documented in Chapter 57 are available if a more accurate conditional power computation that respects the actual stopping boundaries at looks L + 1, L + 2, ..., K − 1 is desired.

54.3 Normal Endpoint: Schizophrenia Trial

54.3.1 Fixed Sample Design
54.3.2 Adaptive Design
54.3.3 Interim Monitoring

Consider a two-arm trial to determine if there is an efficacy gain for an experimental drug relative to the industry standard treatment for negative symptoms schizophrenia. The primary endpoint is the improvement from baseline to week 26 in the Negative Symptoms Assessment (NSA), a 16-item clinician-rated instrument for measuring the negative symptomatology of schizophrenia. Let µt denote the difference between the mean NSA at baseline and the mean NSA at week 26 for the treatment arm and let µc denote the corresponding difference of means for the control arm. Denote the efficacy gain by δ = µt − µc. The trial will be designed to test the null hypothesis H0: δ = 0 versus the one-sided alternative hypothesis that δ > 0. It is expected, from limited data on related studies, that δ ≥ 2 and σ, the between-subject standard deviation, is believed to be about 7.5.
In the discussion that follows, we shall focus our attention on the uncertainty about δ. Even though the statistical methods discussed here are applicable when there is uncertainty about either δ or σ, the adaptive approach requires careful justification primarily when δ is involved. Adaptive sample size adjustments relating to uncertainty about σ are fairly routine and non-controversial. One way to eliminate the uncertainty due to σ is to re-parameterize the treatment effect in terms of δ/σ, since it turns out that the sample size, power and conditional power are all dependent on δ and σ only through this ratio. Although we shall not follow that approach here, we wish to point out that it is supported by the EastAdapt software.

This example is discussed in detail in Chapter 53, Section 53.2, where the relative merits of the fixed sample, group sequential and adaptive designs are compared. We have re-introduced this example in the present chapter in order to illustrate how to use the adaptive features in East software to design, simulate and monitor an adaptive clinical trial that will test the null hypothesis δ = 0 and estimate the parameter δ.

54.3.1 Fixed Sample Design

Since it is believed a priori that δ ≥ 2, we first create Des 1, a single-look design with 80% power to detect δ = 2 using a one-sided level 0.025 test, given σ = 7.5.

Des 1 shows that if the assumptions about δ and σ are correct, the trial will achieve 80% power with a total sample size of 442 subjects. There is, however, considerable uncertainty about the true value of δ, and to a lesser extent about σ. Nevertheless it is believed that even if the true value of δ were as low as 1.6 on the NSA scale, that would constitute a clinically meaningful effect.
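The sample size quoted for Des 1 can be checked by hand. The sketch below is not East's implementation; it assumes the standard normal-approximation formula for a two-arm, equally allocated, one-sided test of a difference of means, n = 4σ²(z_α + z_β)²/δ². The function name `fixed_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist

def fixed_sample_size(delta, sigma, alpha=0.025, power=0.80):
    """Total sample size (both arms, 1:1 allocation) for a one-sided z-test
    of a difference of means, using the normal approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # critical value of the one-sided test
    z_beta = z.inv_cdf(power)        # quantile corresponding to the desired power
    n_total = 4 * sigma**2 * (z_alpha + z_beta)**2 / delta**2
    return math.ceil(n_total)        # round up to a whole subject

# Design assumptions taken from the text: delta = 2, sigma = 7.5
n_des1 = fixed_sample_size(delta=2.0, sigma=7.5)
print("Total sample size for delta = 2.0:", n_des1)
```

Under these assumptions the formula rounds up to the 442 subjects reported for Des 1. Because the required sample size scales with (σ/δ)², the same function can be reused for smaller values of δ.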
Des 2, displayed below, shows that if 690 subjects are enrolled the power to detect δ = 1.6 is 80%.

So far we have proposed two design options. Under Des 1 we would enroll 442 subjects and hope that the study is adequately powered, which it will be if δ = 2 and σ = 7.5. If, however, δ = 1.6 the power drops from 80% to 61%. There is thus a risk of launching an underpowered study for an effective drug under Des 1, even if σ = 7.5. Under Des 2 we will enroll 690 subjects, thereby ensuring 80% power at the smallest clinically meaningful value, δ = 1.6, and rising to 94% power at δ = 2. The operating characteristics of Des 1 and Des 2 are displayed side by side in Table 54.2. If resources were plentiful, Des 2 would clearly be the preferred option. The sponsor must, however, allocate scarce resources over a number of studies and in any case is not in favor of designing an overpowered trial. This leads naturally to considering a design that might be more flexible with respect to sample size than either of the above two single-look fixed sample designs.

Table 54.2: Operating Characteristics of Des 1 and Des 2

        Des 1                    Des 2
δ       Sample Size   Power      Sample Size   Power
1.6     442           61%        690           80%
1.7     442           66%        690           85%
1.8     442           71%        690           88%
1.9     442           76%        690           91%
2.0     442           80%        690           94%

Two options for providing this greater flexibility are the group sequential design and the adaptive design. In the group sequential design one starts out with a large up-front commitment by powering the study to detect the smallest clinically meaningful treatment effect δ = 1.6, but the expected sample size is reduced by means of early stopping boundaries.
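The power figures in Table 54.2 can be approximated with a few lines of code. This is a sketch under the assumption that power is computed from the usual normal approximation, Φ(δ√n/(2σ) − z_α), for a one-sided level-0.025 test with equal allocation; East's own computation may differ in detail. The function name `power_two_arm_z` is hypothetical.

```python
from statistics import NormalDist

def power_two_arm_z(delta, sigma, n_total, alpha=0.025):
    """Approximate power of a one-sided two-arm z-test with n_total/2 per arm."""
    z = NormalDist()
    se = 2 * sigma / n_total**0.5          # standard error of the mean difference
    return z.cdf(delta / se - z.inv_cdf(1 - alpha))

# Sample sizes of Des 1 (442) and Des 2 (690), sigma = 7.5 as in the text
power_table = {}
for delta in (1.6, 1.7, 1.8, 1.9, 2.0):
    power_table[delta] = (power_two_arm_z(delta, 7.5, 442),
                          power_two_arm_z(delta, 7.5, 690))
    p1, p2 = power_table[delta]
    print(f"delta={delta:.1f}  Des 1: {p1:6.1%}  Des 2: {p2:6.1%}")
```

Rounded to the nearest percentage point, these values should agree with the table to within rounding.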
In the adaptive design, one starts out with a smaller initial sample size by powering the study to detect the optimistic treatment effect δ = 2, but reserves the option to increase the sample size on the basis of the data obtained at an interim look, should it appear advantageous to do so. Group sequential designs are discussed extensively elsewhere in the East manual and hence this option need not be illustrated in the current chapter. We refer the user to Chapter 53, Section 53.2 of this user manual for a thorough discussion of the relative merits of the group sequential and adaptive options as they relate to the present example. It is seen that the relatively long follow-up (26 weeks) before the primary endpoint is observed leads to patient overruns which offset some of the advantages of the group sequential design. We shall accordingly confine our discussion to adaptive design for the remainder of this section.

54.3.2 Adaptive Design

To motivate the adaptive design let us recall that although the actual value of δ is unknown, the investigators believe that δ ≥ 2. For this reason Des 1 was constructed to have 80% power to detect δ = 2. Des 2 on the other hand was constructed to have 80% power to detect δ = 1.6, the smallest clinically meaningful treatment effect. If there were no resource constraints one would of course prefer to design the study for 80% power at δ = 1.6 since that would imply even more power at δ = 2. However, as can be seen from Table 54.2, this conservative strategy carries as its price a substantially larger up-front sample size commitment which is, moreover, unnecessary if in truth δ = 2.
The above difficulties lead us to consider whether Des 1, which was intended to detect δ = 2 with 80% power and hence does not have such a large up-front sample size commitment, might be improved so as to provide some insurance against substantial power loss in the event that δ = 1.6. The adaptive approach is suited to this purpose. In this approach we start out with a sample size of 442 subjects as in Des 1, but take an interim look after data are available on 208 completers. The purpose of the interim look is not to stop the trial early but rather to examine the interim data and continue enrolling past the planned 442 subjects if the interim results are promising enough to warrant the additional investment of sample size. This strategy has the advantage that the sample size is finalized only after a thorough examination of data from the actual study rather than through making a large up-front sample size commitment before any data are available. Furthermore, if the sample size may only be increased but never decreased from the originally planned 442 subjects, there is no loss of efficiency due to overruns. For the final analysis we adopt the CHW statistic described in Section 54.1, so as to avoid inflating the type-1 error.

Selecting the Criteria for an Adaptive Sample Size Increase

The operating characteristics of an adaptive design depend in a complicated way on the criteria for increasing the sample size after observing the interim data. These criteria may combine objective information such as the current estimate of δ or the current conditional power with assessments of safety and with information available from other clinical trials that was not available at the start of the study.
The adaptive approach provides complete flexibility to modify the sample size without having to pre-specify a precise mathematical formula for computing the new sample size based on the interim data. Therefore the full benefit of the flexibility offered by an adaptive design cannot be quantified ahead of time. Nevertheless it is instructive to investigate power and expected sample size by simulating the trial under different values of δ and applying precise pre-specified rules for increasing the sample size on the basis of the observed interim results. This will provide at least some idea, at the design stage, of the trade-off between the fixed sample or group sequential approaches and the adaptive approach.

To this end we create Des 3 as a 2-look design with 80% power to detect δ = 2 with a one-sided level-0.025 test, and one interim analysis utilizing the γ(−24) spending function after data are available on 208 completers.

The γ(−24) early stopping boundary selected for Des 3 is so conservative that for all practical purposes there is no early stopping at all. The specification of this early stopping boundary is simply an artificial device for permitting an interim look at which one may adaptively increase the sample size. Therefore Des 3 may be viewed as an extension of Des 1. At the start of the trial, both plans have the same sample size of 442 subjects and 80% power at δ = 2, deteriorating to 61% power at δ = 1.6. Des 3 stipulates, however, that an interim look will be taken after 26 weeks of follow-up data are available on 208 of the planned 442 subjects. At that interim look the sample size may be increased.
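To see just how conservative the γ(−24) boundary is, the error spent at the interim look can be computed directly. The sketch below assumes that γ(−24) denotes the gamma family of Hwang, Shih and DeCani, α(t) = α(1 − e^(−γt))/(1 − e^(−γ)), with γ = −24, and that the first-look critical value is the normal quantile of the error spent at information fraction t = 208/442. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def gamma_spending(t, gamma, alpha=0.025):
    """Cumulative type-1 error spent at information fraction t under the
    gamma family of spending functions (assumed form, see lead-in)."""
    if gamma == 0:
        return alpha * t
    # expm1 keeps the ratio numerically stable for large |gamma|
    return alpha * math.expm1(-gamma * t) / math.expm1(-gamma)

def first_look_boundary(t, gamma, alpha=0.025):
    """One-sided critical value at the first look: the whole of the error
    spent so far is allocated to that look."""
    return -NormalDist().inv_cdf(gamma_spending(t, gamma, alpha))

t1 = 208 / 442                       # information fraction at the interim look
spent = gamma_spending(t1, -24)      # error spent at the interim look
bound = first_look_boundary(t1, -24)
print(f"alpha spent at look 1: {spent:.3e}")
print(f"critical value at look 1: {bound:.3f}")
```

Under these assumptions less than one part in ten million of the type-1 error is spent at the interim look, which is why the final critical value is essentially that of a fixed sample test.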
The timing of the interim look reflects a preference for performing the interim analysis as late as possible but nevertheless while the trial is still enrolling subjects since, once the enrollment sites have closed down, it will be difficult to start them up again. Under the assumption that subjects enroll at the rate of 8 per week we will have enrolled 416 subjects by week 52; 208 of them will have completed the required 26 weeks of follow-up for the primary endpoint, and an additional 208 subjects will comprise the overruns. Only the data from the 208 completers will be used in making the decision to increase the sample size. After this decision is taken, enrollment will continue until the desired sample size is attained. The primary efficacy analysis will be based on the full 26 weeks of follow-up data from all enrolled subjects and will utilize the CHW test, thereby ensuring that the type-1 error is preserved despite the data dependent sample size change at the interim look. It should be noted that, unlike the group sequential setting where the 208 overruns at the time of the interim look played no role in the early stopping decision, here the data from the 208 overruns will be fully utilized in the primary efficacy analysis which will only occur when all enrolled subjects have completed 26 weeks of follow-up. This is one of the advantages of the adaptive approach relative to the group sequential approach for trials with lengthy follow-up.

The East software provides a simulation tool for studying the consequences of increasing the sample size of Des 3 at the interim look. To implement this tool we must add the sample size re-estimation tab for Des 3. Select Des 3 in the Library and click the icon.
In addition to the tabs that appear by default on inserting Simulations, one can add more tabs to enter information on randomization, stratification and sample size re-estimation. This can be done by clicking the Include Options button in the top right-hand corner of the screen. Select Sample Size Re-estimation from the list. This will add a tab named Sample Size Re-estimation as shown below:

Let us focus on these tabs. Several parameters on the four tabs shown below play an important role in the simulation and adaptation of a design. The three tabs Simulation Parameters, Response Generation Info and Simulation Control Info contain all the information about the design Des 3 in the absence of any adaptive change. It is a two-look design with a sample size of 442 and an interim look after 208 completers. The early stopping boundary generated by the γ(−24) spending function equals 5.251 standard deviations on the Wald statistic scale. With this extremely conservative boundary there is practically no chance of early stopping even at the alternative hypothesis that δ = 2. This design is for all practical purposes the same as Des 1.

The fourth tab, Sample Size Re-estimation, is used to specify the rules of adaptation for modifying the initial sample size of Des 3, based on the data at the interim analysis. Before running the simulations we must input suitable values into the cells of this tab. We have made the following choices in different tabs:

The Response Generation Info tab:

The Sample Size Re-estimation tab:

Most of these simulation parameters are self-explanatory. Some of them need further explanation. This is provided below.
Adapt at: For a K-look group sequential design, one can decide the time at which conditions for adaptations are to be checked and actual adaptation is to be carried out. This can be done either at some intermediate look or after some specified information fraction. The possible values of this parameter depend upon the choice of the user. If it is Look no. then this parameter can be any integer from 1 to K − 1. If the adaptation is to be carried out after reaching a specified information fraction then this parameter can be a fraction between 0 and 1. The default choice in East is Look no. to decide the time of adaptation.

Target CP for Re-estimating Sample Size: The primary driver for increasing the sample size at the interim look is the desired (or target) conditional power or probability of obtaining a positive outcome at the end of the trial, given the data already observed. For this example we have set the conditional power at the end of the trial to be 80%. East then computes the sample size that would be required to achieve this desired conditional power. The computation assumes that the estimated δ̂ obtained at the interim look is the true δ. Refer to Section 54.1.3 for the relevant formula for this computation.

Maximum Sample Size if Adapt (multiplier; total): As just stated, a new sample size is computed at the interim analysis on the basis of the observed data so as to achieve some target conditional power. However the sample size so obtained will be overruled unless it falls between pre-specified minimum and maximum values. For this example, the range of allowable sample sizes is [442, 884]. If the newly computed sample size falls outside this range, it will be reset to the appropriate boundary of the range.
For example, if the sample size needed to achieve the desired 80% conditional power is less than 442, the new sample size will be reset to 442. In other words we will not decrease the sample size from what was specified initially. On the other hand, the upper bound of 884 subjects demonstrates that the sponsor is prepared to increase the sample size up to double the initial investment in order to achieve the desired 80% conditional power. But if 80% conditional power requires more than 884 subjects, the sample size will be reset to 884, the maximum allowed.

Promising Zone Scale: One can define the promising zone as an interval based on conditional power or test statistic or δ/σ. The input fields change according to this choice. The decision to alter the sample size is taken based on whether the interim value of conditional power / test statistic / δ/σ lies in this interval or not. Let us keep the default scale, which is Conditional Power.

Promising Zone: Minimum/Maximum Conditional Power (CP): The sample size will only be altered if the estimate of CP at the interim analysis lies in a pre-specified range, referred to as the "Promising Zone". Here the promising zone is stipulated to be 0.30 to 0.80. The idea is to invest in the trial in stages. Prior to the interim analysis the sponsor is only committed to a sample size of 442 subjects. If, however, the results at the interim analysis appear reasonably promising, the sponsor would be willing to make a larger investment in the trial and thereby improve the chances of success. Here we have somewhat arbitrarily set the lower bound for a promising interim outcome to be CP = 0.30. An estimate CP < 0.30 at the interim analysis is not considered promising enough to warrant a sample size increase. It might sometimes be desirable to also specify an upper bound beyond which no sample size change will be made.
Here we have set the upper bound of the promising zone at CP = 0.80. In effect we have partitioned the range of possible values for conditional power at the interim analysis into three zones: unfavorable (CP < 0.3), promising (0.3 ≤ CP < 0.8), and favorable (CP ≥ 0.8). Sample size adaptations are attempted only if CP (with no sample size adaptation) falls in the promising zone at the interim analysis. A promising zone defined on the Test Statistic scale or the δ/σ scale works along similar lines.

The Simulation Control Info tab:

Operating Characteristics of Adaptive Implementation of Des 3

Having entered the above simulation parameters into the simulation tabs, we simulate the adaptive implementation of Des 3 100,000 times. An entry gets added in the Output Preview pane. Save this Simulation node in the workbook and either double click on the node or click the icon to see the details of the complete simulation output. The results from the 100,000 simulated trials are displayed in three tables titled Simulation Boundaries and Boundary Crossing Probabilities, Average Sample Size and Look Times, and Simulation Results by Zone.

We observe from the table that the power of the adaptive implementation of Des 3 at δ = 1.6 is 67.05%, an improvement of about 6% over the power of Des 1 at the same value of δ. This increase in power has come at an average cost of 510 − 442 = 68 additional subjects. Next we observe from the Simulation Results by Zone that 26,488 of the 100,000 trials (26.49%) underwent a sample size adaptation and of these 26,488 trials, 21,986 (83%) were able to reject the null hypothesis. The average sample size, conditional on adaptation, was 697.41. To examine these same results in more detail, see the table Zone-wise Averages.
This table contains the results for all six zones: Futility, Unfavorable, Promising, Favorable, Efficacy and All Trials. The simulated trials fall in the unfavorable, promising, favorable and efficacy zones 32.52%, 26.49%, 40.97% and 0.022% of the time, respectively. Observe that while the overall probability of obtaining a significant result is only 67.05%, this probability jumps up to 83.0% conditional on falling in the promising zone. We repeat these simulations with other values of δ between 1.6 and 2. The operating characteristics for the adaptive Des 3 are compared to those of the fixed sample Des 1 in Table 54.3. All results for Des 3 are based on 100,000 simulated trials and rounded to the nearest percentage point.

Table 54.3: Operating Characteristics of Des 1 (Fixed Sample) and Des 3 (Adaptive)

             Des 1 (Fixed Sample)           Des 3, Sim 1 to Sim 5 (Adaptive)
Value of δ   Power   Expected Sample Size   Power   Expected Sample Size
1.6          61%     442                    67%     509
1.7          66%     442                    72%     508
1.8          71%     442                    77%     506
1.9          76%     442                    81%     502
2.0          80%     442                    84%     499

The power of the adaptive Des 3 has increased by about 6% at δ = 1.6 and by about 4% at δ = 2 compared to Des 1. These power gains were obtained at the cost of corresponding average sample size increases of 67 subjects at δ = 1.6 and 57 subjects at δ = 2. Although these power gains appear fairly modest, Des 3 offers a significant benefit in terms of risk reduction, not reflected in Table 54.3. To see this, it is important to note that the sample size under Des 3 is only increased when the interim results are promising; i.e., when the conditional power at the interim analysis is greater than or equal to 30% but less than 80%. This is the very situation in which it is advantageous to increase the sample size and thereby avoid an underpowered trial.
When the interim results are unfavorable (conditional power < 30%) or favorable (conditional power ≥ 80%), a sample size increase is not warranted and hence it is unchanged at 442 subjects for both Des 1 and Des 3. But when the interim results are promising (conditional power between 30% and 80%) the sample size is increased under Des 3 in an attempt to boost the conditional power back to 80%. It is this feature of the adaptive design that makes it more attractive than the simpler fixed sample design.

In order to compare Des 1 (the fixed sample design) with Des 3 (the group sequential design designed with adaptive simulations) conditional on zone, let us edit the simulation inputs associated with Des 3. Select the Simulation node in the Library, click the icon, and make the changes as below:

The Response Generation Info tab:

and the Sample Size Re-estimation tab:

Here we are simulating the same design except that we will not make any modification to the sample size at the interim look. Des 3 stipulates, however, that an interim look will be taken after 26 weeks of follow-up data are available on 208 of the planned 442 subjects. At that interim look the sample size may be increased. But in this modified setup, we will not increase the sample size at the interim look. Note that we have kept the cap on the maximum sample size after adaptation at 442 under the modified setup, compared to 884 under Sim 1.

Now we can run the adaptive simulation under this modified setup and make a comparison of the results with the results obtained under Sim 1. Table 54.4 displays the probability of falling into the unfavorable, promising and favorable zones at the interim look, along with the power and expected sample size, conditional on falling into each zone, under various values of δ.
Table 54.4: Operating Characteristics of Traditional Group Sequential Trial and an Adaptive Group Sequential Trial Conditional on Interim Outcome

                     Probability of     Power Conditional on     Expected
      Interim        Interim            Interim Outcome          Sample Size
δ     Outcome        Outcome            Des 1      Des 3         Des 1    Des 3
1.6   Unfav + Fut    32%                28%        28%           442      442
      Promising      26%                62%        83%           442      697
      Fav + Eff      41%                87%        87%           442      442
1.7   Unfav + Fut    29%                32%        32%           442      442
      Promising      26%                65%        86%           442      694
      Fav + Eff      45%                89%        89%           442      442
1.8   Unfav + Fut    26%                36%        37%           442      442
      Promising      26%                69%        88%           442      692
      Fav + Eff      48%                91%        91%           442      442
1.9   Unfav + Fut    23%                40%        41%           442      442
      Promising      25%                73%        91%           442      687
      Fav + Eff      52%                93%        93%           442      442
2.0   Unfav + Fut    20%                45%        45%           442      442
      Promising      24%                76%        93%           442      684
      Fav + Eff      56%                95%        94%           442      442

All results are based on 100,000 simulated trials.

The table highlights the key advantage of the adaptive design (Sim 1 to Sim 5) compared to the traditional group sequential design (Sim 6 to Sim 10), i.e., the ability to invest in the trial in stages, with the second stage of the investment being required only if promising results are obtained at the first stage. This feature of adaptive design makes it far more attractive as an investment strategy than a fixed sample or non-adaptive group sequential design, which has no provision for increasing the sample size if a promising interim outcome is obtained. Suppose, for example, that δ = 1.6, the smallest clinically meaningful treatment effect. The trial sponsor only commits the resources needed for 442 subjects at the start of the trial, at which point the chance of success is 61%, as shown in Table 54.3. The additional sample size commitment is forthcoming only if promising results are obtained at the interim analysis, and in that case the sponsor's risk is substantially reduced because the chance of success jumps to 83%, as shown in Table 54.4.
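The zone-wise behaviour described above can be explored with a small Monte Carlo sketch. This is not East's simulation engine. It assumes a two-stage CHW test with pre-specified weights 208/442 and 234/442, a final critical value of 1.96 (the interim boundary of 5.251 is ignored, so the efficacy zone is folded into the favorable zone), conditional power evaluated at the interim estimate of δ, and a re-estimated total sample size clamped to [442, 884]. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()
N1, N_PLAN, N_MAX = 208, 442, 884      # interim completers, planned total, cap
SIGMA = 7.5
B_FINAL = Z.inv_cdf(0.975)             # assumption: final boundary taken as 1.96
CP_LO, CP_HI, CP_TARGET = 0.30, 0.80, 0.80

def cp_arg(z1, n_total):
    """Return x such that CP = 1 - Phi(x), with the interim estimate
    delta-hat = 2*sigma*z1/sqrt(N1) plugged in for delta."""
    n2 = n_total - N1
    return (B_FINAL * math.sqrt(1 + N1 / n2)
            - z1 * math.sqrt(N1 / n2)
            - z1 * math.sqrt(n2 / N1))

def reestimated_total(z1):
    """Smallest total in [N_PLAN, N_MAX] reaching CP_TARGET, else N_MAX.
    Only called in the promising zone, where CP at N_PLAN is below target."""
    target = Z.inv_cdf(1 - CP_TARGET)
    if cp_arg(z1, N_MAX) > target:     # target unattainable even at the cap
        return N_MAX
    lo, hi = N_PLAN, N_MAX             # bisection on the integer sample size
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cp_arg(z1, mid) <= target:
            hi = mid
        else:
            lo = mid
    return hi

def simulate(delta, n_sims=20000, seed=1):
    rng = random.Random(seed)
    w1, w2 = N1 / N_PLAN, (N_PLAN - N1) / N_PLAN   # pre-specified weights
    counts = {"unfavorable": 0, "promising": 0, "favorable": 0}
    wins = dict.fromkeys(counts, 0)
    total_n = 0
    for _ in range(n_sims):
        z1 = rng.gauss(delta * math.sqrt(N1) / (2 * SIGMA), 1)
        cp = 1 - Z.cdf(cp_arg(z1, N_PLAN))
        zone = ("unfavorable" if cp < CP_LO
                else "promising" if cp < CP_HI else "favorable")
        n_tot = reestimated_total(z1) if zone == "promising" else N_PLAN
        n2 = n_tot - N1
        z2 = rng.gauss(delta * math.sqrt(n2) / (2 * SIGMA), 1)
        chw = math.sqrt(w1) * z1 + math.sqrt(w2) * z2   # weights stay fixed
        counts[zone] += 1
        wins[zone] += chw >= B_FINAL
        total_n += n_tot
    return {
        "power": sum(wins.values()) / n_sims,
        "mean_n": total_n / n_sims,
        "zone_prob": {k: v / n_sims for k, v in counts.items()},
        "zone_power": {k: wins[k] / max(counts[k], 1) for k in counts},
    }

res = simulate(1.6)
print(res)
```

Because the re-estimation rule and the boundary are simplified, the output should land in the neighbourhood of the figures reported by East rather than reproduce them exactly. The point of the sketch is the structure: the incremental second-stage statistic is always combined with the weights fixed at the design stage, whatever sample size is actually used.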
Similar results are observed for the other values of δ.

The probabilities of entering the unfavorable, promising and favorable zones at the interim analysis, displayed in Table 54.4, are instructive. Consider again the case δ = 1.6. At this value of δ there is a 26% chance of landing in the promising zone and thereby obtaining a substantial power boost under the adaptive setup as compared to the non-adaptive setup. That is, 26% of the time the adaptive strategy can rescue a trial that is underpowered at the interim look. The chance of entering the favorable + efficacy zone is 41%. That is, 41% of the time the sponsor will be lucky and have a well powered trial at the interim look without the need to increase the sample size. The remaining 32% of the time the sponsor will be unlucky and will enter the unfavorable zone, from which there is also no sample size increase, and the chance of success is only 28%. These odds improve with larger values of δ.

The adaptive implementation satisfies the objective of powering the study primarily for δ = 2 while providing a hedge against substantial power loss if 1.6 ≤ δ < 2. It is thus a good compromise between Des 1, which is powered to detect δ = 2 without any means of improving power if δ = 1.6, and Des 2, which is powered to detect δ = 1.6 but utilizes excessive sample size resources if δ = 2.

54.3.3 Interim Monitoring

Now we will discuss the interim monitoring procedure, taking the example of Des 3. Accordingly we invoke the CHW IM dashboard associated with Des 3 by clicking on the icon from the toolbar. The following dashboard appears.

This dashboard differs from the usual interim monitoring dashboard for a classical group sequential trial in the following major ways:

The Pre-specified Nominal Critical Points (stopping boundaries) are written into the dashboard as soon as it is invoked, and are non-editable.
Patient accruals and corresponding test statistics are entered incrementally for each look, rather than cumulatively for all looks taken thus far. The weighted statistic is obtained by combining these incremental test statistics using Pre-specified Weights that are written into the dashboard as soon as it is invoked. One is free to change the incremental sample size at each look from what was originally specified at the design stage. But if the sample sizes that correspond to the original study design are entered, then the weighted statistic is the same as the usual Wald statistic used for conventional (non-adaptive) interim monitoring.

Suppose the first look is taken as planned after enrolling 208 subjects. Suppose we observe δ̂ = 1.7 and σ̂ = 7.6, thus leading to a standard error of $\sqrt{4 \times 7.6^2 / 208} = 1.0539$. The incremental statistic at the first look is thus 1.7/1.0539 = 1.613. Invoke the Test Statistic Calculator by clicking on the button. We enter these quantities into the Test Statistic Calculator as shown below.

Since the nominal critical value for early stopping is 5.251, the trial continues. We now need to decide on the sample size to use for the second and final look. We invoke the conditional power calculator to assist with this decision. Suppose we specify to the calculator that we wish to obtain 80% conditional power to detect δ = 1.6 with a hypothesized value of 7.5 for σ. Upon entering these terms into the calculator we obtain a final (overall) sample size of 564.7 subjects. Based on the guidance provided by the calculator, suppose we decide to enroll a total of 565 subjects.
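The arithmetic behind the incremental and weighted statistics in this monitoring walkthrough can be checked with a short sketch. It assumes equal allocation, so that the standard error of δ̂ from n incremental subjects is √(4σ̂²/n), and it takes the pre-specified weights from the original design increments of 208 and 234 subjects. The second-look inputs (δ̂ = 1.5 and σ̂ = 7.7 from 357 incremental subjects) are those used in the continuation of this example. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def incremental_z(delta_hat, sigma_hat, n_increment):
    """Wald statistic based only on the subjects accrued in one stage."""
    se = math.sqrt(4 * sigma_hat**2 / n_increment)
    return delta_hat / se

def chw_statistic(z_increments, planned_increments):
    """Combine incremental statistics with weights fixed at the design stage.
    planned_increments are the ORIGINAL per-stage sample sizes, not the
    sample sizes actually used after adaptation."""
    total = sum(planned_increments)
    weights = [n / total for n in planned_increments][:len(z_increments)]
    numerator = sum(math.sqrt(w) * z for w, z in zip(weights, z_increments))
    return numerator / math.sqrt(sum(weights))

z1 = incremental_z(1.7, 7.6, 208)     # first look, as planned
z2 = incremental_z(1.5, 7.7, 357)     # second stage, after the increase to 565
z_chw = chw_statistic([z1, z2], planned_increments=[208, 234])
tail = 1 - NormalDist().cdf(z_chw)    # unadjusted upper tail area

print(f"z1 = {z1:.4f}, z2 = {z2:.4f}, weighted statistic = {z_chw:.4f}")
print(f"unadjusted one-sided tail area = {tail:.4f}")
```

Note that the weight on the second stage remains 234/442 even though 357 subjects contributed to it. That fixed weighting is what preserves the type-1 error under the data dependent sample size change.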
This implies that the incremental number to be entered into the interim monitoring dashboard is 565 − 208 = 357 subjects. Suppose that, based only on these 357 incremental subjects, the estimate of δ is 1.5 and the estimate of σ is 7.7. The standard error of δ̂ is thus $\sqrt{4 \times 7.7^2 / 357} = 0.8151$, leading to an incremental test statistic of 1.8404. Upon pressing the OK button the incremental test statistic is entered into the interim monitoring dashboard and the weighted statistic that combines the two incremental statistics by the square roots of the pre-specified weights (as described in Section 54.1, equation 54.4) is computed as 2.446.

Since the weighted statistic exceeds the nominal critical value, the null hypothesis is rejected. The confidence interval for δ is (0.3146, ∞) and the p-value is 0.0072. These estimates are based on the methods described in Section 54.1 and are appropriately adjusted to preserve their validity in the face of adaptive sample size changes.

54.4 Binomial Endpoint: Acute Coronary Syndromes

Consider a two-arm, placebo controlled randomized clinical trial for subjects with acute cardiovascular disease undergoing percutaneous coronary intervention (PCI). The primary endpoint is a composite of death, myocardial infarction or ischemia-driven revascularization during the first 48 hours after randomization. We assume on the basis of prior knowledge that the event rate for the placebo arm is 8.7%. The investigational drug is expected to reduce the event rate by at least 20%.
The investigators are planning to randomize a total of 8000 subjects in equal proportions to the two arms of the study.

54.4.1 Fixed Sample Design

We show with the help of East that a conventional fixed sample design enrolling a total of 8000 subjects will have 83% power to detect a 20% risk reduction with a one-sided level-0.025 test of significance (with 0.087 on the control arm and 0.8 × 0.087 = 0.0696 on the treatment arm).

The actual risk reduction is expected to be larger, but could also be as low as 15%, a treatment effect that would still be of clinical interest given the severity and importance of the outcomes. In addition, there is some uncertainty about the magnitude of the placebo event rate. For these reasons the investigators wish to build into the trial design some flexibility for adjusting the sample size. Two options under consideration are a group sequential design with the possibility of early stopping in case the risk reduction is large, and an adaptive design with the possibility of increasing the sample size in case the risk reduction is small. In the remainder of this section we shall discuss these two options and show how they may be combined into a single design that captures the benefits of both.

54.4.2 Group Sequential Design

We first transform the fixed sample design into an 8000 person group sequential design with two interim looks, one after 4000 subjects are enrolled (50% of total information) and the second after 5600 subjects are enrolled (70% of total information). Early stopping efficacy boundaries are derived from the Lan and DeMets (1983) O'Brien-Fleming type error spending function. This group sequential design is shown as Des 2 in the following screen shot.
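The 83% power quoted for the fixed sample design in Section 54.4.1 can be approximated as follows. This is a sketch that assumes a normal approximation with the unpooled variance estimate for the difference of two proportions; East offers more than one variance option, and its exact figure may differ slightly. The function name `power_two_proportions` is hypothetical.

```python
import math
from statistics import NormalDist

def power_two_proportions(p_control, p_treat, n_total, alpha=0.025):
    """Approximate power of a one-sided test that the treatment event rate
    is lower than the control rate, with n_total/2 subjects per arm."""
    n_arm = n_total / 2
    diff = p_control - p_treat
    # unpooled standard error of the difference in event rates
    se = math.sqrt(p_control * (1 - p_control) / n_arm
                   + p_treat * (1 - p_treat) / n_arm)
    z = NormalDist()
    return z.cdf(diff / se - z.inv_cdf(1 - alpha))

p_c = 0.087
p_t = 0.8 * p_c                      # 20% risk reduction, i.e. 0.0696
power_20 = power_two_proportions(p_c, p_t, 8000)
power_15 = power_two_proportions(p_c, 0.85 * p_c, 8000)
print(f"power at 20% risk reduction: {power_20:.3f}")
print(f"power at 15% risk reduction: {power_15:.3f}")
```

Re-running the function with a 15% risk reduction shows how quickly power erodes for smaller effects, which is the motivation for building sample size flexibility into the design.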
Along with this plan, its operating characteristics are also shown by the side. The output tells us that for this design, where the risk reduction is 20%, the probability of crossing the boundary is 0.181 at Look 1 (N = 4000), 0.31 at Look 2 (N = 5600), and 0.33 at the Final Look; the overall power is 82%. We can also create different designs by changing the value of the risk reduction in Des 2 and obtain their corresponding results. A summary of such results is displayed in Table 54.5.

The first column of Table 54.5 is a list of potential risk reductions, defined as 100 × (1 − ρ)% where ρ = πt/πc, πt is the event rate for the treatment arm, and πc is the event rate for the control arm. The remaining columns display early stopping probabilities, power and expected sample size. Since the endpoint is observed within 48 hours, the problem of overruns that we encountered in the schizophrenia trial is negligible and may be ignored.

Table 54.5: Operating Characteristics of Des 2, a Three-Look 8000-Person Group Sequential Design

Risk Reduction    Probability of Crossing Efficacy Boundary        Overall   Expected
100 × (1 − ρ)     At Look 1     At Look 2     At Final Look        Power     Sample Size
                  (N = 4000)    (N = 5600)    (N = 8000)
15%               0.074         0.183         0.309                57%       7264
17%               0.109         0.235         0.335                68%       7002
20%               0.181         0.310         0.330                82%       6535
23%               0.279         0.362         0.275                92%       6017
25%               0.357         0.376         0.222                95%       5671

Table 54.5 shows that Des 2 is well powered, with large savings of expected sample size for risk reductions of 20% or more. It is thus a satisfactory design if, as is initially believed, the magnitude of the risk reduction is in the range 20% to 25%. This design does not, however, offer as good protection against a false negative conclusion for smaller risk reductions.
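The last two columns of Table 54.5 follow from the boundary crossing probabilities. The sketch below assumes there is no futility stopping in Des 2, so that every trial that does not stop at Look 1 or Look 2 proceeds to the full 8000 subjects; the overall power is the sum of the three crossing probabilities. The names `TABLE_54_5` and `summarize` are hypothetical.

```python
LOOKS = (4000, 5600, 8000)            # cumulative sample size at each look

# Risk reduction (%) -> crossing probabilities at Look 1, Look 2, Final Look,
# copied from Table 54.5
TABLE_54_5 = {
    15: (0.074, 0.183, 0.309),
    17: (0.109, 0.235, 0.335),
    20: (0.181, 0.310, 0.330),
    23: (0.279, 0.362, 0.275),
    25: (0.357, 0.376, 0.222),
}

def summarize(cross_probs, looks=LOOKS):
    """Return (overall power, expected sample size) for a three-look design
    with efficacy stopping only."""
    p1, p2, p3 = cross_probs
    power = p1 + p2 + p3
    # trials that stop at neither interim look run to the final sample size
    expected_n = looks[0] * p1 + looks[1] * p2 + looks[2] * (1 - p1 - p2)
    return power, expected_n

summary = {rr: summarize(probs) for rr, probs in TABLE_54_5.items()}
for rr, (power, expected_n) in summary.items():
    print(f"{rr}% reduction: power {power:.1%}, expected N {expected_n:.0f}")
```

Small discrepancies from the tabulated values are expected because the crossing probabilities are themselves rounded to three decimals.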
In particular, even though 15% is still a clinically meaningful risk reduction, Des 2 offers only 57% power to detect this treatment effect. One possibility then is to increase the up-front sample size commitment of the group sequential design so that it has 80% power if the risk reduction is 15%. This leads to Des 3, a three-look group sequential design with a maximum sample size commitment of 13,853 subjects, one interim look after 6927 subjects (50% of total information) and a second interim look after 9697 subjects (70% of total information). Des 3 has 80% power to detect a risk reduction of 15% with a one-sided level-0.025 test.

Table 54.6 displays operating characteristics of Des 3 for risk reductions between 15% and 25%, while keeping the maximum sample size at 13,853. Notice that by attempting to provide adequate power at 15% risk reduction, the low end of clinically meaningful treatment effects, we have significantly over-powered the trial for risk reductions in the expected range of 20% to 25%. If, as expected, the risk reduction exceeds 20%, the large up-front sample size commitment of 13,853 subjects under Des 3 is unnecessary. Des 2 with an up-front commitment of only 8000 subjects will provide sufficient power in this setting.

Table 54.6: Operating Characteristics of Des 3, a Three-Look 13,853-Person Group Sequential Design

Risk Reduction    Probability of Crossing Efficacy Boundary      Overall    Expected
100 × (1 − ρ)     At Look 1     At Look 2     At Final Look      Power      Sample Size
                  (N = 6926)    (N = 9697)    (N = 13,853)
15%               0.167         0.298         0.335              80%        11,456
17%               0.246         0.349         0.296              89%        10,699
20%               0.395         0.375         0.196              97%        9558
23%               0.565         0.329         0.099              99.3%      8574
25%               0.675         0.269         0.054              99.8%      8061

From this point of view, Des 3 is not a very satisfactory design.
It commits the investigators to a very large and expensive trial in order to provide adequate power in the pessimistic range of risk reductions, without any evidence that the true risk reduction does indeed lie in the pessimistic range. Evidently a single group sequential design cannot provide adequate power for the "worst-case" scenario and at the same time avoid overpowering the more optimistic range of scenarios. This leads us to consider building an adaptive sample size re-estimation option into the group sequential design Des 2, such that the adaptive component will provide the necessary insurance for the worst-case scenario, and thereby free the group sequential component to provide adequate power for the expected scenario, without a large and unnecessary up-front sample size commitment.

54.4.3 Adaptive Group Sequential Design

We convert the three-look group sequential design Des 2 into an adaptive group sequential design by inserting into it the option to increase the sample size at look 2, when 5600 subjects have been enrolled. Recreate Des 2 by clicking on the icon and then clicking the Compute button. This will create Des 4 in the Output Preview pane. Save it in the workbook. The sample size re-estimation or adaptation can be done through simulations. The rules governing the sample size increase for Des 4 are similar to the rules specified in Section 53.2.4 for the schizophrenia trial, but tailored to the needs of the current trial. The idea is to identify unfavorable, promising and favorable zones for the interim results at look 2, based on the attained conditional power. Sample size should only be increased if the interim results fall in the promising zone. Subject to an upper limit, the sample size should be increased by just the right amount to boost the current conditional power to some desired level (say 80%).
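The zone-based rule just outlined can be sketched in a few lines. This is only an illustration, not East's implementation: the 30% and 80% thresholds and the 8000 and 16,000 limits are taken from the Des 4 specifications in this section, while the function names and the input `recomputed_n` (the sample size that would give 80% conditional power, assumed to be computed elsewhere) are hypothetical.

```python
def classify_zone(cond_power, lower=0.30, upper=0.80):
    """Classify the look-2 result by its conditional power."""
    if cond_power <= lower:  # "no greater than 30%"
        return "unfavorable"
    if cond_power >= upper:  # "at least 80%"
        return "favorable"
    return "promising"


def adapted_sample_size(cond_power, recomputed_n, n_min=8000, n_max=16000):
    """Final sample size: only the promising zone triggers an increase.

    recomputed_n is assumed to be the sample size that yields the target
    conditional power; it is never allowed to fall below the original
    8000 or to exceed the 16,000 cap.
    """
    if classify_zone(cond_power) != "promising":
        return n_min
    return min(max(recomputed_n, n_min), n_max)


print(classify_zone(0.55), adapted_sample_size(0.55, 12500))
```

Keeping the classification separate from the sample size update makes it easy to tabulate results by zone, which is how the conditional operating characteristics are reported later in this section.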
The following are the design specifications for Des 4:

1. The starting design is Des 2 with a sample size of 8000 subjects, one interim look after enrolling 4000 subjects and a second interim look after enrolling 5600 subjects. The efficacy stopping boundaries at these two interim looks are derived from the Lan and DeMets (1983) error spending function of the O'Brien-Fleming type.

2. At the second interim analysis, with data available on 5600 subjects, the conditional power is computed using the estimated value ρ̂ as though it were the true relative risk ρ. If the conditional power is no greater than 30%, the outcome is deemed to be unfavorable. If the conditional power is between 30% and 80%, the outcome is deemed to be promising. If the conditional power is at least 80%, the outcome is deemed to be favorable.

3. If the interim outcome is promising, the sample size is re-computed so as to achieve 80% conditional power at the estimated value ρ̂. The original sample size is then updated to the re-computed sample size, subject to the constraint in item 4 below.

4. If the re-computed sample size is less than 8000, the original sample size of 8000 subjects is used. If the re-computed sample size exceeds 16,000, the sample size is curtailed at 16,000 subjects.

Some features of this adaptive strategy are worth pointing out. First, the sample size is re-computed on the basis of data from 5600 subjects from the trial itself. Therefore the estimate of ρ available at the interim analysis is substantially more reliable than the estimate that was used at the start of the trial to compute an initial sample size of 8000 subjects. The latter estimate is typically derived from smaller pilot studies or from other phase 3 studies in which the patient population might not be exactly the same as that of the current trial.
Second, a sample size increase is only requested if the interim results are promising, in which case the trial sponsor should be willing to invest the additional resources needed to power the trial adequately. In contrast, Des 3 increases the sample size substantially at the very beginning of the trial, before any data are available to determine if the large sample size is justified.

54.4.4 Operating Characteristics of Adaptive Group Sequential Design

The East software provides a simulation tool for studying the consequences of increasing the sample size of Des 4 at the interim look. To implement this tool we must add the sample size re-estimation tab for Des 4. Select Des 4 in the Library and click the icon. Click the Include Options button and select Sample Size Re-Estimation from the list. This will add a tab named Sample Size Re-estimation as shown below:

The first two tabs, Simulation Parameters and Response Generation Info, contain all the information about the design Des 4 in the absence of any adaptive change. It is a three-look design with a sample size of 8000, a first interim look after 4000 subjects, a second interim look after 5600 subjects and the last look after 8000 subjects. The early stopping boundaries generated by the LD(OF) spending function are -2.963 and -2.462 at the first look and the second look respectively. The third tab, Sample Size Re-estimation, is used to specify the rules for modifying the initial sample size of Des 4, based on the data at the interim analysis. The description of these parameters is similar to what is described for the normal endpoint example in Section 54.3.2. We will run simulations for different risk reduction values (15% to 25%) by changing the proportion response (treatment) values correspondingly from 0.85 × 0.087 to 0.75 × 0.087.
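The treatment-arm proportions for these scenarios follow directly from the control event rate of 0.087 and the risk reductions considered in this section. A minimal sketch (the variable names are illustrative):

```python
control_rate = 0.087
risk_reductions = [0.15, 0.17, 0.20, 0.23, 0.25]

# Treatment event rate = (1 - risk reduction) x control event rate
treatment_rates = [round((1 - rr) * control_rate, 5) for rr in risk_reductions]

for rr, rate in zip(risk_reductions, treatment_rates):
    print(f"risk reduction {rr:.0%}: treatment proportion {rate:.5f}")
```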
Before running the simulations we must input suitable values into the cells of this tab. Enter the following values of the proportion under treatment: 0.07395, 0.07221, 0.0696, 0.06699, 0.06525.

The Response Generation Info Tab:

The Sample Size Re-estimation Tab:

And the Simulation Control Info Tab:

The power and expected sample size of these adaptive group sequential designs of Des 4 are summarized in Table 54.7. For comparative purposes, corresponding power and sample size values of Des 2 are also provided in this table. If there is a 15% risk reduction, Des 4 has 6% more power than Des 2 but utilizes an additional 1002 subjects on average. As the risk reduction parameter increases, the power advantage and additional sample size requirement of Des 4 relative to Des 2 are reduced.

Table 54.7: Operating Characteristics of Des 2 (Group Sequential) and Des 4 (Adaptive Group Sequential) Designs

Risk Reduction    Des 2 (Group Sequential)         Des 4 (Adaptive Group Sequential)
100 × (1 − ρ)     Power    Expected Sample Size    Power    Expected Sample Size
15%               57%      7264                    63%      8265
17%               68%      7002                    73%      7919
20%               82%      6535                    86%      7289
23%               92%      6017                    94%      6543
25%               95%      5671                    97%      6027

All results for Des 4 are based on 100,000 simulated trials

The power and sample size entries in Table 54.7 were computed unconditionally, and for that reason do not reveal the real benefit that design Des 4 offers compared to design Des 2. As discussed previously in the schizophrenia example, the real benefit of an adaptive design is the opportunity it provides to invest in the trial in stages, with the second stage investment forthcoming only if promising results are obtained at the first stage.
To explain this better it is necessary to display power and expected sample size results conditional on the zone (unfavorable, promising or favorable) into which the results of the trial fall at the second interim analysis. To this end we run through the entire set of 100,000 simulations for Des 4 twice. In the first run we do not allow the sample size to change even when the conditional power lies in the promising zone. In effect we are simulating Des 2. The choice of simulation parameters for adaptation is as shown below:

These simulations produced 56% overall power and 15%, 57% and 83% power conditional on being in the unfavorable, promising and favorable zones, respectively. Next we simulate Des 4 again, this time allowing the sample size to increase up to a maximum of 16,000 when conditional power falls in the promising zone.

This time the simulations produced 62% overall power and 15%, 80% and 83% power conditional on being in the unfavorable, promising and favorable zones, respectively. Similar simulations were carried out for other values of risk reduction under both designs. Finally, all these results, representing the operating characteristics of both Des 2 and Des 4 conditional on the zone into which the conditional power falls at the second interim analysis, are displayed in Table 54.8.

The table reveals substantial gains in power for Des 4 compared to Des 2 at all values of risk reduction if the second interim outcome falls in the promising zone, thereby leading to an increase in the sample size. Outside this zone the two designs have the same operating characteristics since the sample size does not change.
If the second interim outcome falls in the unfavorable zone, the trial appears to be headed for failure and an additional sample size investment would be risky. If the second interim outcome falls in the favorable zone, the trial is headed for success without the need to increase the sample size. Thus the adaptive design provides the opportunity to increase the sample size only when the results of the second interim analysis fall in the promising zone. This is precisely when the trial can most benefit from a sample size increase.

Table 54.8: Operating Characteristics of Des 2 (Group Sequential) and Des 4 (Adaptive Group Sequential) Designs Conditional on Second Interim Outcome

Risk Reduction    Second Interim    Probability of     Power Conditional on       Expected Sample Size
100 × (1 − ρ)     Outcome           Interim Outcome    Second Interim Outcome
                                                       Des 2      Des 4           Des 2      Des 4
15%               Unfavorable       36%                15%        15%             8000       8000
                  Promising         24%                57%        81%             8000       12098
                  Favorable         40%                94%        94%             6148       6147
17%               Unfavorable       27%                20%        20%             8000       8000
                  Promising         24%                64%        87%             8000       11925
                  Favorable         49%                96%        96%             5989       5989
20%               Unfavorable       16%                30%        30%             8000       8000
                  Promising         20%                73%        93%             8000       11781
                  Favorable         64%                98%        98%             5726       5738
23%               Unfavorable       8%                 40%        40%             8000       8000
                  Promising         14%                81%        96%             8000       11599
                  Favorable         78%                99%        99%             5440       5447
25%               Unfavorable       5%                 48%        48%             8000       8000
                  Promising         10%                86%        97%             8000       11443
                  Favorable         85%                99.6%      99.6%           5253       5251

All results are based on 100,000 simulated trials

54.4.5 Adding a Futility Boundary

One concern with design Des 4 is that it lacks a futility boundary. There is thus the risk of proceeding to the end, possibly with a sample size increase, when the magnitude of the risk reduction is small and unlikely to result in a successful trial. In particular, suppose that the null hypothesis is true.
In that case we can show that the power (i.e., the type-1 error) is 2.5% and the expected sample size under Des 4 is 8293 subjects. It might thus be desirable to include some type of futility stopping rule for the trial. In this trial the investigators proposed the following futility stopping rules at the two interim analysis time points:

1. Stop for futility at the first interim analysis (N = 4000) if the estimated event rate for the experimental arm is at least 1% higher than the estimated event rate for the control arm.

2. Stop for futility at the second interim analysis (N = 5600) if the conditional power, based on the estimated risk ratio ρ̂, is no greater than 20%.

We will implement these futility rules by simulation. To this end create Des 5 with the same LD(OF) efficacy boundaries as Des 4, but also include non-binding LD(OF) futility boundaries, after selecting Des 4 in the Library and clicking the icon. The futility boundary of Des 5 is not the one we intend to use. This is not a problem, however, since East permits us to edit all the boundaries in any of the simulation tabs. Accordingly we invoke the Simulations for Des 5 and add the Sample Size Re-estimation tab by selecting it from the Include Options button. The following screen appears. The first step is to edit the futility boundaries. The futility boundary for the first look, using rule 1 stated at the beginning of this section, can be calculated manually as shown below:

\[
\begin{aligned}
N &= 4000 \\
\pi_c &= 0.087 \\
\pi_t &= 1.01\,\pi_c = 0.08787 \\
\delta &= \pi_t - \pi_c = 0.00087 \\
se &= \sqrt{\pi_c(1-\pi_c)/2000 + \pi_t(1-\pi_t)/2000} = 0.008932521 \\
z &= \delta/se = 0.097396916
\end{aligned}
\]

We will thus use 0.0974 as the futility boundary for the first interim look.
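The manual calculation of the look 1 futility boundary can be reproduced as follows. This is a direct transcription of the arithmetic above, with 2000 subjects per arm at the first interim look; the variable names are illustrative.

```python
from math import sqrt

n_per_arm = 2000          # 4000 subjects at look 1, randomized 1:1
pi_c = 0.087              # control event rate
pi_t = 1.01 * pi_c        # treatment rate 1% higher (relative) than control

delta = pi_t - pi_c
se = sqrt(pi_c * (1 - pi_c) / n_per_arm + pi_t * (1 - pi_t) / n_per_arm)
z_futility = delta / se

print(f"delta = {delta:.5f}, se = {se:.9f}, z = {z_futility:.4f}")
```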
Before we make this change and run CHW Simulations, however, we must determine the futility boundary for the second interim look, under rule 2. This is achieved by using the conditional power calculator available on the Sample Size Re-estimation tab. Click on the button for invoking the CP Calculator. The following calculator appears.

We make the following changes to this dialog box. At the top of the Input/Output section we select the radio button that indicates that conditional power will be based on the values estimated at the interim look and not on user-defined values. Of the three radio buttons at the right hand side of the dialog box, we choose the one that specifies what we wish to compute. In the present case we wish to compute the Z-statistic that corresponds to a conditional power of 0.2, and so we select the top radio button of the three. Finally, we edit the box for Conditional Power and enter 0.2, since this is the conditional power for which we wish to determine the corresponding futility boundary.

Upon pressing the Recalc button, the calculator is updated. We see that the Z-statistic corresponding to the futility boundary at look 2 is equal to -1.289. We may now edit the futility boundaries at look 1 and look 2. Click on the Simulation Parameters tab and edit the boundaries as given below:

We now proceed to simulate this trial as before.
The other parameters on the Sample Size Re-estimation tab are set as below:

The impact of the futility boundary on the unconditional operating characteristics of the Des 4 design is displayed in Table 54.9.

Table 54.9: Operating Characteristics of the Des 4 Design with and without a Futility Boundary

Risk Reduction    Des 4 with No Futility Boundary    Des 4 with Futility Boundary
100 × (1 − ρ)     Power    Expected Sample Size      Power    Expected Sample Size
0%                2.5%     8259                      2.81%    5339
15%               63%      8265                      59%      7440
20%               86%      7289                      83%      6939
25%               97%      6027                      95%      5928

All results are based on 100,000 simulated trials

The inclusion of the futility boundary has resulted in a dramatic saving of nearly 3000 subjects, on average, at the null hypothesis of no risk reduction. Furthermore, notwithstanding a small power loss of 2-5%, the trial continues to have well over 80% power for risk reductions of 20% or more. The trial suffers a power loss of 7% if the magnitude of the risk reduction is 15%, the low end of the range of clinical interest. In this situation, however, the unconditional power is inadequate (only 63%) even without a futility boundary. To fully appreciate the impact of the futility boundary on power and expected sample size, it is necessary to study the operating characteristics of the trial conditional on the results of the second interim analysis. These results are displayed in Table 54.10.

Table 54.10: Operating Characteristics of Des 4 Design with and without a Futility Boundary, Conditional on the Second Interim Outcome

Risk Reduction    Second Interim    Prob. of           Power Conditional on       Expected Sample Size
100 × (1 − ρ)     Outcome           Interim Outcome    Second Interim Outcome
                                                       No Fut     With Fut        No Fut     With Fut
0%                Unfav + Fut       92%                0.44%      0.14%           8000       4851
                  Promising         6%                 15%        16%             12985      12946
                  Fav + Eff         2%                 64%        64%             6918       6923
15%               Unfav + Fut       36%                15%        5%              8000       5705
                  Promising         24%                81%        81%             12098      12098
                  Fav + Eff         40%                94%        94%             6147       6139
20%               Unfav + Fut       16%                30%        10.2%           8000       5930
                  Promising         20%                93%        93%             11781      11746
                  Fav + Eff         64%                98%        98%             5738       5729
25%               Unfav + Fut       5%                 47%        18%             8000       6106
                  Promising         10%                98%        97%             11443      11443
                  Fav + Eff         85%                99.5%      99.5%           5251       5245

All results are based on 100,000 simulated trials

It is seen that the presence of the futility boundary does not cause any loss of power for trials that enter the promising or favorable zones at the second interim analysis. Additionally, the presence of the futility boundary causes the average sample size to be reduced substantially in the unfavorable zone and moderately in the promising zone, while remaining the same in the favorable zone. In effect, the futility boundary terminates a proportion of trials that enter the unfavorable zone, thereby preventing them from proceeding to conclusion. It has no impact on trials that enter the favorable zone.

54.5 Survival Endpoint: Lung Cancer Trial

A two-arm multi-center randomized clinical trial is planned for subjects with advanced metastatic non-small cell lung cancer with the goal of comparing the current standard second line therapy (docetaxel+cisplatin) to a new docetaxel-containing combination regimen. The primary endpoint is Overall Survival (OS). The study is required to have one-sided α = 0.025, and 90% power to detect an improvement in median survival, from 8 months on the control arm to 11.4 months on the experimental arm, which corresponds to a hazard ratio of 0.7. We shall first create a group sequential design for this study in East, and shall then show how the design may be improved by permitting an increase in the number of events and sample size at the time of the interim analysis.
54.5.1 Group Sequential Design

We begin by constructing a two-look group sequential design with an efficacy boundary derived from the Lan and DeMets (1983) O'Brien-Fleming type spending function, a futility boundary derived from the γ-spending function of Hwang, Shih and DeCani (1990) with parameter γ = −5, and an interim analysis at 50% of the total information. It is required to enroll subjects over 24 months and extend the follow-up for six additional months, thereby completing the study in 30 months. We begin by using East to design a trial under these basic assumptions. First, click Survival: Two Samples on the Design tab and then click Parallel Design: Logrank Test Given Accrual Duration and Study Duration as shown below.

This will launch a new input window. Enter the appropriate design parameters into the dialog box as shown below. Enter a median survival time of 8 months for the Control arm and a hazard ratio of 0.7.

Next, click on the Boundary Info tab. Be sure to select the non-binding futility boundary as below:

Next click on the Accrual/Dropout Info tab. The Accrual Duration is 24 and the Study Duration is 30. In this trial everyone will be followed for survival until the end of the study, thus the Until End of Study entry is selected.

Click Compute to complete the design. Here is the Output Summary of this design.

Des 1 requires an up-front commitment of 334 events to achieve 90% power. With an enrollment of 483 subjects over 24 months, the required 334 events are expected to arrive within 30 months. An interim analysis will be performed after 167 events are obtained (50% of the total information).
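As a rough plausibility check on the 334 events of Des 1, the number of events needed by a fixed sample logrank test can be approximated with Schoenfeld's formula. This is an assumption of the sketch, not the computation East performs: it ignores the interim look and the stopping boundaries, so it should come out slightly below the group sequential figure. The function name is hypothetical.

```python
from math import log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.025, power=0.90):
    """Approximate events for a one-sided fixed sample logrank test, 1:1 allocation."""
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha) + z(power)) ** 2 / log(hazard_ratio) ** 2


events_fixed = schoenfeld_events(0.7)

# With proportional hazards and exponential survival, the medians scale by 1/HR
median_treatment = 8 / 0.7

print(f"fixed sample events: {events_fixed:.1f}")        # roughly 330
print(f"treatment median: {median_treatment:.1f} months")  # roughly 11.4
```

The small gap between this figure and the 334 events of Des 1 is the kind of inflation one would expect from adding an interim analysis.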
Under the alternative hypothesis that the hazard ratio is 0.7, the chance of crossing the efficacy boundary at the interim look is about 26%, leading to an expected sample size of 454 subjects and an expected study duration of 27 months. With the cursor on the Des 1 node, click on the icon to see the following output.

54.5.2 Adaptive Design: Motivation

Des 1 is adequately powered to detect a hazard ratio of 0.7. It is possible, however, either because the new treatment is somewhat less effective than anticipated or because of improved standard of care for patients on the control arm, that the underlying hazard ratio could be larger. If this were the case, the study would be underpowered. For example, if the true hazard ratio were 0.77, an effect that is still considered clinically meaningful, the power of a 483-subject study would drop from 90% to 67.2%, as shown below under Des 2.

Thus one possibility would be to design the trial from the very beginning to have 90% power to detect a hazard ratio of 0.77. Such a design is displayed below as Des 3 and requires 621 events. In order to complete the trial in 30 months it would be necessary to enroll 878 subjects over 24 months with an additional 6 months of follow-up.

The sponsor is either unable or unwilling to make such a large sample size commitment up-front purely on the basis of the limited prior data available on the new compound.
However, since an independent data monitoring committee (DMC) will be reviewing the interim efficacy data in an unblinded fashion at 50% of the total information, the sponsor might be prepared to authorize the investment of additional resources on the recommendation of this committee. In a manner analogous to the pre-specification of group sequential boundaries for early stopping, the sponsor must pre-specify to the DMC the precise data-dependent rules for increasing the number of events and sample size at the time of the interim analysis. (Note, however, that these rules may be modified at the time of the interim analysis if the DMC believes it is in the best interests of the patients to modify them. The statistical methodology described in this volume permits such modifications without type-1 error inflation.) These rules are best constructed with the help of the simulation tools available in East, as we now show.

54.5.3 Adaptive Design: Construction

The starting point for constructing the adaptive design is the group sequential design, Des 1. This design is entirely satisfactory if the true hazard ratio is 0.7 but is unsatisfactory if the hazard ratio is 0.77, a hazard ratio that is still clinically meaningful. Designing a group sequential trial to detect a hazard ratio of 0.77, as in Des 3 above, is unfortunately not an option, for it requires too large a commitment of resources up front. It is possible, however, for the sponsor to start out with Des 1, requiring only 334 events and 483 subjects, but build in the option for an increase in the number of events and subjects if the results obtained at the interim analysis are promising. The adaptive design is constructed by means of simulation. Select Des 1 in the Library and click the icon. You will be taken to the following simulation input window.
In addition to the four tabs appearing by default on inserting Simulations, one can add more tabs to enter information on randomization, stratification and sample size re-estimation. This can be done by clicking the Include Options button at the top right-hand corner of the screen. The Sample Size Re-estimation tab is added by clicking the appropriate option as shown above. Let us focus on five such tabs shown below. Several parameters on these tabs can play a vital role in the simulation and adaptation of a design.

The default values on the Simulation Parameters tab are those that were specified at the design stage. However, all the entries in the white cells are editable and can be used to alter the simulation parameters. Thus we could alter the Info Fraction, Cum. α spent and the Simulation Boundaries; or we could alter the Survival Information on the Response Generation Info tab; or we could alter the Accrual and Dropout Information on the Accrual/Dropout Info tab; and so on.

Suppose, for example, that we wish to edit the input parameters in the Survival Information panel. The current panel displays hazard rates of 0.0866 and 0.0607 for the Control and Treatment arms, respectively, implying a hazard ratio of 0.7. We know from the design Des 1 that a hazard ratio of 0.7 will yield 90% power. But what if the true hazard ratio were 0.77? The resultant deterioration in power can be evaluated by simulation. Accordingly we shall alter the Treatment cell, containing the hazard 0.0607, by replacing it with 0.77 × 0.0866 = 0.0667. The total number of simulations shall be 10,000 and the screen will be refreshed after every 1000 trials.
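The hazard rates shown in the Survival Information panel are consistent with exponential survival, for which the hazard equals ln 2 divided by the median. The sketch below reproduces them under that assumption and adds a crude Schoenfeld-type approximation of the power with 334 events when the hazard ratio is 0.77. The approximation ignores the interim look and is only a sanity check on the order of magnitude, not a replacement for the simulation; the function name is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

# Exponential survival assumed: hazard = ln(2) / median
hazard_control = log(2) / 8                    # about 0.0866
hazard_trt_design = 0.70 * hazard_control      # about 0.0607
hazard_trt_pessimistic = 0.77 * hazard_control # about 0.0667


def approx_logrank_power(hazard_ratio, events, z_crit=1.96):
    """Schoenfeld-type approximation for a one-sided level-0.025 logrank test."""
    drift = sqrt(events) / 2 * abs(log(hazard_ratio))
    return NormalDist().cdf(drift - z_crit)


power_design = approx_logrank_power(0.70, 334)
power_pessimistic = approx_logrank_power(0.77, 334)

print(f"{hazard_control:.4f} {hazard_trt_design:.4f} {hazard_trt_pessimistic:.4f}")
print(f"approx. power: HR 0.70 -> {power_design:.2f}, HR 0.77 -> {power_pessimistic:.2f}")
```

Under these assumptions the approximation gives roughly 90% power at a hazard ratio of 0.7 and a value in the mid-60% range at 0.77, in line with the figures reported for Des 1 and Des 2.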
Simulation without Adaptation:

Note that we have not changed any of the adaptation parameters on the Sample Size Re-estimation tab. This means we are not carrying out any adaptation at this point. To run 10,000 simulations with a hazard ratio of 0.77, click on the Simulate button. The following simulation output is displayed.

The overall power is only 65.4%, suggesting that it might be useful to consider an adaptive increase in the number of events and sample size at the interim look.

The "Sample Size Re-Estimation" Tab

Select CHWSim1 in the Library and click the icon. You will be taken to the following simulation input window.

The impact of an adaptive increase in the number of events and sample size on power and study duration can be evaluated by simulation. Click the Sample Size Re-estimation tab. This tab contains the input parameters for performing the adaptive simulations and sample size re-estimation in the on-going trial. The Sample Size Re-estimation tab is the main location from which you will be using East to design adaptive time-to-event trials. The left hand side of this tab contains the Input Parameters for adaptive simulations and the right hand side contains two charts.

Input Parameters for Sample Size Re-estimation

This window consists of 10 input fields into which one may enter various design parameters.
For a given set of design parameters, East will run a number of simulated trials as specified in the Simulation Control Info tab: On running the simulations, an entry for Simulation output gets added in the Output 1124 54.5 Survival Endpoint – 54.5.3 Adaptive Design: Construction <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 Preview pane and the detailed output can be seen in the Output Summary of Simulations. The input quantities in the Sample Size Re-estimation tab are described below in detail. 1. Adaptation at: For a K-look group sequential design, one can decide the time at which conditions for adaptations are to be checked and actual adaptation is to be carried out. This can be done either at some intermediate look or after accumulating data on specified number of events or after some specified information fraction. The value of this parameter depends upon the choice of the user. If it is Look no. then this parameter can be any integer number from 1 to K − 1. If the adaptation is to be carried out after observing specified events then this parameter can be some integer between [4, No. of events at design stage] and so on. The default choice in East is look number to decide the time of adaptation. 2. Max Number of Events if Adapt : This quantity is a multiplier with value ≥ 1 for specifying the upper limit (or cap) on the increase in the number of events, should an adaptive increase be called for based on the target conditional power. Notice that, in keeping with the FDA Guidance on Adaptive Clinical Trials (2010), East does not permit an adaptive decrease in the number of events. Therefore multipliers less than 1 are not accepted in this cell. For example, if you use the multiplier 1.5 and if adaptation takes place, the modified number of events is capped at 501. The 501-event cap becomes effective only if the increased number of events (as calculated by the criteria of cells 4, 5 and 6) exceed 501. 3. 
Max Subjects if Adapt: This quantity is a multiplier with value ≥ 1 that specifies the upper limit (or cap) on the number of subjects to be enrolled in the study. Although the power of the trial is determined by the number of events and not the number of subjects, the number of subjects plays a role in determining how long it will take to observe the required number of events, and hence the study duration. The number of subjects may only be increased, never decreased, so multipliers less than 1 are not accepted in this cell. For example, if you use the multiplier 1.5 and adaptation takes place, the modified number of subjects is capped at 724 (up from 483 at the design stage). The trial will continue to enroll subjects until either the required number of events is reached or the cap on the number of subjects is reached.

4. Upper Limit on Study Duration: An event driven trial ordinarily continues until the required number of events arrive. This input parameter is provided merely as a safety factor, to prevent the trial from being prolonged excessively should the required number of events be very large or their rate of arrival be very slow. Its default value is three times the expected study duration obtained from the initial design of the trial. Consequently, if the scenarios being simulated are realistic, the required number of events will almost always be attained well before this upper limit becomes operational. We recommend leaving this parameter unchanged, at least for the initial set of simulation experiments, since it would interfere with the operating characteristics of the study if it were to become operational.

5. Target Conditional Power for Re-estimating Events: This parameter ranges between 0 and 1 and is the target conditional power desired at the end of the study.
Suppose, for example, that the Target CP is set at 0.9. Let the value of the test statistic obtained in the current simulation be $z_L$ at look L, where an adaptive increase in the number of events is being considered. Then, by setting the left hand side of equation (54.21) to 0.9 we have:

\[
0.9 = 1 - \Phi\left\{ b_K \sqrt{1 + \frac{D_L}{D_K - D_L}} - z_L \sqrt{\frac{D_L}{D_K - D_L}} - \delta \sqrt{r(1-r)} \sqrt{D_K^* - D_L} \right\}. \tag{54.23}
\]

Upon solving equation (54.23) for $D_K^*$ we obtain the increased number of events that are needed to achieve the target conditional power of 0.9 in this simulation.

Let us illustrate with Des 1. In Des 1, K = 2, L = 1, r = 0.5 and the critical value for declaring statistical significance at the end of the trial is $b_2 = -1.9687$, as can be seen by examining the stopping boundaries displayed in the Simulation Parameters tab. The interim analysis is performed when $D_1 = 167$ events are obtained. In the absence of any adaptive change, the trial will terminate when $D_2 = 334$ events are obtained. Suppose the current simulation generates a value $z_1 = 1.5$ for the logrank statistic at look 1. Equation (54.23) is written for a statistic whose large positive values favor the treatment, so the boundary enters with its magnitude, 1.9687, and $z_1$ and $\delta$ are measured on the same scale. Since the target conditional power is 0.9, equation (54.23) takes the form

\[
0.9 = 1 - \Phi\left\{ 1.9687 \sqrt{1 + \frac{167}{334 - 167}} - 1.5 \sqrt{\frac{167}{334 - 167}} - 0.5\,\delta \sqrt{D_2^* - 167} \right\}. \tag{54.24}
\]

In order to evaluate $D_2^*$, however, it is necessary to specify a value for the log hazard ratio $\delta$ in equation (54.24). This parameter is of course unknown. East gives you the option to perform simulations with either the current estimate $\hat{\delta}_1$ or the value of $\delta$ specified under the alternative hypothesis at the design stage. The choice is made by selecting Estimated HR or Design HR from the drop-down list CP Computation Based on in the Sample Size Re-estimation tab.
The default value is Estimated HR (or equivalently $\hat{\delta}_1 = \ln \widehat{HR}_1$), and we recommend using this default until you have gained some experience with the simulation output and can judge for yourself which option provides better operating characteristics for your studies. East uses the formula

\[
\hat{\delta}_1 = \frac{z_1}{\sqrt{r(1-r) D_1}}
\]

to obtain the current estimate of $\delta$. Upon substituting $z_1 = 1.5$, $D_1 = 167$ and r = 0.5 in the above expression we obtain $\hat{\delta}_1 = 0.232$, or equivalently a hazard ratio estimate of exp(0.232) = 1.2611. Substituting $\hat{\delta}_1$ into equation (54.24) and solving for $D_2^*$ yields $D_2^* = 656$. Since the maximum number of events has been capped at 501, this simulation will terminate the trial when the number of events reaches 501 instead of going all the way to 656 events. In this case the desired target conditional power of 0.9 will not be met. Indeed, the conditional power (with $\hat{\delta}_1$ being used in place of the unknown true $\delta$) is only

\[
1 - \Phi\left\{ 1.9687 \sqrt{1 + \frac{167}{334 - 167}} - 1.5 \sqrt{\frac{167}{334 - 167}} - 0.5\,\hat{\delta}_1 \sqrt{501 - 167} \right\} = 0.798.
\]

For a more detailed discussion of conditional power, including the use of a special conditional power calculator that computes conditional power accurately without relying on the approximate assumption that the next look will be the last one, see Chapter 57.

6. Promising Zone Scale: The promising zone is defined so that the number of events is increased only if the interim result falls in this zone. East asks you to select the scale on which the promising zone is to be defined. It can be defined in terms of the conditional power, the test statistic, or the estimated effect size, and is specified by entering the minimum and maximum of the chosen quantity. Let us go ahead with the default option, which is Conditional Power.

7.
Promising Zone – Min CP: In this cell you specify the minimum conditional power (in the absence of any adaptive change) at which you will entertain an increase in the number of events. That is, you specify the lower limit of the promising zone.

8. Promising Zone – Max CP: In this cell you specify the maximum conditional power (in the absence of any adaptive change) at which you will entertain an increase in the number of events. That is, you specify the upper limit of the promising zone. Suppose, for example, that the number of events is only increased in a promising zone specified by the range 0.45 ≤ CP < 0.8, and suppose that in that case the number of events is re-estimated so as to achieve a target conditional power of 0.99. Then the Input Parameters Table will contain the entries shown below. The zone to the left of the promising zone (CP < 0.45) is known as the unfavorable zone. The zone to the right of the promising zone (CP ≥ 0.8) is known as the favorable zone. In a group sequential design that includes early stopping boundaries for futility and efficacy, the unfavorable zone contains within it an even more extreme region for early futility stopping, and the favorable zone contains within it an even more extreme region for early efficacy stopping.

9. HR Used in CP Computations: In this cell you specify whether the conditional power in equations (54.21) and (54.23) should be computed from $\hat{\delta}_L$, estimated at the time of the interim analysis, or from the value of $\delta$ specified under the alternative hypothesis. The adaptive design will have rather different operating characteristics in each case. The default is to use the estimated value $\hat{\delta}_L$.

10. Accrual Rate After Adaptation: East gives you the option to alter the rate of enrollment after an adaptive increase in the number of events.
This feature would be useful, for example, to evaluate the extent to which the follow-up time, and hence the total study duration, can be shortened if the rate of enrollment is increased after the adaptive change is implemented.

Required Events Chart

The upper chart at the extreme right of the Sample Size Re-estimation tab is called the Required Events Chart. The X-axis of this chart, labeled CP(Dsgn.Events, Est.HR), tracks the conditional power obtained at the interim look based on the total number of events $D_K$ specified at the design stage (334 events under Des 1) and the interim estimate $\hat{\delta}_L$ of the log hazard ratio. To be specific,

\[
\text{CP(Dsgn.Events, Est.HR)} = 1 - \Phi\left\{ b_K \sqrt{1 + \frac{D_L}{D_K - D_L}} - z_L \sqrt{\frac{D_L}{D_K - D_L}} - \hat{\delta}_L \sqrt{r(1-r)} \sqrt{D_K - D_L} \right\}. \tag{54.25}
\]

Since $\hat{\delta}_L$ and $z_L$ are related through

\[
\hat{\delta}_L = \frac{z_L}{\sqrt{r(1-r) D_L}},
\]

equation (54.25) shows that there is a one-to-one correspondence between CP(Dsgn.Events, Est.HR), $\hat{\delta}_L$ and $z_L$. It is thus reasonable to use any one of these three variables on the X-axis of the Required Events Chart. We have chosen CP(Dsgn.Events, Est.HR) because it has a natural interpretation that is easily understood by non-statisticians.

The Y-axis, labeled Required Events, displays the number of events that are required to complete the trial. This number is computed as the minimum of the re-estimated number of events and the cap on the maximum number of events. To be specific, let $D_{\max}$ be the maximum number of events permitted if an adaptation occurs. (This is the entry to the right of the multiplier in the Max Number of Events if Adapt cell of the Input Parameters Table.) Let $D_K^*$ be the solution to the equation

\[
\text{Target CP} = 1 - \Phi\left\{ b_K \sqrt{1 + \frac{D_L}{D_K - D_L}} - z_L \sqrt{\frac{D_L}{D_K - D_L}} - \hat{\delta}_L \sqrt{r(1-r)} \sqrt{D_K^* - D_L} \right\}, \tag{54.26}
\]

where Target CP is the entry in the Target Conditional Power for Re-estimating Events cell.
Then

\[
\text{Required Events} = \min(D_{\max}, D_K^*).
\]

We will illustrate with a couple of examples.

Example 1: Suppose the input parameters are as displayed below:

With these inputs the Required Events will be re-computed for values of CP(Dsgn.Events, Est.HR) that fall in the promising zone, specified by 0.45 ≤ CP(Dsgn.Events, Est.HR) < 0.8. For all values of CP(Dsgn.Events, Est.HR) outside this zone, the Required Events will remain unchanged at 334, the number specified at the design stage. Inside the promising zone, however, East will use equation (54.26) to re-estimate $D_2^*$, the number of events needed to achieve the target conditional power of 0.8 entered in the Target Conditional Power cell. It can be shown that for values of CP(Dsgn.Events, Est.HR) on the X-axis between 0.45 and 0.58, the value of $D_2^*$ needed to boost the conditional power to the 0.8 target exceeds 501. Since the cap on the number of events is set at $D_{\max} = 501$, East will set Required Events = min(501, $D_2^*$) = 501 in the chart for all 0.45 ≤ CP(Dsgn.Events, Est.HR) ≤ 0.58. However, at values of CP(Dsgn.Events, Est.HR) on the X-axis that exceed 0.59, the re-estimated number of events $D_2^*$ is less than 501, and hence the Required Events gradually drops until it reaches 334 at CP(Dsgn.Events, Est.HR) = 0.8. Thereafter the Required Events remains constant at 334. Thus the shape of the Required Events Chart is as shown below.

The shape of the Required Events Chart depends on the value of the target conditional power that is one of the inputs. To see this, consider the next example.
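The Required Events rule can be sketched in a few lines of Python. This is only an illustration of equations (54.25) and (54.26) under stated assumptions, not East's implementation: the function names are hypothetical, the boundary is passed as its magnitude (1.9687) with positive statistics favoring the treatment, the re-estimated number of events is rounded up to a whole event, and the result is never allowed to fall below the design value because the manual states that no adaptive decrease is permitted.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def delta_hat(z_l, d_l, r=0.5):
    """Interim estimate of the log hazard ratio from the interim statistic z_l."""
    return z_l / sqrt(r * (1 - r) * d_l)


def cond_power(d_final, z_l, delta, d_l, d_k, b_k, r=0.5):
    """Conditional power in the form of (54.25)/(54.26) with d_final total events."""
    arg = (b_k * sqrt(1 + d_l / (d_k - d_l))
           - z_l * sqrt(d_l / (d_k - d_l))
           - delta * sqrt(r * (1 - r)) * sqrt(d_final - d_l))
    return 1 - STD_NORMAL.cdf(arg)


def events_for_target(target_cp, z_l, delta, d_l, d_k, b_k, r=0.5):
    """Closed-form solution of (54.26) for D*_K (real-valued); assumes delta > 0."""
    numerator = (b_k * sqrt(1 + d_l / (d_k - d_l))
                 - z_l * sqrt(d_l / (d_k - d_l))
                 + STD_NORMAL.inv_cdf(target_cp))
    root = max(numerator, 0.0) / (delta * sqrt(r * (1 - r)))
    return d_l + root ** 2


def required_events(z_l, target_cp, cp_min, cp_max,
                    d_l=167, d_k=334, d_max=501, b_k=1.9687, r=0.5):
    """Required Events for Des 1: re-estimate only inside the promising zone."""
    est = delta_hat(z_l, d_l, r)
    cp_design = cond_power(d_k, z_l, est, d_l, d_k, b_k, r)
    if not (cp_min <= cp_design < cp_max):
        return d_k                      # outside the promising zone: no change
    if est <= 0:
        return d_max                    # target unreachable: fall back to the cap (assumption)
    d_star = ceil(events_for_target(target_cp, z_l, est, d_l, d_k, b_k, r))
    return max(d_k, min(d_max, d_star))  # capped above, never below the design value
```

With the Des 1 values and an interim statistic of 1.5, `events_for_target(0.9, ...)` rounds up to the 656 events quoted earlier for a 0.9 target, and `required_events` returns the cap of 501 whenever the re-estimated value exceeds it.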
Example 2: Suppose the input parameters are as displayed below:

This time the promising zone ranges from 0.3 to 0.9. The target conditional power (Shape Parameter) is 0.99. It can be shown that more than 501 events (the cap in the Max Number of Events if Adapt cell) will be needed to reach this target for all values of CP(Dsgn.Events, Est.HR) in the promising zone. Therefore the Required Events Chart will be a step function, taking the value 334 outside the promising zone and the value 501 inside it. Thus, by entering different target conditional power values and pressing the Refresh Charts button, you can experiment with different shapes of the Required Events Chart. The step function shape is favored in many trials, both for its simplicity and because it prevents "reverse engineering" of the precise value of CP(Dsgn.Events, Est.HR) by anyone who, for regulatory reasons, has to remain blind to the interim results. For example, suppose it is known that the number of events has increased from 334 to 501. Even then all one can conclude is that CP(Dsgn.Events, Est.HR) falls between 0.3 and 0.9.

The Conditional Power Chart

The lower chart on the right side panel of the Sample Size Re-estimation tab is called the Conditional Power Chart. As the name suggests, this chart plots the actual conditional power of the study given the observed data at the interim analysis. As was the case for the Required Events Chart, the interim analysis results are summarized in terms of CP(Dsgn.Events, Est.HR) and displayed on the X-axis.
The Y-axis, titled CP(Req.Events, Ref.HR), then plots the actual conditional power for the reference hazard ratio contained in the edit box below this chart, where Req.Events refers to the Required Events displayed in the chart above the conditional power chart. Consider again the inputs that were entered into the input parameter table in Example 2. For these inputs the conditional power chart looks as shown below if the Reference HR is equal to 0.77. The chart shows that the true conditional power gradually climbs from below 20% to about 50% in the unfavorable zone (CP(Dsgn.Events, Est.HR) < 0.3). The true conditional power receives a substantial boost in the promising zone (0.3 ≤ CP(Dsgn.Events, Est.HR) < 0.9), because the Required Events jump from 334 to 501 in this zone. Now the conditional power climbs from slightly below 80% to slightly above 90%. There is a slight decline in the true conditional power upon entering the favorable zone (CP(Dsgn.Events, Est.HR) ≥ 0.9), for now the Required Events drop back to 334. However, in this zone the true conditional power starts out at 82% and rapidly climbs to well over 90%. The conditional power chart is useful because it provides a good idea of the type of power one can expect, conditional on falling in the unfavorable, promising and favorable zones, even before any simulations are performed. The simulation results, to be discussed next, provide additional insights.

Table of Simulation Results by Zone

We have already seen in the simulation output without adaptation that if the underlying hazard ratio is 0.77 and there is no adaptive change to the number of events, then the study only has about 66% power. The power can be improved by increasing the number of events.
The traditional approach is to commit up-front to an increase in the number of events. This was the approach used for creating Des 3. We saw in Section 54.5.2 that while Des 3 does indeed have 90% power to detect a hazard ratio of 0.77, it requires a considerably larger up-front commitment of resources: 539 events to be obtained from 823 subjects enrolled over 24 months, with 6 additional months of follow-up. A commitment of this magnitude, based solely on limited phase 2 data from other trials, was not feasible for the sponsor of the current study. We now consider an alternative approach that has lower overall power than Des 3 under a hazard ratio of 0.77, but might be more acceptable to the sponsor. This is the adaptive approach, in which the commitment of resources occurs in two stages, with the second stage commitment forthcoming only if the first stage results are in the promising zone. We will evaluate the operating characteristics of this approach by generating 10,000 simulated trials. Enter the following Accrual / Dropout Information and Survival Information in the respective Simulation Input tabs, and the following values in the Sample Size Re-estimation tab.

These inputs imply that the data for each of the 10,000 simulated trials will be generated from exponential distributions with a hazard ratio of 0.77, with patients arriving at the rate of 20.08/month, and no drop-outs. At the interim analysis, when 167 events have been observed, there will be an increase of resources only if the stage 1 conditional power lies in the promising zone (between 0.3 and 0.9).
In that case, the maximum number of events will increase by 50%, from 334 to 501, and the maximum number of subjects will also increase by 50%, from 483 to 724. To run the simulations, click the Simulate button. Save the overall Simulation output in the Library. This will get saved as CHWSim1. The Table of Simulation Results by Zone gets filled in and is displayed below as Figure 54.1.

Figure 54.1: Simulation Results for 10,000 Trials of Des 1 with Adaptation at Look 1

This table displays five rows for tracking the outcomes of the 10,000 simulated clinical trials zone by zone, plus a sixth row that combines the results across all five zones. The entries in the table are self-explanatory. For comparison purposes, run the simulations again, this time without adaptation. One simple way to do this is to set the two multipliers equal to 1. Edit the Simulation node CHWSim1 by selecting it and clicking the icon. Make the two multipliers equal to 1 as shown below:

The results from 10,000 simulations are re-computed, this time without any adaptation of events or sample size, and are displayed below as Figure 54.2.

Figure 54.2: Simulation Results for 10,000 Trials of Des 1 without Adaptation at Look 1

Figure 54.1 displays the simulation based operating characteristics of Des 1 with the adaptive option enabled, while Figure 54.2 displays the corresponding operating characteristics of Des 1 with the adaptive option disabled. Although Des 1 was designed under the optimistic assumption that the true hazard ratio is 0.7, both sets of simulations are performed under the pessimistic assumption that the true hazard ratio is 0.77. In order to conveniently compare the operating characteristics of the non-adaptive and adaptive designs, we have combined the relevant data from Figures 54.1 and 54.2 into a single table, Table 54.11.
Table 54.11: Operating Characteristics of Optimistic Design (Powered to Detect HR=0.7) under the Pessimistic Scenario (true HR=0.77)

10,000 Simulations Under the Pessimistic Scenario that HR = 0.77

                      Power              Duration (months)    # of Subjects
Zone      P(Zone)     NonAdpt   Adapt    NonAdpt   Adapt      NonAdpt   Adapt
Unf+Fut   29%         30%       31%      27.8      27.8       468       468
Prom      34%         68%       86%      29.3      33.9       483       724
Fav+Eff   37%         93%       92%      26.3      26.3       452       451
Total     -           66%       72%      27.7      29.3       467       550

The fourth row of Table 54.11 displays the overall simulation results combined across all zones. The non-adaptive design has 66% power, an average study duration of 27.7 months and an average sample size of 467 subjects. In contrast, the adaptive design boosts the power by 6 percentage points to 72%, but requires an average study duration of 29.3 months and an average sample size of 550 subjects. This is to be expected: if additional study duration and sample size resources are allocated to a trial, its power must increase.

It is more instructive to compare the results in Table 54.11 by zone rather than overall. In this type of comparison it is seen that the adaptive and non-adaptive designs behave identically (up to Monte Carlo accuracy) in the Unfavorable+Futility zone as well as in the Favorable+Efficacy zone. Both designs end up in the Unfavorable+Futility zone about 29% of the time, and in that case both designs have similar power of about 30% with identical average study duration and average number of subjects. Again, both designs end up in the Favorable+Efficacy zone about 37% of the time, and in that case they have about 93% power with practically identical average study duration and average number of subjects. In other words, the adaptive design produces the same power and consumes the same resources as the conventional design if the interim result falls in either of these two zones.
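As a cross-check on how the Total row relates to the zone rows of Table 54.11, the short Python sketch below recomputes it as a probability-weighted average of the zone entries. That the overall figures arise this way is an assumption made for illustration, and because the table reports rounded percentages the recomputed totals agree with the tabulated ones only to within rounding. The dictionary layout and the function name are hypothetical.

```python
# Zone rows of Table 54.11: P(zone), then (non-adaptive, adaptive) pairs.
ZONES = {
    "Unf+Fut": {"p": 0.29, "power": (0.30, 0.31),
                "duration": (27.8, 27.8), "subjects": (468, 468)},
    "Prom":    {"p": 0.34, "power": (0.68, 0.86),
                "duration": (29.3, 33.9), "subjects": (483, 724)},
    "Fav+Eff": {"p": 0.37, "power": (0.93, 0.92),
                "duration": (26.3, 26.3), "subjects": (452, 451)},
}


def total_row(metric, design):
    """Probability-weighted average over zones.

    design = 0 selects the non-adaptive column, design = 1 the adaptive one.
    """
    return sum(zone["p"] * zone[metric][design] for zone in ZONES.values())


if __name__ == "__main__":
    for metric in ("power", "duration", "subjects"):
        print(metric, round(total_row(metric, 0), 3), round(total_row(metric, 1), 3))
```

The weighted averages land close to the Total row (about 66% versus 72% power, about 27.7 versus 29.3 months), which makes it visible that the overall gain comes entirely from the promising zone.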
However, 34% of the time both designs end up in the Promising zone (0.3 ≤ CP < 0.9), where the adaptive design produces about 86% power whereas the non-adaptive design produces only 68% power. To be sure, the adaptive design consumes more resources in the promising zone (study duration = 33.9 months versus 29.3 months; average events = 501 versus 334; average number of subjects = 724 versus 483), but these additional resources are worth spending since they boost the power by about 18 percentage points and might make all the difference between a successful trial and a failure. In summary, the adaptive design calls up the additional event and sample size resources only when they are needed and not otherwise.

Although Table 54.11 partitions the simulation results into three zones, there are in fact five zones in the East output shown in Figure 54.1 and Figure 54.2. The simulations in the Unfavorable+Futility zone are further separated into those that were terminated for futility at the interim analysis and those that were unfavorable but did not cross the futility boundary. Similarly, the simulations in the Favorable+Efficacy zone are further separated into those that crossed the efficacy boundary at the interim look and those that were favorable but did not cross the efficacy boundary. The Table of Simulation Results by Zone for the adaptive design is as seen above. Examined in this way, it is seen that of the 2912 simulations entering the Unfavorable+Futility zone, only 362 (3.62% of all 10,000 trials) stop early for futility. Of the 3663 simulations entering the Favorable+Efficacy zone, only 996 (9.96% of all 10,000 trials) cross the efficacy boundary and stop early.

Table of Zone-wise Percentiles

The Table of Simulation Results by Zone reports only the average number of events, sample size, accrual duration and study duration.
One can examine the percentiles of the distributions of these statistics from the Table of Zone-wise Percentiles. Double click the node named CHWSim2 to see the detailed simulation output for the adaptive design. By default this table displays the 5th, 25th, 50th, 75th and 95th percentiles of the relevant distributions for all 10,000 trials. For example, the 95th percentile of the Study Duration for all Trials is displayed as 38.14 months. The 95th percentile of the study duration for trials that enter the promising zone, and therefore adapt, is 35.4 months. It might be of interest to know how short and how long the study duration could be among the simulations that have entered the promising zone. To see this, edit the Percentile column of the small table named Output for all Trials on the Simulation Control Info tab and run the simulations again. Observe the Promising zone table. The 0.1 percentile of the study duration is 32 months while the 99.9 percentile is 37 months.

East also provides the capability to store the summary statistics for every simulation run and the subject level data. This is achieved by checking the following checkboxes on the Simulation Control Info tab. When we keep this simulation output in the Library, two more nodes get saved under the simulation node as shown below:

Simulating Multiple Scenarios

The simulations that were performed in the earlier section were based on a multiplier of 1.5 for the maximum number of subjects, if an adaptation were to occur (cell 3 in the Table of Input Parameters). The choice of 1.5 was arbitrary.
It is possible that a smaller multiplier might result in almost the same average study duration, and hence produce a more efficient design from the sponsor’s perspective. It would therefore be desirable to conduct several simulation experiments with different multipliers for the number of subjects. Such multiple experiments can be conducted from the Sample Size Re-estimation tab. Edit the Simulations and click on the Sample Size Re-estimation tab. The inputs on this tab can be used to conduct simulation experiments over a range of multipliers for the maximum number of subjects, while keeping the multiplier for an adaptive increase in the number of events constant. The magnitude of the multiplier applied to the maximum number of subjects does not affect the power of the study, but it does have a direct impact on the study duration. It is thus preferable to experiment with a range of multipliers so as to gain a better understanding of the relationship between maximum number of subjects and study duration in an adaptive design.

Suppose we wish to conduct simulation experiments over a range of sample sizes, with the Max. Events if Adapt multiplier fixed at 1.5. We may enter a range of multipliers for sample size into the field for Max. Sample size if adapt using the convention x : y : z to denote entries ranging from x to y in steps of size z. Let us enter multiplier values for sample size ranging from 1.25 to 1.7 in steps of size 0.05. The complete input table will look like:

Upon pressing the Simulate button, all the scenarios are simulated and can be seen in the Output Preview pane. The above simulation output displays results for all 10,000 simulated trials. The column Power contains the overall power based on 10,000 simulated trials.
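The x : y : z convention can be illustrated with a small Python sketch. It expands the range 1.25 : 1.7 : 0.05 into ten multipliers and applies them to the 483 design-stage subjects of Des 1, giving caps that run from 603 to 821 subjects. Truncating each product to a whole subject is an assumption chosen because it reproduces the caps quoted in this section; the function names are hypothetical and do not correspond to anything in East. The last helper converts a cap into enrollment time at the constant rate of 20.08 subjects per month used in these simulations.

```python
from decimal import Decimal


def expand_range(spec):
    """Expand 'x:y:z' into the values from x to y (inclusive) in steps of z."""
    start, stop, step = (Decimal(part.strip()) for part in spec.split(":"))
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= stop:          # Decimal arithmetic avoids float drift
        values.append(current)
        current += step
    return values


def subject_caps(spec, design_subjects=483):
    """Cap on subjects for each multiplier, truncated to a whole subject (assumption)."""
    return [int(design_subjects * m) for m in expand_range(spec)]


def accrual_months(subjects, rate_per_month=20.08):
    """Months needed to enroll the given number of subjects at a constant rate."""
    return subjects / rate_per_month


if __name__ == "__main__":
    caps = subject_caps("1.25:1.7:0.05")
    print(caps)
    print(round(accrual_months(748), 2))
```

Working with `Decimal` rather than floating point keeps the end point 1.7 in the grid; with floats, accumulated rounding error can silently drop or add the last scenario.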
It might be of greater interest to examine the results only for those trials that entered the promising zone and hence were adapted. The above simulation output also has some columns which correspond to the promising zone. These columns are Power (Promising), Average Study Duration, Average Sample Size and Average Events. The same simulation results for the promising zone are also displayed graphically on the three charts shown below.

We have performed ten simulation runs with the Maximum Number of Subjects if Adapt input parameter ranging from 603 to 821. Let us analyze these outputs.

Power

The Power (Promising) column shows a relatively constant power of about 87% for the entire range of proposed values for Maximum Number of Subjects if Adapt. This is what one would expect in an event driven trial. The mild fluctuations in power that are observed are due to Monte Carlo sampling error.

Average Number of Events

The required number of events for trials that enter the promising zone is determined by the Target CP. Since the value of this parameter has been set to 0.99, the Required Events Chart displayed in the Sample Size Re-estimation tab is a step function with the constant value 501 for all values of conditional power in the promising zone. Consequently the Average Number of Events for all trials in the promising zone is 501, regardless of the value of the Maximum Number of Subjects if Adapt parameter.

Average Sample Size

Observe from the table of simulation results that the numbers in the Maximum Sample Size if Adapt column and the Average Sample Size column are close to one another between 603 and 748.
Thereafter the Average Sample Size levels off to a constant value of 750 even though the Maximum Sample Size if Adapt continues to grow. The same behavior is evident in the Number of Subjects Chart, which displays a 45 degree line for values between 603 and 748 on the X-axis and a horizontal line thereafter. This is so because the time that it takes for the 748 subjects to be enrolled is about 37 months, and that is about the same as the average time that it takes for the required 501 events to arrive. Once 501 events have arrived, additional enrollment stops. Thus values on the Y-axis of the Number of Subjects Chart do not change after an average enrollment of 748 subjects.

Average Study Duration

As the magnitude of the Maximum Number of Subjects if Adapt parameter increases, the Average Study Duration decreases. This is so because with increased accrual the required 501 events arrive earlier. Notice, however, from the Study Duration and Accrual Duration Chart, as well as from the tabulated values in the Average Study Duration column, that the rate of decrease in the average study duration continues to decline until it gradually comes to a halt at a value between 724 and 748 for the Maximum Number of Subjects if Adapt value on the X-axis. Thereafter the Average Study Duration value remains constant at 37 months even though the Maximum Number of Subjects if Adapt value continues to increase. This is so because, on average, by the time about 748 subjects have enrolled, the required 501 events will have arrived and the trial will be terminated.

Average Accrual Duration

As the magnitude of the Maximum Number of Subjects if Adapt parameter increases, the Average Accrual Duration increases as well, since more subjects are being enrolled while the rate of accrual is constant.
However, as seen from the Study Duration and Accrual Duration Chart, the rate of increase in Average Accrual Duration continues to decline until it comes to a halt at a value close to 748 for the Maximum Number of Subjects if Adapt value on the X-axis. Thereafter the Average Accrual Duration value remains constant at about 37 months even though the Maximum Number of Subjects if Adapt value continues to increase. This is so because, on average, by the time about 748 subjects have enrolled, the required 501 events will have arrived and further enrollment will be halted. Indeed, as can be seen on the Study Duration and Accrual Duration Chart, the graphs of Average Study Duration and Average Accrual Duration begin to converge and meet at approximately 772 subjects on the X-axis and 37 months on the Y-axis.

54.5.4 Interim Monitoring

Now we will discuss the CHW interim monitoring procedure, taking the example of Des 1. Select Des 1 in the Library and click on the icon to create a CHW Interim Monitoring Dashboard for this design as shown below.

This dashboard differs from the usual interim monitoring dashboard for a classical group sequential trial in the following major ways:

- The Pre-specified Nominal Critical Points (stopping boundaries) are written into the dashboard as soon as it is created, and are non-editable.
- The Incremental Statistic value is derived at each look from the Cumulative Events and Cumulative Statistic values of that look and the previous look, except at the first look, where the Incremental Statistic value is the same as the Cumulative Statistic value.
- The weighted statistic is obtained by combining the incremental test statistics using Pre-specified Weights.
In an actual trial, the cumulative events at each look need not correspond to what was originally specified at the design stage. But if the cumulative events that correspond to the original study design are entered, then the weighted statistic is the same as the usual Wald statistic employed in conventional (non-adaptive) interim monitoring.

The values of the cumulative test statistics $Z^*_{j,cum}$ at interim look $j$ are calculated by clicking on the Enter Interim Data button. This calculator uses as input the estimate of the treatment effect $\hat\delta_j^*$ and the estimated standard error of $\hat\delta_j^*$. These values may be obtained by fitting a Cox proportional hazards model to the dataset available at look $j$, or by calculating the Z-score $Z^*_{j,LR}$ based on the log-rank test statistic and using the following (approximating) expressions:

$$\hat\delta_j^* = \frac{Z^*_{j,LR}}{\sqrt{r(1-r)D_j^*}} \tag{54.27}$$

$$\mathrm{Var}(\hat\delta_j^*) = \frac{1}{r(1-r)D_j^*} \tag{54.28}$$

Here $r$ is the proportion of subjects randomized to the active treatment group and $D_j^*$ is the number of events observed at look $j$.

Example: IM Inputs taken from the results of Cox proportional hazards model

Suppose the first look is taken as planned after an accrual of 167 events. Suppose we observe δ̂ = −0.288 and a standard error of 0.236. The cumulative statistic at the first look is thus (−0.288/0.236) = −1.220. We enter these quantities into the Test statistic calculator as shown below. On pressing OK, the IM dashboard is updated with the first look computation. Since the nominal critical value for early stopping is −2.963, the trial continues. We now need to decide on the sample size to use for the second and final look. We invoke the conditional power calculator to assist with this decision.
Suppose we specify to the calculator that we wish to obtain 90% conditional power to detect HR=0.75. Upon entering these terms into the calculator we obtain a final (overall) tally of 558.263 events. Based on the guidance provided by the calculator, suppose we decide to continue the trial to observe 560 events by suitably increasing the sample size and the study duration. Suppose that, based on these 560 events, the estimate of δ is −0.272, corresponding to an HR value of 0.762, and the estimated standard error of δ̂ is 0.135, leading to a cumulative test statistic of −2.015. Upon pressing the OK button the cumulative statistic is entered into the interim monitoring dashboard, and the incremental statistic and the weighted statistic are computed as −1.605 and −1.998 respectively. Since the weighted statistic crosses the nominal critical value, the null hypothesis is rejected. The repeated confidence interval for HR is (0.697, 0.996) and the repeated p-value is 0.023. These estimates are based on the methods described in Section 54.1 and are appropriately adjusted to preserve their validity in the face of adaptive sample size changes.

The above computations in the IM sheet were carried out using the formulas specified in Section 54.2 as detailed below.
At the first look,

$$\hat\delta_1^* = -0.288, \quad SE(\hat\delta_1^*) = 0.236, \quad I_1^* = \frac{1}{[SE(\hat\delta_1^*)]^2} = 17.955, \quad Z^*_{1,cum} = \frac{\hat\delta_1^*}{SE(\hat\delta_1^*)} = -1.22$$

By definition, for the first look, the incremental statistic and the weighted statistic are

$$Z^{*(1)} = Z^*_{1,CHW} = Z^*_{1,cum} = -1.22$$

At the second look,

$$\hat\delta_2^* = -0.272, \quad SE(\hat\delta_2^*) = 0.135, \quad I_2^* = \frac{1}{[SE(\hat\delta_2^*)]^2} = 54.870, \quad Z^*_{2,cum} = \frac{\hat\delta_2^*}{SE(\hat\delta_2^*)} = -2.015$$

The incremental statistic at the second look is

$$Z^{*(2)} = \frac{\sqrt{I_2^*}\, Z^*_{2,cum} - \sqrt{I_1^*}\, Z^*_{1,cum}}{\sqrt{I_2^* - I_1^*}} = -1.605$$

The weighted statistic is

$$Z^*_{2,CHW} = \frac{\sqrt{w^{(1)}}\, Z^{*(1)} + \sqrt{w^{(2)}}\, Z^{*(2)}}{\sqrt{w^{(1)} + w^{(2)}}} = \frac{\sqrt{0.5}\,(-1.22) + \sqrt{0.5}\,(-1.605)}{\sqrt{0.5 + 0.5}} = -1.998$$

Example: IM Inputs taken from the results of Logrank test

Suppose the first look is taken after an accrual of 160 events. Further we apply the Logrank test to the data, and obtain the value of $\chi^2_{1\,df}$ to be 1.456, or equivalently $Z_1^* = \sqrt{1.456} = 1.2066$ in magnitude. We will first estimate δ̂ and SE(δ̂) using the approximation formulas 54.27 and 54.28 and then use the test statistic calculator to post these values. Thus, using the formulas,

$$\delta_1^* = \frac{Z_1^*}{\sqrt{r(1-r)D_1^*}} = \frac{1.2066}{\sqrt{0.5(1-0.5)160}} = \frac{1.2066}{6.3246} = 0.1908$$

$$\mathrm{Var}(\delta_1^*) = \frac{1}{r(1-r)D_1^*} = \frac{1}{0.5(1-0.5)160} = 0.025, \quad SE(\delta_1^*) = \sqrt{0.025} = 0.1581$$

Another way to estimate $\delta_1^*$ is $\delta_1^* = (Z_1^*)(SE(\delta_1^*)) = (1.2066)(0.1581) = 0.1908$.

Now bring up the CHW-IM dashboard, select the first look row and click on the Enter Interim Data button to input the look-wise information. Enter Cumulative Events as 160. Enter the value of δ̂ as −0.1908 (the sign is negative because the observed effect favors the experimental arm, that is, HR < 1), the value of SE(δ̂) as 0.1581, and click on Recalc and then on OK. The values in the IM sheet for the first look will appear as shown below. The values of the cumulative, incremental and weighted statistics are all the same, −1.207. Since the nominal critical value for early stopping is −2.963, the trial continues. We
now need to decide on the sample size to use for the second and final look. We invoke the conditional power calculator to assist with this decision. Suppose we specify to the calculator that we wish to obtain 90% conditional power to detect HR=0.75. Upon entering these terms into the calculator we obtain a final (overall) tally of 555.0 events. Based on the guidance provided by the calculator, suppose we decide to continue the trial to accrue 560 events by suitably increasing the number of subjects and the study duration. Suppose that, based on these 560 events, the estimate of $Z_2^*$ from the Logrank test is −2.135. Now, as in the first look, we can estimate SE(δ̂) and δ̂ using the formulas 54.27 and 54.28. These estimates work out as $SE(\hat\delta_2^*) = 0.0845$, the default value that appears in the test statistic calculator, and $\hat\delta_2^* = (SE(\hat\delta_2^*))(Z_2^*) = (0.0845)(-2.135) = -0.1804$. Enter these values in the CHW IM dashboard. Once the cumulative statistic is entered into the interim monitoring dashboard, the incremental statistic and the weighted statistic are computed as −1.7624 and −2.0995 respectively. Since the weighted statistic crosses the nominal critical value, the null hypothesis is rejected. The repeated confidence interval for HR is (0.6921, 0.9887) and the repeated p-value is 0.0182. These estimates are based on the methods described in Section 54.1 and are appropriately adjusted to preserve their validity in the face of adaptive sample size changes.

55 The Chen, DeMets and Lan Method

Two objections are sometimes leveled at the CHW method discussed in Chapter 54.
They both relate to the use of the CHW statistic (54.2) instead of the classical Wald statistic (54.3) or its variant (54.4) for performing the hypothesis tests. Specifically, it is felt by some statisticians that the incremental Wald statistics $(Z^{*(1)}, Z^{*(2)}, \ldots, Z^{*(K)})$ generated at the K stages should be combined by utilizing weights derived from the actual sample sizes $(n_1^*, n_2^*, \ldots, n_K^*)$ at each stage rather than by weights that depend on the pre-specified sample sizes $(n_1, n_2, \ldots, n_K)$. There is a concern that if the actual number of subjects entering the trial differs from the number pre-specified at the start of the trial, then the use of pre-specified weights will distort the scientific contribution of each cohort entering the trial. This is a philosophical rather than statistical objection, since the use of pre-specified weights controls the type-1 error in the presence of sample size changes, whereas the use of actual weights, in general, does not. It has, however, led to some interesting theoretical research on the loss of efficiency resulting from use of the CHW statistic. (See, for example, Tsiatis and Mehta, 2003; Jennison and Turnbull, 2006.) In practice, the magnitude of the adaptive sample size increase is seldom greater than two-fold and within this limit, the loss of efficiency is rather small. Indeed some of the EastAdapt tools described in the present chapter will show that in most practical settings, the loss of efficiency is negligible.

This chapter discusses a method proposed by Chen, DeMets and Lan (2004) (the CDL method) for making sample size modifications to an ongoing trial and then performing the interim monitoring and final analysis with the classical Wald statistic rather than the weighted CHW statistic. The method is further extended to a more general setting by Gao, Ware and Mehta (2008) (the extended CDL method).
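To make the distinction between the weighted CHW statistic and the conventional Wald statistic concrete, the following minimal Python sketch recomputes the second-look quantities of the Cox-model example from Section 54.5.4. The function name `chw_two_stage` and its argument names are illustrative assumptions and are not part of East; the equal weights of 0.5 are the pre-specified weights of that example.

```python
from math import sqrt

def chw_two_stage(delta1, se1, delta2, se2, w1=0.5, w2=0.5):
    """Return (cumulative Wald, incremental, CHW weighted) statistics at look 2.

    delta1, se1: cumulative estimate and standard error at look 1
    delta2, se2: cumulative estimate and standard error at look 2
    w1, w2:      pre-specified stage weights (hypothetical argument names)
    """
    info1, info2 = 1.0 / se1**2, 1.0 / se2**2   # information at each look
    z1_cum = delta1 / se1                        # cumulative Wald statistic, look 1
    z2_cum = delta2 / se2                        # cumulative Wald statistic, look 2
    # Incremental statistic: uses only the data accrued between looks 1 and 2
    z_inc = (sqrt(info2) * z2_cum - sqrt(info1) * z1_cum) / sqrt(info2 - info1)
    # CHW statistic: stage-wise statistics combined with the pre-specified weights
    z_chw = (sqrt(w1) * z1_cum + sqrt(w2) * z_inc) / sqrt(w1 + w2)
    return z2_cum, z_inc, z_chw

z_wald, z_inc, z_chw = chw_two_stage(-0.288, 0.236, -0.272, 0.135)
print(round(z_wald, 3), round(z_inc, 3), round(z_chw, 3))
```

With the inputs of that example the sketch gives a conventional Wald statistic of about −2.015 and a CHW statistic of about −1.998. The two differ because the actual information fraction at the first look is no longer the pre-specified 0.5 once the number of events has been increased.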
The main limitation of these two methods is that they are only applicable if the sample size is altered at the penultimate stage of a K-stage group sequential trial. Thus, for simplicity, we will illustrate the methods for two-stage trials only. Furthermore, in the current implementation of East, they are only applicable if the sample size is increased adaptively, but not if it is decreased.

This chapter pre-supposes familiarity with the CHW method and examples presented in Chapter 54. The same three designs, normal (schizophrenia example), binomial (acute coronary syndromes example) and survival (lung cancer example), that were used to illustrate the CHW method in Chapter 54 will be re-visited in the present chapter. Thus some of the steps used to construct these designs in East may be skipped since they will have already been presented in Chapter 54.

55.1 The CDL Method

Consider a two-sided level-α test of the null hypothesis H0: δ = 0 versus the two-sided alternative hypothesis H1: δ ≠ 0 for a two-arm randomized clinical trial. We assume that the null hypothesis will be tested by a two-look group sequential trial with cumulative sample sizes $(n_1, n_2)$ and stopping boundaries $(b_1, b_2)$ derived from some level-α spending function. The data will be examined at the end of look 1 and the sample size for the remainder of the trial may then be changed. Ordinarily, if the sample size is changed in a data dependent manner in the middle of a trial, we would be obliged to use the CHW weighted statistic (54.2) described in Chapter 54 instead of the conventional Wald statistic (54.3) for the final analysis, in order to preserve the type-1 error.
Intuitively, however, it could be argued that if under the null hypothesis the interim value of the test statistic is large, then it would stand a better chance of regressing to the mean if the sample size of the second stage was increased. Therefore a sample size increase would make it more difficult to achieve statistical significance at the final analysis. Chen, DeMets and Lan (2004) have formalized this intuition by demonstrating mathematically that if the conditional power at the interim look, evaluated at the estimated value δ̂ obtained at the interim analysis, is at least 50%, one can increase the sample size for the remainder of the trial and still use the conventional Wald statistic for the final analysis without inflating the type-1 error. This important result makes it possible to design two-stage adaptive trials in which the sample size may be increased in a data dependent manner at the interim look, but all the conventional methods of obtaining p-values, confidence intervals and point estimates, available in standard software packages, are applicable at the time of the final analysis.

The above CDL result applies only to a sample size increase and not to a sample size decrease. In order to use the conventional statistic under a sample size decrease the reverse condition must hold. That is, if the conditional power is no greater than 50% at the interim look, the sample size can be decreased and the conventional Wald statistic can be used for the final analysis without inflating the type-1 error. However, the discussion in this chapter focuses on sample size increases only. This is entirely in keeping with the recommendations in the FDA Guidance on Adaptive Design (2010) where the use of adaptive methods to decrease sample size is discouraged.
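The 50% condition can be illustrated with a short Python sketch. It assumes the usual Brownian-motion approximation for a two-stage design, with the conditional power evaluated at the interim estimate (the "current trend"); this is a textbook approximation and not necessarily the exact computation performed by East. The function names and the sign convention (positive statistics favor treatment) are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z1, n1, n2, b2):
    """Conditional power at the current trend for a two-stage design.

    z1: Wald statistic at the interim look (positive values favor treatment)
    n1: cumulative sample size at the interim look
    n2: cumulative sample size at the final look
    b2: critical value for the conventional Wald statistic at the final look
    """
    n_inc = n2 - n1
    # Value the incremental statistic must reach for the final Wald statistic to cross b2
    threshold = (b2 * sqrt(n2) - z1 * sqrt(n1)) / sqrt(n_inc)
    # Expected value of the incremental statistic if the interim trend continues
    drift = z1 * sqrt(n_inc) / sqrt(n1)
    return 1.0 - NormalDist().cdf(threshold - drift)

def wald_allowed_after_increase(z1, n1, n2, b2, min_cp=0.5):
    """CDL condition: the Wald statistic may be kept after a sample size
    increase when the conditional power at the original n2 is at least min_cp."""
    return conditional_power(z1, n1, n2, b2) >= min_cp

# Illustrative inputs: interim after 208 of 442 subjects, final critical value about 1.96
print(conditional_power(1.5, 208, 442, 1.96))
print(wald_allowed_after_increase(1.5, 208, 442, 1.96))
print(wald_allowed_after_increase(1.0, 208, 442, 1.96))
```

In this sketch an interim statistic of 1.5 yields a conditional power above 50%, so a sample size increase followed by a conventional Wald test would satisfy the CDL condition, whereas an interim statistic of 1.0 would not.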
55.1.1 Normal Endpoint: Schizophrenia Trial

We will apply the CDL method to the Schizophrenia trial discussed in detail in Chapter 54. The starting point is a two-look design enrolling 442 subjects, with an interim look planned after obtaining data on 208 completers. The trial is designed to test the null hypothesis δ = 0 versus the one-sided alternative that δ > 0. The standard deviation is assumed to be σ = 7.5. As the only purpose of the interim analysis is to re-estimate the sample size, but not to stop early, we use the conservative γ(−24) spending function (Hwang, Shih and DeCani, 1990) to obtain the efficacy stopping boundary for the interim look. Thereby the amount of type-1 error spent at the interim look is negligible and practically the entire α = 0.025 is available for the final analysis. With these specifications the trial has just over 80% power to detect δ = 2. As pointed out in Chapter 54, the true value of δ might actually be less than 2. It is thus possible that this trial is underpowered at a sample size of 442. We can, however, examine the data at the interim look and estimate the conditional power, and increase the sample size if the conditional power falls in a promising zone.

The approach is identical to that discussed in Chapter 54 for the CHW design. We partition the sample space into the following zones: futility, unfavorable, promising, favorable and efficacy, based on the conditional power attained at the interim look. The sample size may then be increased if the interim results fall inside the promising zone, thereby recovering the lost power. The additional feature of the CDL design is, however, that if conditional power at the interim look is at least 50% it is not necessary to use the CHW statistic at the final analysis. The conventional Wald statistic may then be used without inflating the type-1 error.
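The zone-based rule described above can be sketched in Python as follows. This is a simplified illustration and not East's implementation: it covers only the unfavorable, promising and favorable zones, it uses the Brownian-motion approximation to conditional power at the current trend, and the zone limits (0.5 and 0.9), the target conditional power (0.95) and the cap of 884 subjects are illustrative values. All function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def cp_current_trend(z1, n1, n_final, b2):
    """Conditional power of the Wald test at the interim trend, for a final size n_final."""
    n_inc = n_final - n1
    threshold = (b2 * sqrt(n_final) - z1 * sqrt(n1)) / sqrt(n_inc)
    drift = z1 * sqrt(n_inc) / sqrt(n1)
    return 1.0 - NormalDist().cdf(threshold - drift)

def classify_zone(cp, cp_min=0.5, cp_max=0.9):
    """Map the conditional power at the original sample size to a zone."""
    if cp < cp_min:
        return "unfavorable"
    if cp < cp_max:
        return "promising"
    return "favorable"

def reestimate(z1, n1, n2, b2, n_cap, target_cp=0.95, cp_min=0.5, cp_max=0.9):
    """Return (zone, final sample size). The size is increased only in the
    promising zone, to the smallest n reaching target_cp, capped at n_cap."""
    zone = classify_zone(cp_current_trend(z1, n1, n2, b2), cp_min, cp_max)
    if zone != "promising":
        return zone, n2                      # no adaptation outside the promising zone
    for n in range(n2, n_cap + 1):           # linear scan, no monotonicity assumed
        if cp_current_trend(z1, n1, n, b2) >= target_cp:
            return zone, n
    return zone, n_cap                       # target not reachable within the cap

for z1 in (1.0, 1.5, 1.8, 2.2):
    print(z1, reestimate(z1, n1=208, n2=442, b2=1.96, n_cap=884))
```

Under these assumptions an interim statistic of 1.0 leaves the sample size at 442 (unfavorable), 1.5 is promising but hits the cap of 884, 1.8 is promising with a re-estimated size strictly between 442 and 884, and 2.2 is already favorable so no increase is requested.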
We shall study the operating characteristics of the above CDL design through simulation. The option for choosing the CDL method for adaptation is on the Sample Size Re-estimation tab. We invoke the CDL simulations by clicking on the radio-button for CDL. The resulting simulation input window will appear as shown below. The inputs on this window are almost the same as those for CHW simulations, which were described in detail in Chapter 54. Most of the entries are self-explanatory. Those that need special explanation are listed below. All the conditional power calculations mentioned below will be performed at the estimated value, δ̂/σ, obtained at the time of the interim analysis.

Min and Max CP: This range partitions the interim result into unfavorable, promising and favorable zones based on conditional power (CP). If the conditional power at the interim look, under the original sample size, falls in this range then the interim result is deemed to be promising and the sample size is re-estimated according to criteria specified in the remaining cells.

Max Sample Size if Adapt, multiplier: Use this cell to specify the cap for the re-estimated sample size. Since we don't allow a decrease in sample size after adaptation, the minimum sample size is the one coming from the study design. The interval [Min. Sample Size, Max. Sample Size] defines the range of the re-estimated sample size after adaptation.

Use Wald Statistic if CP(n1) ≥: The entry in this cell determines when to use the conventional Wald statistic and when to use the CHW statistic for the final analysis. The default entry is 0.5. Thus if the conditional power at the interim analysis is at least 50%, the simulations will use the conventional Wald statistic for the final analysis.
Otherwise the CHW statistic will be used. The CDL method will preserve the type-1 error as long as this entry is at least 0.5. We shall show subsequently that by applying the Gao, Ware and Mehta (2008) extension, the probability in this cell can be lowered without inflating the type-1 error.

Target Conditional Power for Re-estimating Sample Size: This entry is the primary driver for the new sample size. It specifies what conditional power is desired at the end of the study. The sample size for the remainder of the trial is changed accordingly, subject to the constraints placed upon it by the Max Sample Size if Adapt cell.

Suppose, for example, that we wish to run 100,000 simulations at δ = 1.6 and σ = 7.5, and to increase the sample size only if the conditional power at the interim analysis under the original sample size is between 0.5 and 0.9. And in that case suppose that we wish to increase the sample size by just the right amount so that the conditional power is boosted to 0.95. Furthermore suppose that the re-estimated sample size is constrained to remain between 442 and 884 subjects. To run the simulations with these specifications we would change the entries in the Response Generation Info tab, the Sample Size Re-estimation tab and the Simulation Control Info tab as shown below.

The Response Generation Info tab:

The Sample Size Re-estimation tab:

The Simulation Control Info tab:

We run the simulations by pressing the Simulate button. An entry for CDL simulation gets added in the Output Preview pane. Save this in the Library and observe the detailed output. The null hypothesis was rejected 66,463 times in 100,000 trials for an overall power of 66.46%. The average sample size was 530.26.
In contrast, if there is no sample size increase, the power would be 61% and the average sample size would be 442. This can be verified by setting the multiplier for Max. Sample Size if Adapt to 1 on the Sample Size Re-estimation tab.

This is not the full story, however. As discussed in Chapter 54, one of the major appeals of an adaptive design is the ability to invest in stages, with the additional sample size investment being required only if the interim result falls in the promising zone. From this point of view it is of interest to examine the power and expected sample size conditional on being in the unfavorable, promising and favorable zones. The top part of the simulation output shows that the trial falls into the promising zone, and thereby undergoes an adaptive sample size increase, in 25,030 of the 100,000 simulations (25.03%). Moreover 90% of these simulated trials go on to reject the null hypothesis. This is a significant boost to the power of the study, conditional on having a promising interim outcome. The simulation results are displayed zone by zone as shown below.

The expected sample size of all the trials that undergo a sample size increase is 794.799. Although this is considerably greater than the overall average of 530.263, it is important to recognize that a sample size increase is only requested if a trial enters the promising zone at the interim look. In that case, the prospects of success become extremely promising (90% power) and hence, the sponsor or investor might be willing to make the additional investment. The alternative approach, to commit a large sample size at the very beginning, before any interim results have been observed, might not be as attractive.

In the above figure, observe that trials fall into the favorable zone (conditional power at least 90%) 32.688% of the time.
For such trials the success rate is 90.256%, and no sample size increase is called for. Trials fall into the unfavorable zone 42.264% of the time and only 34.069% of such trials go on to succeed. In this design, the adaptive option is invoked only 25.03% of the time, but once invoked, it greatly improves the chances of success.

This example has highlighted the importance of evaluating any proposed adaptive strategy by simulation before adopting it. One should look at the operating characteristics of the proposed adaptive design over the entire range of plausible parameter values in order to determine if the rules for sample size increase are acceptable. If the operating characteristics are not satisfactory, it would be necessary to perform similar simulation experiments with a different adaptive strategy for sample size change. In this manner it is possible to converge to an acceptable design.

It is interesting to simulate the trial under the null hypothesis and verify that the type-1 error is indeed preserved. Accordingly set the Mean Treatment µt cell in the Response Generation Info tab to 0. The other simulation parameters are unchanged. The results based on 100,000 simulated trials are displayed below. It is seen that only 2426 of the 100,000 trials rejected the null hypothesis, for an overall type-1 error of 2.426%. The type-1 error was thus preserved.

Suppose, in order to provide the maximum opportunity to increase the sample size, we set the Promising Zone: Min.CP to 0, in addition to setting Mean Treatment µt to 0. Let us keep the other parameters unchanged. The results based on 100,000 simulated trials are displayed below. It is seen that only 2363 of the 100,000 trials rejected the null hypothesis, for an overall type-1 error of 2.363%.
The type-1 error was thus preserved.

Now suppose we disable the CDL constraint by changing the entry in the Use Wald Stat. if CP(442) >= cell from 0.5 to 0.0 as shown below and run simulations. This time the type-1 error is not preserved. Of 100,000 simulated trials a total of 2590 rejected the null hypothesis, for a type-1 error of 2.59%. This shows that the CDL constraint is indeed necessary.

55.1.2 Binomial Endpoint: Acute Coronary Syndromes Trial

Consider a two-arm, placebo controlled randomized clinical trial for subjects with acute cardiovascular disease undergoing percutaneous coronary intervention (PCI), which we discussed in Section 54.4. The primary endpoint in this study is a composite of death, myocardial infarction or ischemia-driven revascularization during the first 48 hours after randomization. We assume on the basis of prior knowledge that the event rate for the placebo arm is 8.7%. The investigational drug is expected to reduce the event rate by at least 20%. The investigators are planning to randomize a total of 8000 subjects in equal proportions to the two arms of the study. As explained in the beginning of this chapter, for applying the CDL method, a two-look group sequential design will suffice, without loss of generality.

It is easy to show that a group sequential design enrolling a total of 8000 subjects, with an interim look after 4000 subjects are enrolled (50% of total information), will have 82% power to detect a 20% risk reduction with a one-sided level-0.025 test of significance and an early stopping efficacy boundary derived from the Lan and DeMets (1983) O'Brien-Fleming type error spending function.
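As a rough plausibility check on the stated power, the following Python sketch computes the power of a fixed-sample, unpooled-variance normal approximation test for two proportions. It deliberately ignores the interim look and the O'Brien-Fleming type boundary, so it is only an approximation to the group sequential figure reported by East. The function name and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_proportions(p_control, risk_reduction, n_total, alpha=0.025):
    """Approximate one-sided power for comparing two proportions (1:1 allocation).

    Fixed-sample normal approximation with unpooled variance; no interim look.
    """
    p_treat = p_control * (1.0 - risk_reduction)   # event rate on the treatment arm
    n_arm = n_total / 2
    se = sqrt(p_control * (1 - p_control) / n_arm + p_treat * (1 - p_treat) / n_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha)       # one-sided critical value
    return NormalDist().cdf((p_control - p_treat) / se - z_alpha)

# Placebo event rate 8.7%, 8000 subjects in total
print(approx_power_two_proportions(0.087, 0.20, 8000))  # 20% risk reduction
print(approx_power_two_proportions(0.087, 0.15, 8000))  # 15% risk reduction
```

Under these simplifying assumptions the power is roughly 0.83 for a 20% risk reduction and roughly 0.57 for a 15% risk reduction, which illustrates how quickly the design loses power if the true effect is smaller than anticipated.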
The actual risk reduction is expected to be larger, but could also be as low as 15%, a treatment effect that would still be of clinical interest given the severity and importance of the outcomes. In addition, there is some uncertainty about the magnitude of the placebo event rate. For these reasons the investigators wish to build into the trial design some flexibility for adjusting the sample size. Two options under consideration are a group sequential design with the possibility of early stopping in case the risk reduction is large, and an adaptive design with the possibility of increasing the sample size in case the risk reduction is small. In the remainder of this section we shall discuss these two options and show how they may be combined into a single design that captures the benefits of both.

For this design, when the risk reduction is 20%, the probability of crossing the boundary is 0.181 at Look 1 (N=4000) and 0.644 at the final look; the overall power is 82%.

As we did in Chapter 54, we partition the sample space into three important zones, unfavorable, promising and favorable, based on the conditional power attained at the interim look. The sample size may then be increased if the interim results fall inside the promising zone, thereby recovering the lost power. The additional feature of the CDL design is, however, that if conditional power at the interim look is at least 50% it is not necessary to use the CHW statistic at the final analysis. The conventional Wald statistic may then be used without inflating the type-1 error.

Adaptive Group Sequential Design: We convert the two-look group sequential design Des 1 into an adaptive group sequential design to increase the sample size at look 1, when 4000 subjects have been enrolled. The rules governing the sample size increase are similar to the rules specified in Section 55.1.1 for the schizophrenia trial.
We shall study the operating characteristics of the above CDL design through simulation. The option for choosing the CDL method for adaptation is on the Sample Size Re-estimation tab. We invoke the CDL simulation by clicking on the radio-button for CDL. The resulting simulation input window will appear as shown below. The inputs on this window are almost the same as those for CHW simulations, which were described in detail in Chapter 54. Most of the entries are self-explanatory. Those that need special explanation are similar to what has been described in Section 55.1.1 for the schizophrenia example. All the conditional power calculations mentioned below will be performed at the estimated values of πc and πt obtained at the time of the interim analysis.

Suppose, for example, that we wish to run 100,000 simulations at risk reduction ρ = 0.15 and to increase the sample size only if the conditional power at the interim analysis under the original sample size is between 0.5 and 0.9. And in that case suppose that we wish to increase the sample size by just the right amount so that the conditional power is boosted to 0.95. Furthermore suppose that the re-estimated sample size is constrained to remain between 8000 and 16000 subjects. To run the simulations with these specifications we would change the entries in the three tabs as shown below.

The Response Generation Info tab:

The Sample Size Re-estimation tab:

The Simulation Control Info tab:

We run the simulations by pressing the Simulate button. An entry for CDL simulation gets added in the Output Preview pane. Save this in the Library and observe the detailed output.
The null hypothesis was rejected 62,392 times in 100,000 trials for an overall power of 62.4%. The average sample size was 9318.60. In contrast, if there is no sample size increase, the power would be 57.2% and the average sample size would be 7703.1.

Next, let us consider these results zone by zone. As discussed in Chapter 54, one of the major appeals of an adaptive design is the ability to invest in stages, with the additional sample size investment being required only if the interim result falls in the promising zone. From this point of view it is of interest to examine the power and expected sample size conditional on being in the unfavorable, promising and favorable zones. The bottom part of the simulation output shows that the trial falls into the promising zone, and thereby undergoes an adaptive sample size increase, in 24,944 of the 100,000 simulations (24.94%). Moreover 87.98% of these simulated trials go on to reject the null hypothesis. This is a significant boost to the power of the study, conditional on having a promising interim outcome.

The expected sample size of all the trials that undergo a sample size increase is 14,441.961. Although this is considerably greater than the overall average of 9,318.603, it is important to recognize that a sample size increase is only requested if a trial enters the promising zone at the interim look. In that case, the prospects of success become extremely promising (87.98% power) and hence, the sponsor or investor might be willing to make the additional investment. The alternative approach, to commit a large sample size at the very beginning, before any interim results have been observed, might not be as attractive.

The simulation results are also displayed zone by zone as shown below. Observe that trials fall into the Favorable + Efficacy zone (conditional power at least 90%) 29.97% of the time.
For such trials the success rate is 82.19%, and no sample size increase is called for. Trials fall into the unfavorable zone 45.08% of the time and only 35.08% of such trials go on to succeed. In this design the adaptive option is invoked 24.94% of the time, and once invoked, it greatly improves the chances of success.

This example has highlighted the importance of evaluating any proposed adaptive strategy by simulation before adopting it. One should look at the operating characteristics of the proposed adaptive design over the entire range of plausible parameter values in order to determine if the rules for sample size increase are acceptable. If the operating characteristics are not satisfactory, it would be necessary to perform similar simulation experiments with a different adaptive strategy for sample size change. In this manner it is possible to converge to an acceptable design.

It is interesting to simulate the trial under the null hypothesis and verify that the type-1 error is indeed preserved. Accordingly set the Proportion Under Treatment cell in the Response Generation Info tab equal to the Proportion Under Control. The other simulation parameters are unchanged. The results based on 100,000 simulated trials are displayed below. It is seen that only 2441 of the 100,000 trials rejected the null hypothesis, for an overall type-1 error of 2.44%. The type-1 error was thus preserved.

Suppose, in order to provide the maximum opportunity to increase the sample size, we set the Promising Zone: Min.CP to 0, in addition to setting the response rate to be the same for control and treatment. Let us keep the other simulation parameters unchanged.
Copyright 2016 The results based on 100,000 simulated trials are displayed below. It is seen that only 2233 of the 100,000 trials rejected the null hypothesis, for an overall type-1 error of 2.23%. The type-1 error was thus preserved. Now suppose we disable the CDL constraint by changing the entry in the Use Wald 55.1 The CDL Method – 55.1.2 Binomial Endpoint 1179 <<< Contents 55 * Index >>> The Chen, DeMets and Lan Method Stat. if CP(442) >= cell from 0.5 to 0.0 as shown below. This time the type-1 error is not preserved. Of 100,000 simulated trials a total of 2616 rejected the null hypothesis, for a type-1 error of 2.62% which is slightly inflated. This shows that the CDL constraint is indeed necessary. 55.1.3 Survival Endpoint: Lung Cancer Trial Let us re-visit the non-small cell lung cancer trial introduced in Section 54.5 of Chapter 54. This is a two-arm multi-center randomized clinical trial for subjects with advanced metastatic non-small cell lung cancer comparing the current standard second line therapy (docetaxel+cisplatin) to a new docetaxel containing combination regimen. The primary endpoint is overall survival (OS). The study is required to have one-sided α = 0.025, and 90% power to detect an improvement in median survival, from 8 1180 55.1 The CDL Method – 55.1.3 Survival Endpoint <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 months on the control arm to 11.4 months on the experimental arm, which corresponds to a hazard ratio of 0.7. We shall first create a group sequential design for this study in East, and shall then show how the design may be improved by permitting an increase in the number of events and sample size at the time of the interim analysis. 
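The correspondence between the two medians and the hazard ratio can be checked with a short calculation. The sketch below assumes exponentially distributed survival times (an assumption made here for illustration only), under which the hazard rate equals ln(2) divided by the median. The function name is hypothetical and is not part of East.

```python
import math

def hazard_from_median(median_months: float) -> float:
    """Monthly hazard of an exponential survival distribution with the given median."""
    return math.log(2.0) / median_months

control_hazard = hazard_from_median(8.0)      # control arm, median 8 months
treatment_hazard = hazard_from_median(11.4)   # experimental arm, median 11.4 months
hazard_ratio = treatment_hazard / control_hazard

print(f"control hazard   = {control_hazard:.4f} per month")
print(f"treatment hazard = {treatment_hazard:.4f} per month")
print(f"hazard ratio     = {hazard_ratio:.3f}")
```

Under this assumption the hazard ratio is simply the ratio of the medians, 8/11.4, which rounds to 0.7.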
Following the steps exactly as outlined in Section 54.5.1 of Chapter 54 we create a 2-look group sequential design with an efficacy boundary derived from the Lan and DeMets (1983) O’Brien-Fleming type spending function, a futility boundary derived from the γ-spending function of Hwang, Shih and DeCani (1990) with parameter γ = −5, and an interim analysis at 50% of the total information. It is planned to enroll subjects over 24 months and extend the follow-up for six additional months, thereby completing the study in 30 months. This design is created in East and displayed below as Des1.

Des1 requires an up-front commitment of 334 events to achieve 90% power. With an enrollment of 483 subjects over 24 months, the required 334 events are expected to arrive within 30 months. An interim analysis will be performed after 167 events are obtained (50% of the total information). Under the alternative hypothesis that the hazard ratio is 0.7, the chance of crossing the efficacy boundary at the interim look is about 26%, leading to an expected sample size of 454 subjects and an expected study duration of about 27 months.

Although Des1 is adequately powered to detect a hazard ratio of 0.7, its power deteriorates from 90% to below 68% if the true hazard ratio is 0.77, an effect that is still considered clinically meaningful. To see this let us simulate Des1 under HR=0.77. Select Des1 in the Library and click the icon. You will be taken to the usual simulation input window.
This has four tabs, as shown below. The use of the four tabs Simulation Parameters, Response Generation Info, Accrual/Dropout Info and Simulation Control Info is exactly the same as that explained in the sections on CHW simulations. You may refer to Chapter 54, Section 54.5.3 for a complete description of their functioning. The fourth tab, Sample Size Re-estimation, is almost identical to the corresponding tab for CHW simulations but contains one additional input parameter that distinguishes the CDL method from the CHW method for adaptive design. We will assume for the remainder of this section that the user is familiar with the CHW simulation worksheet. If not, please refer to Section 54.5.3 of Chapter 54 where this worksheet was fully discussed with a worked example.

Observe that the Response Generation Info tab currently displays a hazard ratio of 0.7, since this was the value specified at the design stage. We know from the design of Des1 that a hazard ratio of 0.7 will yield 90% power. But what if the true hazard ratio were 0.77? The resultant deterioration in power can be evaluated by simulation. Accordingly we shall alter the Treatment cell, containing the hazard 0.0607, by replacing it with 0.77 × 0.0866 = 0.0667. To run 10,000 simulations with a hazard ratio of 0.77, click on the Simulate button. The following simulation output is displayed.

The overall power is only 66.31%, suggesting that it might be useful to consider an adaptive increase in the number of events and sample size at the interim look. The impact of an adaptive increase in the number of events and sample size on power and study duration can be evaluated by simulation. Accordingly click on the Sample Size Re-estimation tab and select the option of CDL on this tab.
This will take you to the input parameters for performing the adaptive simulations using the CDL method. These inputs and output quantities (tables and charts) were fully described in Section 54.5.3 of Chapter 54. Thus, they will not be discussed again here, with the exception of a single additional input parameter that appears on the tab when the CDL method is selected. This input is not among the input parameters for the CHW simulations. This new parameter appears between the Target CP for Re-estimating Events field and the Promising Zone Scale field.

Suppose the following values have been entered into the Input Parameters Table. These inputs imply that there will be a 50% increase in the number of events for each simulation that enters the promising zone and up to a 50% increase in the sample size also. The promising zone is specified by conditional power (based on estimated HR) being between 0.3 and 0.9. These adaptation rules are the same as the adaptation rules applied in Section 54.5.3 of Chapter 54. However, the test statistic to be used for the final analysis will depend on the CP observed at the interim look. If this CP exceeds 0.5, the conventional Wald statistic (equation (54.3) in Chapter 54) will be used for the final analysis, whereas if this CP is below 0.5, the weighted CHW statistic (equation (54.2) in Chapter 54) will be used for the final analysis. Upon pressing the Simulate button the following outputs are obtained in the Table of Simulation Results by Zone and in the Table of Output for all Trials. These results are almost the same as were obtained by use of the CHW method.
The main advantage of using the CDL method is that one can dispense with the use of the non-standard, weighted CHW statistic (54.2) as long as the conditional power at the interim analysis exceeds 0.5. Therefore, if the minimum CP for the promising zone is itself 0.5, one can dispense with the use of the CHW statistic altogether and always use the conventional Wald statistic at the time of the final analysis.

To see that the CDL condition (CP ≥ 0.5) is necessary for preserving the type-1 error if the Wald statistic is always used for increasing the number of events, consider the following simulation experiment based on 10,000 simulated trials. Set the hazard ratio to 1, so as to simulate under the null hypothesis. Now choose the following values for the Adaptation. Since the Promising Zone: Min CP has been assigned the value 0, the promising zone starts at CP(334) = 0 and there is no unfavorable zone. Since the Target CP equals 0.9, the number of events will be increased in each simulation from 334 to the amount needed to hit a target conditional power of 0.9, subject to a Max. Events if Adapt cap of 3340 events. However, since the Use Wald Stat. if CP>= has been set to 0, each simulation that falls in the promising zone will use the conventional Wald statistic and not the CHW statistic, despite the data dependent increase in number of events. Upon pressing the Simulate button the following results are displayed in the table of Simulation Results by Zone.

Row 6 of the Table of Simulation Results by Zone displays the results for all trials, combined across zones. Thus Column 3 of Row 6 of this table displays the magnitude of the type-1 error, 0.0317, which is seen to exceed 0.025 even after accounting for Monte Carlo error.
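The phrase "even after accounting for Monte Carlo error" can be made concrete with a rough calculation. The sketch below is an illustration and not part of East: it treats each simulated trial as an independent Bernoulli draw and uses a normal approximation, so that if the true rejection rate were 0.025 the simulated rate would have standard error sqrt(0.025 × 0.975 / number of simulations). The function name is hypothetical.

```python
import math

def mc_excess_in_std_errors(observed_rate: float, n_sims: int, alpha: float = 0.025) -> float:
    """Number of Monte Carlo standard errors by which observed_rate exceeds alpha."""
    std_err = math.sqrt(alpha * (1.0 - alpha) / n_sims)
    return (observed_rate - alpha) / std_err

# Simulated type-1 error of 0.0317 from 10,000 trials, as reported above.
excess = mc_excess_in_std_errors(0.0317, 10_000)
print(f"excess over 0.025: {excess:.2f} standard errors")
```

An excess of several standard errors is difficult to attribute to simulation noise alone, which is the sense in which the inflation is regarded as real.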
To be sure, the entries in the Sample Size Re-estimation tab are rather extreme and unrealistic. However this example serves to illustrate the point that control of type-1 error cannot be guaranteed if the Wald statistic replaces the CHW statistic inappropriately.

On the other hand suppose we set the Use Wald Stat. if CP >= to 0.5. With these inputs, the CHW statistic will be used for the final analysis if, at the interim analysis, 0 < CP(334) < 0.5 and the conventional Wald statistic will be used if, at the interim analysis, 0.5 ≤ CP(334) < 0.9. Upon pressing the Simulate button, the following results are displayed in the table of Simulation Results by Zone. This time 249 of the 10,000 simulations rejected the null hypothesis, for an overall type-1 error of 0.0249. Thus the type-1 error is controlled.

55.2 Extension of CDL Method

55.2.1 Underlying Theory
55.2.2 Normal Endpoint
55.2.3 Binomial Endpoint
55.2.4 Survival Endpoint

We now describe an extension to the CDL method in which the 0.5 probability limit above which one is permitted to substitute the conventional Wald statistic for the CHW statistic can be lowered. The amount by which the CDL criterion can be lowered will depend on the other design parameters of the trial, and must be computed separately for each specific trial design. We have provided a table of cut-off values from which one may extrapolate for this purpose. The underlying theory is discussed next and provides some insight into why both the CDL method and the extended CDL method are able to protect the type-1 error.

55.2.1 Underlying Theory

The results in this section are only valid for one-sided tests, and only for a sample size increase, but not for a sample size decrease. For simplicity we confine the discussion to tests of $H_0: \delta = 0$ against the one-sided alternative $\delta > 0$.
However these results apply equally to tests against the one-sided alternative $\delta < 0$.

The ability to relax the criterion for using the conventional Wald statistic in an adaptive trial is based on a result due to Gao, Ware and Mehta (2008). Using the notation introduced in Chapter 54, let $(n_1, n_2)$ be the pre-specified cumulative sample sizes for look 1 and look 2, respectively, and let $(b_1, b_2)$ be the corresponding one-sided level-$\alpha$ boundaries. Let

$$Z_1 = \hat{\delta}_1 \sqrt{I_1}$$

be the observed value of the Wald statistic at look 1, where $I_1$ is the Fisher information about $\delta$ based on the $n_1$ observations available at the time of the interim analysis. After observing $Z_1 = z_1$ suppose that the cumulative sample size for the final analysis is increased from $n_2$ to $n_2^*$. Using the notation developed in Chapter 54, we define the incremental Wald statistic

$$Z^{*(2)} = \hat{\delta}^{*(2)} \sqrt{I^{*(2)}},$$

where $I^{*(2)}$ is the Fisher information about $\delta$ based only on the additional $n_2^* - n_1$ observations obtained after the interim analysis. The CHW statistic (54.2) can be expressed as

$$Z^*_{2,\mathrm{chw}} = \sqrt{\frac{n_1}{n_2}}\, Z_1 + \sqrt{1 - \frac{n_1}{n_2}}\, Z^{*(2)}$$

while the conventional Wald statistic (54.4) can be expressed as

$$Z^*_{2,\mathrm{wald}} = \sqrt{\frac{n_1}{n_2^*}}\, Z_1 + \sqrt{1 - \frac{n_1}{n_2^*}}\, Z^{*(2)}.$$

Since the CHW statistic preserves the type-1 error it is clear that

$$P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{chw}} \ge b_2) = \alpha. \tag{55.1}$$

However, due to the data dependent sample size change at look 1,

$$P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{wald}} \ge b_2) \ne \alpha.$$

Therefore using the conventional Wald statistic for the final analysis will not protect the type-1 error.
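The two statistics differ only in the weights given to the two stages: the CHW weights are fixed by the pre-specified sample sizes, while the Wald weights follow the sample size actually used. A minimal Python sketch of the two expressions above follows. The function names and the numerical inputs are hypothetical and are chosen only for illustration.

```python
import math

def chw_statistic(z1: float, z_incr: float, n1: float, n2: float) -> float:
    """CHW statistic: weights fixed by the pre-specified sample sizes n1 and n2."""
    w = n1 / n2
    return math.sqrt(w) * z1 + math.sqrt(1.0 - w) * z_incr

def wald_statistic(z1: float, z_incr: float, n1: float, n2_star: float) -> float:
    """Conventional Wald statistic: weights follow the actual final sample size n2_star."""
    w = n1 / n2_star
    return math.sqrt(w) * z1 + math.sqrt(1.0 - w) * z_incr

# Hypothetical interim and incremental statistics, for illustration only.
z1, z_incr = 1.2, 2.0
n1, n2, n2_star = 208, 442, 884

print("CHW  after increase:", chw_statistic(z1, z_incr, n1, n2))
print("Wald after increase:", wald_statistic(z1, z_incr, n1, n2_star))
# With no sample size change (n2_star equal to n2) the two statistics coincide.
print("Wald, no increase  :", wald_statistic(z1, z_incr, n1, n2))
```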
Gao, Ware and Mehta (2008) have shown that if, upon observing $Z_1 = z_1$ and increasing the total sample size from $n_2$ to $n_2^*$, we change the final critical boundary from $b_2$ to

$$b_2(z_1, n_2^*) = (n_2^*)^{-0.5} \left[ \sqrt{\frac{n_2^* - n_1}{n_2 - n_1}} \left( b_2 \sqrt{n_2} - z_1 \sqrt{n_1} \right) + z_1 \sqrt{n_1} \right] \tag{55.2}$$

then

$$P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{wald}} \ge b_2(z_1, n_2^*)) = \alpha. \tag{55.3}$$

Thus we can use the conventional Wald statistic for the final analysis and also protect the type-1 error provided we replace the final critical boundary value $b_2$ by $b_2(z_1, n_2^*)$. The extended CDL method follows from this result.

The Extended CDL Method: Whenever $b_2(z_1, n_2^*) \le b_2$, we may reject the null hypothesis $H_0: \delta = 0$ in favor of the one-sided alternative that $\delta > 0$ if

$$(Z_1 \ge b_1) \quad \text{or} \quad (Z_1 < b_1,\; Z^*_{2,\mathrm{wald}} \ge b_2) \tag{55.4}$$

and the type-1 error will not exceed $\alpha$ notwithstanding the data dependent sample size increase from $n_2$ to $n_2^*$ at the interim analysis.

This result holds because $b_2(z_1, n_2^*) \le b_2$ implies that

$$\alpha = P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{wald}} \ge b_2(z_1, n_2^*)) \tag{55.5}$$

$$\phantom{\alpha} \ge P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{wald}} \ge b_2). \tag{55.6}$$

Recall that the regular CDL method satisfies (55.6) only if the conditional power at the interim look is at least 0.5. We shall see that the extended CDL method satisfies (55.6) over a wider range of conditional powers. To show this we must investigate the behavior of the adjusted boundary $b_2(z_1, n_2^*)$ as a function of $z_1$ and $n_2^*$. We first reduce the dimensionality of the investigation by making the increased sample size $n_2^*$ a function of $z_1$. This is achieved by imposing the requirement that the new sample size, $n_2^*$, should be such that the conditional power given $z_1$, evaluated at $\hat{\delta}_1$, reaches some pre-specified target value, subject however to an upper limit on the magnitude of the sample size increase.
To be specific, define

$$\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2^*) = P_{\hat{\delta}_1}(Z^*_{2,\mathrm{chw}} \ge b_2 \mid z_1).$$

Under the extended CDL method, we pre-specify a target value for $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2^*)$, say $1 - \beta$, and attempt to reach it by altering the sample size from $n_2$ to $n_2^*$. The first step is to find the sample size $n_2'(z_1)$ for each possible value of $z_1$ such that

$$\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2'(z_1)) = 1 - \beta. \tag{55.7}$$

A simplification of Gao, Ware and Mehta (2008, equation (5)) shows that (55.7) is satisfied by the function

$$n_2'(z_1) = \left[ \frac{n_1}{z_1^2} \right] \left[ \frac{b_2 \sqrt{n_2} - z_1 \sqrt{n_1}}{\sqrt{n_2 - n_1}} + z_\beta \right]^2 + n_1. \tag{55.8}$$

There are, however, restrictions on the range of sample size alterations that are allowable at the interim analysis. At the lower end, the CDL and extended CDL methods do not permit the sample size to be decreased below the original sample size $n_2$. At the upper end there is usually a limit to the magnitude of the sample size increase that the sponsor will permit. Denote this upper limit by $N^*_{\max}$. Then the new sample size at the time of the interim analysis is computed by the formula

$$n_2^* = \max\{ n_2, \min( n_2'(z_1), N^*_{\max} ) \}. \tag{55.9}$$

Note that $n_2^*(z_1)$ is a random variable at the start of the trial, its value being determined by the statistic $z_1$ obtained at the interim analysis. By substituting (55.9) into (55.2) we can express the adjusted critical value $b_2(z_1, n_2^*)$ for the final analysis as a function of $z_1$ alone, and will hereafter denote it as $b_2(z_1, n_2^*(z_1))$ to show the explicit dependence of $n_2^*$ on $z_1$. Thus, we may use the criterion (55.4) for rejecting $H_0$ without inflating the type-1 error for the entire range of $z_1$ values that satisfy $b_2(z_1, n_2^*(z_1)) \le b_2$, thereby utilizing the conventional group sequential hypothesis test at the final analysis despite a data dependent sample size increase at the interim analysis.
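Equations (55.8) and (55.9) are straightforward to evaluate numerically. The sketch below is an illustration under the assumptions of this section and is not East's implementation. The function names are hypothetical, $z_\beta$ is taken to be the upper $\beta$ quantile of the standard normal distribution, and the inputs are the Schizophrenia trial values of Section 55.1.1 ($n_1 = 208$, $n_2 = 442$, $b_2 = 1.96$) with a cap of 884 and a target conditional power of 0.8. It also checks that the sample size from (55.8) does deliver the targeted conditional power.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def n2_prime(z1, n1, n2, b2, beta):
    """Sample size from (55.8) giving conditional power 1 - beta at the estimated effect."""
    z_beta = STD_NORMAL.inv_cdf(1.0 - beta)
    core = (b2 * math.sqrt(n2) - z1 * math.sqrt(n1)) / math.sqrt(n2 - n1) + z_beta
    return (n1 / z1**2) * core**2 + n1

def n2_star(z1, n1, n2, b2, beta, n_max):
    """Re-estimated sample size from (55.9): never below n2, never above n_max."""
    return max(n2, min(n2_prime(z1, n1, n2, b2, beta), n_max))

def conditional_power(z1, n1, n2, b2, n_final):
    """Conditional power of the CHW test at the estimated effect, for final size n_final."""
    threshold = (b2 * math.sqrt(n2) - z1 * math.sqrt(n1)) / math.sqrt(n2 - n1)
    drift = z1 * math.sqrt(n_final - n1) / math.sqrt(n1)
    return 1.0 - STD_NORMAL.cdf(threshold - drift)

n1, n2, b2, beta, n_max = 208, 442, 1.96, 0.2, 884
for z1 in (1.2, 1.5, 2.0):
    n_new = n2_star(z1, n1, n2, b2, beta, n_max)
    cp = conditional_power(z1, n1, n2, b2, n_new)
    print(f"z1 = {z1:.1f}: n2* = {n_new:.1f}, conditional power at n2* = {cp:.3f}")
```

In this sketch the smallest of the three interim statistics hits the upper cap, the middle one receives an intermediate increase that attains the target, and the largest already exceeds the target so the sample size stays at $n_2$.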
To obtain this range, it is convenient to plot $b_2(z_1, n_2^*(z_1))$ and $b_2$ against $z_1$. Figure 55.1 displays such a plot for the two-look Schizophrenia trial that was discussed in Section 55.1.1. For this trial we have $n_1 = 208$, $n_2 = 442$, $N^*_{\max} = 884$, $b_1 = 5.25$, $b_2 = 1.96$, $\beta = 0.2$ and the sample size will be increased at look 1 from $n_2$ to $n_2^*(z_1)$ based on equation (55.9).

Figure 55.1: Adjusted Critical Value $b_2(z_1, n_2^*(z_1))$ and Critical Value $b_2$ versus $z_1$

The curves of $b_2(z_1, n_2^*(z_1))$ and $b_2$ intersect at two places: at $z_{1,\min} = 1.1657$ and $z_{1,\max} = 1.7646$. Thus for all $1.1657 \le z_1 \le 1.7646$, we may use the conventional Wald test

$$Z^*_{2,\mathrm{wald}} \ge b_2 \tag{55.10}$$

at the final analysis without inflating the type-1 error. To be sure we might lose some power because we are using (55.10) as our rejection criterion instead of using the less restrictive rejection criterion

$$Z^*_{2,\mathrm{wald}} \ge b_2(z_1, n_2^*(z_1)) \tag{55.11}$$

which also protects the type-1 error since, by (55.1) and (55.3),

$$P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{wald}} \ge b_2(z_1, n_2^*)) = P_0(Z_1 \ge b_1) + P_0(Z_1 < b_1,\; Z^*_{2,\mathrm{chw}} \ge b_2) = \alpha. \tag{55.12}$$

However, that is the price we must pay for using the conventional Wald test with guaranteed preservation of the type-1 error, instead of using the CHW test. In the next section we will show that the power loss is in fact negligible.

It is convenient to re-scale the X-axis of Figure 55.1 in terms of conditional power. We can show that the conditional power given $z_1$, evaluated at the estimated value $\hat{\delta}_1$, under the assumption that the final sample size remains unchanged at $n_2$, is

$$\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2) = 1 - \Phi\left( \frac{b_2 \sqrt{n_2} - z_1 \sqrt{n_1}}{\sqrt{n_2 - n_1}} - \frac{z_1 \sqrt{n_2 - n_1}}{\sqrt{n_1}} \right). \tag{55.13}$$

Accordingly we use equation (55.13) to transform the X-axis from $z_1$ to $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$. Figure 55.2 is a plot of $b_2(z_1, n_2^*(z_1))$ and $b_2$ against $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$.
The curves intersect at two points which we denote as $\mathrm{CP}_{\min}$ and $\mathrm{CP}_{\max}$. For the current example, $\mathrm{CP}_{\min} = 0.36$ and $\mathrm{CP}_{\max} = 0.8$. Thus for all $0.36 \le \mathrm{CP}_{\hat{\delta}_1}(z_1, n_2) \le 0.8$, we may use the conventional Wald test (55.10) at the final analysis without inflating the type-1 error.

Figure 55.2: Plots of Critical Values $b_2(z_1, n_2^*(z_1))$ and $b_2$ versus $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$

The conventional Wald statistic may be used without inflating the type-1 error as long as $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2) \ge \mathrm{CP}_{\min}$, and the sample size is only permitted to increase (but never decrease) in accordance with (55.9).

The extended CDL simulation module in the EastAdapt software accepts $\mathrm{CP}_{\min}$ as an input. The hypothesis test at the time of the final analysis of each simulated trial utilizes the conventional Wald criterion $Z^*_{2,\mathrm{wald}} \ge b_2$ for rejecting $H_0$ if $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2) \ge \mathrm{CP}_{\min}$ and utilizes the CHW criterion $Z^*_{2,\mathrm{chw}} \ge b_2$ otherwise. Thus in all cases the type-1 error is preserved.

The following is a summary of the extended CDL method:
1. Pre-specify the conditional power $1 - \beta$ that will be targeted at the time of the interim analysis.
2. For a wide range of $z_1$ values, compute the new sample size $n_2^*(z_1)$ that would be needed to achieve the targeted conditional power, using equation (55.9).
3. Substitute $n_2^*(z_1)$ into equation (55.2) to obtain $b_2(z_1, n_2^*(z_1))$.
4. Transform each $z_1$ into a corresponding conditional power $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$ using equation (55.13).
5. Plot $b_2(z_1, n_2^*(z_1))$ and $b_2$ versus $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$ and determine the value $\mathrm{CP}_{\min}$ where $b_2(z_1, n_2^*(z_1))$ first intersects with $b_2$ as shown in Figure 55.2.
6. Under the extended CDL method we can use the conventional Wald criterion $Z^*_{2,\mathrm{wald}} \ge b_2$ to reject $H_0$ at the final analysis whenever $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2) \ge \mathrm{CP}_{\min}$.
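The six steps above can be sketched numerically rather than graphically. The following Python sketch is an illustration only and is not the procedure implemented in EastAdapt: it replaces the plot of step 5 with a bisection search, the bracketing interval is an assumption suited to this example, and all function names are hypothetical. The inputs are the Schizophrenia trial values quoted earlier in this section.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

# Schizophrenia trial inputs quoted in this section.
N1, N2, N_MAX = 208, 442, 884
B2, BETA = 1.96, 0.2

def threshold(z1):
    """Value the incremental statistic must reach for the CHW test to reject."""
    return (B2 * math.sqrt(N2) - z1 * math.sqrt(N1)) / math.sqrt(N2 - N1)

def n2_star(z1):
    """Steps 1-2: re-estimated sample size from (55.8) and (55.9)."""
    core = threshold(z1) + PHI.inv_cdf(1.0 - BETA)
    n2_prime = (N1 / z1**2) * core**2 + N1
    return max(N2, min(n2_prime, N_MAX))

def adjusted_boundary(z1):
    """Step 3: adjusted final critical value from (55.2)."""
    n_new = n2_star(z1)
    scale = math.sqrt((n_new - N1) / (N2 - N1))
    inner = scale * (B2 * math.sqrt(N2) - z1 * math.sqrt(N1)) + z1 * math.sqrt(N1)
    return inner / math.sqrt(n_new)

def cp_at_original_n(z1):
    """Step 4: conditional power at the unchanged sample size, equation (55.13)."""
    return 1.0 - PHI.cdf(threshold(z1) - z1 * math.sqrt(N2 - N1) / math.sqrt(N1))

def find_z1_min(lo=0.5, hi=1.5, tol=1e-8):
    """Step 5: bisection for the first crossing of the adjusted boundary with B2.
    Assumes the adjusted boundary is above B2 at lo and below B2 at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if adjusted_boundary(mid) > B2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z1_min = find_z1_min()
cp_min = cp_at_original_n(z1_min)
print(f"z1_min = {z1_min:.4f}, CP_min = {cp_min:.2f}")
```

For these inputs the sketch should land close to the values $z_{1,\min} = 1.1657$ and $\mathrm{CP}_{\min} = 0.36$ reported for this trial.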
For the convenience of the user we have pre-computed $\mathrm{CP}_{\min}$ cut-offs for some common two-stage, adaptive designs with no early stopping, and have displayed them in Table 55.1. All table entries are expressed as multiples of the initially proposed sample size $n_2$ and do not depend on the actual value of $n_2$ specified in the design. One may conveniently refer to this table for suitable cut-offs instead of calculating them through the six-step procedure outlined above.

Table 55.1: CPmin Cut-Off Values for Some Typical Two-Stage Adaptive Designs with no Early Stopping either for Efficacy or Futility

  Sample Size Ratios                        CPmin Values for Targeted Conditional Powers
  Maximum Allowed      At Interim Look
  (N*max/n2)           (n1/n2)              80%      90%      95%
  1.5                  0.25                 0.42     0.42     0.42
  1.5                  0.5                  0.41     0.41     0.41
  1.5                  0.75                 0.38     0.38     0.38
  2                    0.25                 0.37     0.37     0.37
  2                    0.5                  0.36     0.36     0.36
  2                    0.75                 0.33     0.33     0.33
  3                    0.25                 0.32     0.32     0.32
  3                    0.5                  0.31     0.31     0.30
  3                    0.75                 0.30     0.27     0.27
  ∞                    0.25                 0.32     0.28     0.26
  ∞                    0.5                  0.31     0.27     0.25
  ∞                    0.75                 0.30     0.25     0.23

For values of $(n_1/n_2)$ or $(N^*_{\max}/n_2)$ not included in the table, one may utilize the closest available cut-off value that guarantees conservative preservation of type-1 error. For example, for the Schizophrenia trial, $(N^*_{\max}/n_2) = 2$ and $(n_1/n_2) = 0.47$. Table 55.1 shows that the cut-off value $\mathrm{CP}_{\min} = 0.37$ will preserve the type-1 error conservatively for targeted conditional powers of 80%, 90% or 95%. Observe that $\mathrm{CP}_{\min} < 0.5$ for all the entries in Table 55.1, thus demonstrating that the extended CDL method is a relaxation of the original CDL method.

55.2.2 Normal Endpoint: Schizophrenia Trial

Consider again the Schizophrenia example introduced in Section 55.1.1. This is a two-look design with an initially specified sample size $n_2 = 442$ and one interim look after seeing data on $n_1 = 208$ completers.
The stopping boundaries at the interim and final look are one-sided level-0.025 efficacy boundaries derived from the γ(−24) spending function, which for all practical purposes implies that there will be no early stopping for efficacy. There is no futility boundary. This trial has slightly over 80% power to detect δ = 2, given a standard deviation of σ = 7.5. The East design screenshot is reproduced below.

Suppose that at the time of the interim analysis the sample size may be increased up to a maximum of $N^*_{\max} = 884$ in an attempt to attain a target conditional power of 95%. Assume that the sample size will only be increased (never decreased) if at the interim analysis the observed $z_1$ is such that $0.5 \le \mathrm{CP}_{\hat{\delta}_1}(z_1, 442) < 0.8$, identified as the promising zone for the interim results. We will simulate the trial under different assumptions about δ and σ, using the extended CDL criterion instead of the original CDL criterion. To do this we need to know $\mathrm{CP}_{\min}$, the value of $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$ at which the adjusted critical value $b_2(z_1, n_2^*(z_1))$ starts to dip below the critical value $b_2 = 1.96$. To obtain the exact cut-off value we would have to manually execute the six-step procedure outlined at the end of Section 55.2.1. An easier alternative is to use the cut-off values provided in Table 55.1 for the standard two-stage designs. The difference in the operating characteristics of the design produced by the two methods is negligible. Here $(N^*_{\max}/n_2) = 2$, $(n_1/n_2) = 0.47$ and the targeted conditional power is 0.95. Since there is no entry in Table 55.1 for this choice of parameters we use the more conservative choices, $(N^*_{\max}/n_2) = 2$ and $(n_1/n_2) = 0.25$, whereupon $\mathrm{CP}_{\min} = 0.37$ for a targeted conditional power of 95%.

Suppose we wish to obtain the power of the above adaptive design under δ = 1.6 and σ = 7.5.
We therefore save the design in the Library, insert Simulations for this design and enter the following parameters into the different tabs: the Response Generation Info tab, the Sample Size Re-estimation tab, and the Simulation Control Info tab.

We will now run 100,000 simulations at δ = 1.6 and σ = 7.5. Upon clicking the Simulate button, the simulations are activated. The results are shown below.

The null hypothesis was rejected a total of 68,619 times in 100,000 trials for an overall power of 68.6%. The average sample size was 557.0. The top part of the simulation output shows the zone by zone results and the results conditional on falling in the promising zone and thereby undergoing a sample size increase. This occurred 30,890 times out of 100,000 simulations. Moreover 27,805 of these trials rejected the null hypothesis for a power of 90.013%. The expected sample size of all trials that underwent a sample size increase was 814.413.

As before, it is of interest to verify that the type-1 error is preserved by the extended CDL method. Accordingly we set δ = 0 in the Mean Treatment µt. All other parameters are unchanged. The results from 100,000 simulations are shown below. It is seen that only 2356 of the 100,000 simulations were able to reject the null hypothesis, for a type-1 error of 0.02356.
Now in addition to setting δ = 0 in the Difference of Means, we set the Promising Zone: Min CP to zero as well, so as to provide the simulations with the largest possible opportunity to increase the sample size and thereby inflate the type-1 error. The results from 100,000 simulations are shown below. It is seen that only 2267 of the 100,000 simulations were able to reject the null hypothesis, for a type-1 error of 0.02267, thus preserving the type-1 error of 0.025.

55.2.3 Binomial Endpoint: Acute Coronary Syndromes Trial

Consider again a two-arm, placebo controlled randomized clinical trial for subjects with acute cardiovascular disease undergoing percutaneous coronary intervention (PCI), which we discussed in Section 54.4. The primary endpoint in this study is a composite of death, myocardial infarction or ischemia-driven revascularization during the first 48 hours after randomization. We assume on the basis of prior knowledge that the event rate for the placebo arm is 8.7%. The investigational drug is expected to reduce the event rate by at least 20%. The investigators are planning to randomize a total of 8000 subjects in equal proportions to the two arms of the study.

As explained in the beginning of this chapter, for applying the CDL method, a 2-look group sequential design will suffice, without loss of generality. It is easy to show that a group sequential design enrolling a total of 8000 subjects with an interim look after 4000 subjects are enrolled (50% of total information) will have 82% power to detect a 20% risk reduction with a one-sided level-0.025 test of significance, and an early stopping efficacy boundary derived from the Lan and DeMets (1983) O’Brien-Fleming type error spending function.
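The quoted power can be roughly checked with a fixed-sample normal approximation. The sketch below is an illustration only and is not the calculation performed by East: it ignores the interim look (so it should slightly overstate the power of the group sequential design), it uses an unpooled variance estimate, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(p_control: float, risk_reduction: float, n_total: int, alpha: float = 0.025) -> float:
    """Approximate one-sided power of a two-arm test of proportions with equal allocation."""
    p_treat = p_control * (1.0 - risk_reduction)
    n_arm = n_total / 2.0
    # Unpooled standard error of the difference in event rates (an assumption of this sketch).
    std_err = math.sqrt(p_control * (1.0 - p_control) / n_arm + p_treat * (1.0 - p_treat) / n_arm)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().cdf((p_control - p_treat) / std_err - z_alpha)

power_20 = approx_power(0.087, 0.20, 8000)
print(f"approximate power for a 20% risk reduction: {power_20:.3f}")
```

The approximation gives a value in the low 80s in percentage terms, in line with the 82% stated for the group sequential design.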
Suppose that at the time of the interim analysis the sample size may be increased up to a maximum of $N^*_{\max} = 16000$ in an attempt to attain a target conditional power of 95%. Assume that the sample size will only be increased (never decreased) if at the interim analysis the observed $z_1$ is such that $0.5 \le \mathrm{CP}_{\hat{\delta}_1}(z_1, 8000) < 0.9$, identified as the promising zone for the interim results. We will simulate the trial under different assumptions about δ and σ, using the extended CDL criterion instead of the original CDL criterion. To do this we need to know $\mathrm{CP}_{\min}$, the value of $\mathrm{CP}_{\hat{\delta}_1}(z_1, n_2)$ at which the adjusted critical value $b_2(z_1, n_2^*(z_1))$ starts to hover above the critical value $b_2 = -1.9686$. To obtain the exact cut-off value we would have to manually execute the six-step procedure outlined at the end of Section 55.2.1. An easier alternative is to use the cut-off values provided in Table 55.1 for the standard two-stage designs. The difference in the operating characteristics of the design produced by the two methods is negligible. Here $(N^*_{\max}/n_2) = 2$, $(n_1/n_2) = 0.50$ and the targeted conditional power is 0.95. There is an entry in Table 55.1 for this choice of parameters, $(N^*_{\max}/n_2) = 2$ and $(n_1/n_2) = 0.50$, whereupon $\mathrm{CP}_{\min} = 0.36$ for a targeted conditional power of 95%.

Suppose we wish to obtain the power of the above adapted design, for example, at risk reduction ρ = 0.15, and to increase the sample size only if the conditional power at the interim analysis under the original sample size is between 0.36 and 0.9. In that case suppose that we wish to increase the sample size by just the right amount so that the conditional power is boosted to 0.95. Furthermore suppose that the re-estimated sample size is constrained to remain between 8000 and 16000 subjects.
To run the simulations with these specifications we would change the entries in the simulation tabs as shown below for the Response Generation Info tab, the Sample Size Re-estimation tab and the Simulation Control Info tab.

We run the simulations by pressing the Simulate button. The results are as shown below.

The null hypothesis was rejected 65,039 times in 100,000 trials for an overall power of 65%. The average sample size was 9877.5. In contrast, if there is no sample size increase, the power would be 57% and the average sample size would be 8000. The top part of the simulation output displayed below shows the zone by zone results as well as results conditional on falling in the promising zone and thereby undergoing a sample size increase. This occurred 31,855 times out of 100,000 simulations. Moreover 28,086 of these trials rejected the null hypothesis for a power of 88.2%. The expected sample size of all trials that underwent a sample size increase was 14,796.5.

As before, it is of interest to verify that the type-1 error is preserved by the extended CDL method. Accordingly we set the treatment proportion equal to the control proportion, 0.087, thereby making δ = 0. All other parameters are unchanged. The results from 100,000 simulations are shown below. It is seen that only 2320 of the 100,000 simulations were able to reject the null hypothesis and hence the simulated type-1 error is 0.0232.

On the other hand, suppose we perform the very same simulations but set the Use Wald Stat. if CP(8000)>= parameter to zero and Promising Zone: Min CP to zero as well.
There is now no protection against type-1 error inflation. As seen below, 2679 of 100,000 simulations with this change rejected the null hypothesis, giving us a type-1 error of 0.02679.

55.2.4 Survival Endpoint: Lung Cancer Trial

The statistical methodology described in Section 55.2.1 for normal and binomial endpoints applies also to survival endpoints with appropriate changes in notation as described in Chapter 54, Section 54.2. To see this, carry out CDL simulations of Des 1, the lung cancer example discussed earlier in Section 55.1.3 of this chapter. We will simulate this design using the CDL method. To do this, insert simulations for this design and add the Sample Size Re-estimation tab. In the Response Generation Info tab, set the hazard ratio to 1 as shown below, so as to simulate under the null hypothesis. Now go to the Sample Size Re-estimation tab and enter the following values:

We can verify by simulating 10,000 times with these input parameters that the type-1 error will not be preserved because the input parameter Use Wald Statistic if CP>= has been set to 0 instead of being at 0.5. Consequently the CDL condition, required for preserving the type-1 error if the conventional Wald statistic is being used with a data dependent increase in the number of events, is not satisfied. Hit the Simulate button to obtain the following output:

Row 6 of the table of Zone-wise Averages displays the results for all trials, combined across zones.
Thus Column 4 of Row 6 of this table displays the magnitude of the type-1 error, 0.029, which is seen to exceed 0.025 even after accounting for Monte Carlo error. Table 55.1 shows cut-off values of conditional power below 0.5 at which the use of the conventional Wald statistic will preserve the type-1 error. There is no entry in this table for a Sample Size Ratio (i.e., event multiplier) of 10. However, a 10-fold multiplier is for all practical purposes the same as an infinite multiplier. Table 55.1 shows that for a Sample Size Ratio equal to ∞ (infinite multiplier), the cut-off for a trial powered at 90% with an interim analysis at 50% of the information is 0.27. Let us therefore use a cut-off of 0.27 for the simulations instead of a zero cut-off as was done previously. Now run the 10,000 simulations once again. This time it is seen that the type-1 error is preserved. The simulated alpha is 0.0208.

Thus the extended CDL method permits a lower cut-off than 0.5 and may be used to design studies with a wider range of promising zones while permitting the use of a conventional Wald statistic for the final analysis without type-1 error inflation.

55.3 Efficiency Considerations

At the beginning of this chapter we cited some theoretical results by Tsiatis and Mehta (2003) and Jennison and Turnbull (2006) who demonstrated that the use of the CHW statistic instead of the conventional Wald statistic to perform hypothesis tests in a group sequential clinical trial with sample size changes can lead to loss of efficiency. These results, however, involved extremely large sample size increases (up to tenfold) and numerous interim looks at the accruing data.
It would thus be of interest to determine whether the CHW statistic also loses power relative to the conventional Wald statistic for the more common situation of a two-stage clinical trial with at most a doubling of the sample size if the interim results fall in a promising zone. The CHW and CDL simulation worksheets provide us with the tools to make the relevant comparisons. We will accordingly compare the operating characteristics of the two-stage schizophrenia trial when the CHW test, the CDL test and the conventional Wald test are utilized for the final analysis. The design specifications for this trial were provided at the beginning of Section 55.1.1 of this chapter. The trial has a planned enrollment of 442 subjects and an interim analysis after seeing data on 208 completers. The main purpose of the interim analysis is to decide whether to increase the sample size, not to stop early for efficacy. Consequently the conservative γ(−24) error spending function is utilized at the interim analysis. The sample size may be increased up to a maximum of 884 subjects so as to recover a target conditional power of 80%, provided the interim results fall in a promising zone. The promising zone is defined by 0.3 ≤ CP(442) < 0.8, where CP(442) is the conditional power at the interim look (based on the estimated value of δ/σ) assuming no change in the initially specified sample size of 442 subjects. We shall compare power and expected sample size of all three methods (CHW, CDL, conventional Wald) for δ = 0, 1, 1.6, 1.8, 2 assuming σ = 7.5.

The CHW method utilizes the CHW simulation worksheet. The following are the simulation parameters for simulating under δ = 0.

The Response Generation Info tab:

The Sample Size Re-estimation tab:

and the Simulation Control Info tab:
The results for 1,000,000 simulations are shown below. The null hypothesis was rejected 25542 times in 1,000,000 trials, comfortably within the range of Monte Carlo accuracy for a level 0.025 test. Simulation results for other values of δ are displayed in Table 55.2.

The CDL method utilizes the following simulation parameters for simulating under δ = 0. Notice that the CDL parameters input tab stipulates that the conventional Wald statistic will be used if CP ≥ 0.5. This is the CDL criterion (Chen, DeMets and Lan, 2004) for guaranteeing that the type-1 error will be preserved. The following are the results for 1,000,000 simulated trials under δ = 0. The null hypothesis was rejected 24929 times in 1,000,000 trials, comfortably within the range of Monte Carlo accuracy for a level 0.025 test. Simulation results for other values of δ are displayed in Table 55.2.

The conventional Wald method also utilizes the CDL simulation worksheet, but it disables the CDL criterion by setting the cell titled Use Wald Stat. if CP >= to zero as shown below. By setting this CDL parameter to zero we have ensured that the conventional Wald statistic will be used for the final analysis all the time. In principle this should inflate the type-1 error. However, because the sample size is only increased in the promising zone, it is possible that the type-1 error might not be inflated in the current setting. This turns out to be the case, as is shown below for 1,000,000 simulated trials under δ = 0.
The null hypothesis was rejected 25406 times in 1,000,000 trials, which, allowing for Monte Carlo accuracy, is consistent with a level 0.025 test. Simulation results for other values of δ are displayed in Table 55.2.

Having established that the CHW, CDL and conventional Wald tests have all preserved the type-1 error, it is now possible to have a meaningful comparison of their respective operating characteristics for other values of δ. These results are displayed in Table 55.2 for δ = 0, 1, 1.6, 1.8 and 2. As noted above, the results for δ = 0 were based on 1,000,000 simulated trials so as to leave no doubt that the type-1 error is preserved. The other results in Table 55.2 are all based on 100,000 simulated trials, which easily produces Monte Carlo accuracy to the nearest percentage point. The operating characteristics of the 442-subject fixed sample non-adaptive trial are also displayed so as to provide a benchmark for the comparisons. Table 55.2 shows that all three adaptive methods preserve the type-1 error and are practically indistinguishable with respect to power or expected sample size for non-zero values of δ. This interesting finding suggests that for practical applications of adaptive sample size re-estimation in two-stage designs there is no loss of efficiency due to the use of the CHW statistic, notwithstanding the theoretical results of Tsiatis and Mehta (2003) or Jennison and Turnbull (2006).

Table 55.2: Operating Characteristics of Fixed Sample and Adaptive (CHW, CDL and Conventional Wald) Designs

             Fixed Sample      Adaptive-CHW      Adaptive-CDL      Adaptive-Wald
Value of δ   Power    N        Power    E(N)     Power    E(N)     Power    E(N)
2.0          80.0%    442      84.2%    500      84.2%    503      84.1%    503
1.8          71.3%    442      76.5%    505      77%      509      76.6%    510
1.6          61.1%    442      67.0%    509      67%      514      67.0%    514
1.0          28.8%    442      33.0%    507      33.1%    510      33.2%    511
0.0          2.5%     442      2.5%     472      2.45%    473      2.5%     474
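The Monte Carlo accuracy referred to in this section can be quantified with the usual binomial standard error of a simulated proportion, sqrt(p(1 − p)/n). The sketch below is a generic calculation, not an East feature; the function names are made up for the example. It evaluates the standard error for the two simulation sizes used here: 1,000,000 trials at a true rejection rate of 0.025, and 100,000 trials at the worst case p = 0.5.

```python
import math


def mc_standard_error(p, n_sims):
    """Binomial standard error of a proportion estimated from n_sims simulated trials."""
    return math.sqrt(p * (1.0 - p) / n_sims)


def mc_interval(p, n_sims, z=1.96):
    """Approximate 95% range for the simulated proportion when the true value is p."""
    half_width = z * mc_standard_error(p, n_sims)
    return p - half_width, p + half_width


# Type-1 error runs: 1,000,000 trials, nominal level 0.025.
se_alpha = mc_standard_error(0.025, 1_000_000)

# Power runs: 100,000 trials; p = 0.5 gives the largest possible standard error.
se_power = mc_standard_error(0.5, 100_000)

if __name__ == "__main__":
    print("SE of simulated type-1 error:", se_alpha)
    print("95% range around 0.025:", mc_interval(0.025, 1_000_000))
    print("worst-case SE of simulated power:", se_power)
```

With 100,000 trials the worst-case standard error is well under half a percentage point, which is the basis for reporting power to the nearest percentage point.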
Further investigation of the conjecture that the CHW statistic entails no practical loss of efficiency in two-stage designs would be desirable.

56 Muller and Schafer Method

This chapter discusses the Müller and Schäfer (2001) method for adaptive design. This is the most general of the three methods provided by EastAdapt and permits many different types of data dependent changes to a study design in addition to sample size changes. These include data-dependent changes in the error spending function, changes in the number and spacing of the interim looks, and population enrichment via the selection of prospectively identified subgroups. The actual decision rule for making an adaptive change at an interim look can be selected after examining the data available at that look. Indeed the adaptation may be made on the basis of either internal data from the trial, externally available data at the time of the interim look, or a combination of the two. Furthermore, these adaptive changes can be made more than once in any group sequential design. The method is based on preserving the conditional type-1 error in effect at the time of the adaptive change. One can show that if the type-1 error is preserved conditionally for all possible interim results, then it is also preserved unconditionally.

P-values, point estimates and confidence intervals adjusted for the adaptive change are produced by extending the work of Müller and Schäfer (2001). We have developed two methods for this extension. Method 1 generalizes the repeated confidence intervals of Jennison and Turnbull (2000, Chapter 9) and was developed by Mehta, Bauer, Posch and Brannath (2007). We refer to it as the RCI method. It is more general than the RCI method discussed in Chapter 54 in that it is valid with any type of adaptive design change whereas the latter is only valid for sample size changes. When only sample size changes are involved, the two RCI methods are the same.
Both RCI methods produce confidence intervals with conservative coverage of the unknown δ. Method 2 is the BWCI (Backward Image Confidence Interval) method, developed by Gao, Liu and Mehta (2013). It computes a two-sided confidence interval having exact coverage, along with a point estimate that is median unbiased for the primary efficacy parameter in a two-arm adaptive group sequential design. The possible adaptations are not confined to sample size alterations but also include data-dependent changes in the number and spacing of interim looks and changes in the error spending function. The procedure is based on mapping the final test statistic obtained in the modified trial into a corresponding backward image in the original trial. This is an advance on previously available methods, which either produced conservative coverage and no point estimates or provided exact coverage for one-sided intervals only.

In Section 56.1 we provide a quick review of the theory underlying the Müller and Schäfer method for preserving the type-1 error and its extension for parameter estimation by the RCI and BWCI methods. For more details, refer to Müller and Schäfer (2001), Mehta, Bauer, Posch and Brannath (2007), and Gao, Liu and Mehta (2013). In Section 56.2, we illustrate the methods through a worked example using the EastAdapt software.

56.1 Statistical Method

The original method published by Müller and Schäfer (2001) only provided a solution for the problem of preserving the type-1 error in an adaptive hypothesis test. Subsequently the method was extended by Gao, Liu and Mehta (2013) to cover the related inference problem of computing the point estimate, confidence intervals and p-value. Accordingly in Section 56.1.1 we will discuss hypothesis testing based on the original Müller and Schäfer (2001) method.
In Section 56.1.2 we will generalize the approach so as to cover parameter estimation and p-value computation based on the method of Gao, Liu and Mehta (2013).

56.1.1 Hypothesis Testing

To understand how the Müller and Schäfer (2001) method works let us consider a one-sided, level-α test of the null hypothesis H0 : δ = 0 versus the one-sided alternative hypothesis H1 : δ > 0 for a two-arm randomized clinical trial. We assume that this is a group sequential trial, designed for K looks at the information fractions t1, t2, ..., tK. Let αj, j = 1, 2, ..., K, denote the amount of type-1 error to be spent at the jth look. Let the corresponding stopping boundaries be denoted by {bj : j = 1, 2, ..., K}. Now suppose that at some interim look L the investigators, having already seen the results for the first L looks, wish to alter one or more design parameters for the future course of the study. Such data-dependent alterations might include a change in the maximum sample size, a change in the rate of error spending for the remainder of the trial, a change in the number and spacing of the future interim looks, and even a refinement of the eligibility criteria for enrolling additional patients into the trial. Müller and Schäfer have shown that all such changes are permissible provided the remainder of the trial preserves the conditional rejection probability (CRP), or conditional probability of rejecting H0, that is in effect at look L. This needs further explanation. Let Zj be the Wald statistic at any look j and suppose that zL is its observed value at look L. Then the CRP, denoted by ε0, is the conditional probability given zL that, under the null hypothesis H0, Zj will cross the stopping boundary at some future look. Specifically,

$$\epsilon_0 = P_0\{Z_{L+1} \ge b_{L+1} \mid z_L\} + P_0\{Z_{L+1} < b_{L+1} \text{ and } Z_{L+2} \ge b_{L+2} \mid z_L\} + \cdots + P_0\Big\{\bigcap_{j=L+1}^{K-1} (Z_j < b_j) \text{ and } Z_K \ge b_K \,\Big|\, z_L\Big\} \tag{56.1}$$

This CRP is calculated by applying the recursive integration algorithm of Armitage, McPherson and Rowe (1969). Müller and Schäfer (2001) have shown that, no matter what data dependent changes one makes at look L, the overall unconditional type-1 error of the entire trial with respect to all possible trial modifications at look L will be preserved provided the CRP for the modified trial beyond look L, under H0, remains fixed at ε0. Moreover, as the trial proceeds, the same process can be repeated with further trial modifications that also preserve the CRP of the remainder of the trial.

For practical implementation in East one would conduct the adaptive trial as though it consisted of two trials, one primary and the other secondary. The initial design, prior to any adaptation, is known as the primary trial. Suppose that at some look L of the primary trial the decision is taken to make an adaptive change in the design. At that point one would invoke East's conditional power calculator from the interim monitoring worksheet to obtain ε0 as defined in (56.1). One would then use East to design a one-sided secondary trial with ε0 as the significance level. This secondary trial would incorporate all the desired adaptive changes such as a sample size change, spending function change, etc. The secondary trial would then be monitored as though it were a completely separate trial with no relationship to the primary trial except for carrying over the significance level ε0. Acceptance or rejection of the null hypothesis in the secondary trial would imply acceptance or rejection of the null hypothesis overall. We shall illustrate this approach with the help of a detailed example in Section 56.2.

56.1.2 Parameter Estimation

The material in this section summarizes the paper by Mehta, Bauer, Posch and Brannath (2007).
It is fairly technical and may be skipped if you simply wish to design, monitor and simulate an adaptive trial by the Müller and Schäfer method. In that case you may proceed directly to Section 56.2. A careful study of this section will, however, provide you with a deeper appreciation of the difficulties of parameter estimation.

We will only consider parameter estimation for adaptive designs with one-sided hypothesis testing and no futility boundaries, since this is the only setting in which East currently provides point and interval estimates by the extended Müller and Schäfer method. (For the two-sided case one may use the repeated confidence intervals and the repeated p-values discussed in Chapter 54, Section 54.1.2.) Accordingly we consider a level-α test of

$$H_0 : \delta = 0 \tag{56.3}$$

versus the one-sided alternative hypothesis that δ > 0. We shall be interested in estimating $\underline{\delta}$, the lower confidence bound of the 100 × (1 − α)% confidence set $C_\alpha = (\underline{\delta}, \infty)$. We shall also be interested in estimating δ̃, a point estimate for δ, and p1, a one-sided p-value for the test of H0.

A general way to construct a 100 × (1 − α)% confidence set Cα, applicable to both non-adaptive and adaptive group sequential trials, is by performing a level-α test of the hypothesis

$$H_h : \delta = h \tag{56.4}$$

versus the one-sided alternative hypothesis that δ > h. The confidence set Cα will then consist of all values h having the property that the hypothesis (56.4) cannot be rejected by a level-α one-sided hypothesis test. The lower limit of the confidence set Cα is therefore the supremum of the set of all h for which (56.4) is rejected by a level-α one-sided hypothesis test. It remains only to find a way to perform such a test in the adaptive setting.
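The test-inversion idea can be illustrated in the simplest possible setting, a fixed-sample (non-adaptive, single-look) one-sided z-test, where the answer is known in closed form. The sketch below is not the adaptive procedure of this chapter; it only shows how the lower bound arises as the supremum of the rejected values of h. The function names are made up for the example, and the inputs (estimate 8, standard error 8.751) are used purely as illustrative numbers.

```python
from statistics import NormalDist


def rejects(h, estimate, std_err, alpha):
    """Level-alpha one-sided z-test of H_h: delta = h against delta > h (fixed sample)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return (estimate - h) / std_err >= z_crit


def lower_confidence_bound(estimate, std_err, alpha, width=20.0, tol=1e-9):
    """Supremum of all h for which H_h is rejected, located by bisection.

    Assumes alpha < 0.5, so that H_h is rejected far below the estimate
    and not rejected at the estimate itself.
    """
    lo = estimate - width * std_err  # H_h is rejected here
    hi = estimate                    # H_h is not rejected here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rejects(mid, estimate, std_err, alpha):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    bound = lower_confidence_bound(8.0, 8.751, 0.025)
    closed_form = 8.0 - NormalDist().inv_cdf(0.975) * 8.751
    print(bound, closed_form)
```

In this fixed-sample case the bisection simply recovers the familiar bound (estimate minus the normal quantile times the standard error). In the adaptive setting the same search applies, but the test of each H_h has to be carried out as described in the remainder of this section.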
Let us first review the Müller and Schäfer method for performing the one-sided test of the null hypothesis (56.3) that δ = 0 at level α in the adaptive setting. For j = 1, 2, ..., K, let Zj denote the Wald statistics and bj denote the efficacy boundaries of a K-look one-sided level-α group sequential test. At some interim look L, where ZL = zL, it is decided to alter the future course of the trial through an adaptive change. In order to preserve the type-1 error of the trial despite the adaptive change, the following steps must be followed:

1. Compute the conditional rejection probability

$$\epsilon = P_0\Big\{\bigcup_{j=L+1}^{K} (Z_j \ge b_j) \,\Big|\, z_L\Big\}. \tag{56.5}$$

2. Use the ε so obtained as the significance level of a $K^{(2)}$-look secondary trial with Wald statistics $Z_j^{(2)}$ and efficacy boundaries $b_j^{(2)}$, j = 1, 2, ..., $K^{(2)}$, in which all the adaptive changes have been incorporated. Thus

$$P_0\Big\{\bigcup_{j=1}^{K^{(2)}} (Z_j^{(2)} \ge b_j^{(2)})\Big\} = \epsilon,$$

where all quantities associated with the secondary trial are tagged with the superscript (2).

3. Monitor the secondary trial until it is terminated at some stage $L^{(2)} \le K^{(2)}$. Compute the stage-wise adjusted p-value (see for example, Jennison and Turnbull 2000, page 179)

$$p^{(2)} = P_0\Big\{\bigcup_{j=1}^{L^{(2)}-1} \{Z_j^{(2)} \ge b_j^{(2)}\} \cup \{Z_{L^{(2)}}^{(2)} \ge z_{L^{(2)}}^{(2)}\}\Big\}. \tag{56.6}$$

4. Reject H0 if $p^{(2)} \le \epsilon$. By the Müller and Schäfer principle this is a level-α test of H0.

Now consider how the procedure might be extended to produce a level-α test of Hh. Analogous to (56.5) and (56.6) we must compute the conditional rejection probability ε(h) and the secondary trial p-value $p^{(2)}(h)$ under the hypothesis that δ = h. The expression for $p^{(2)}(h)$ is a straightforward extension of (56.6) and is given by

$$p^{(2)}(h) = P_h\Big\{\bigcup_{j=1}^{L^{(2)}-1} \{Z_j^{(2)} \ge b_j^{(2)}\} \cup \{Z_{L^{(2)}}^{(2)} \ge z_{L^{(2)}}^{(2)}\}\Big\} \tag{56.7}$$

where Ph(.) denotes probability under Hh.
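Steps 3 and 4 above can be sketched numerically for the simplest case, a two-look secondary trial that stops at its second look. Under H0 the two Wald statistics are jointly normal with correlation equal to the square root of the information fraction at the first look, so (56.6) reduces to a one-dimensional integral. The code below is a minimal illustration using a plain trapezoid rule rather than East's recursive integration routine; the function names and all numerical inputs (boundary 2.2, final statistic 2.1, information fraction 0.5, CRP 0.04) are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def stagewise_p_value(b1, z_final, t1, grid=4000, lower=-8.0):
    """Stage-wise adjusted p-value for a two-look trial that stops at look 2.

    p = P0(Z1 >= b1) + P0(Z1 < b1 and Z2 >= z_final), where under H0
    Z2 = sqrt(t1) * Z1 + sqrt(1 - t1) * X with X standard normal and
    t1 the information fraction at the first look.
    """
    s1, s2 = math.sqrt(t1), math.sqrt(1.0 - t1)

    def integrand(x):
        # density of Z1 at x times P0(Z2 >= z_final | Z1 = x)
        return _STD.pdf(x) * (1.0 - _STD.cdf((z_final - s1 * x) / s2))

    # trapezoid rule over the continuation region Z1 in [lower, b1]
    step = (b1 - lower) / grid
    total = 0.5 * (integrand(lower) + integrand(b1))
    for i in range(1, grid):
        total += integrand(lower + i * step)
    return (1.0 - _STD.cdf(b1)) + total * step


def reject_h0(p2, crp):
    """Step 4: reject H0 when the secondary-trial p-value does not exceed the CRP."""
    return p2 <= crp


if __name__ == "__main__":
    p2 = stagewise_p_value(b1=2.2, z_final=2.1, t1=0.5)  # hypothetical inputs
    print(p2, reject_h0(p2, crp=0.04))                   # hypothetical CRP
```

A useful sanity check is that when the first-look boundary is made very large (no realistic chance of early stopping), the stage-wise p-value collapses to the ordinary one-sided p-value of the final statistic.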
RCI Method

We have shown in Mehta, Bauer, Posch and Brannath (2007) that

$$\epsilon(h) = P_h\Big\{\bigcup_{j=L+1}^{K} \big(Z_j - h\sqrt{I_j} \ge b_j\big) \,\Big|\, z_L\Big\} \tag{56.8}$$

where Ij is the Fisher information at look j.

BWCI Method

Please refer to Gao, Liu and Mehta (2013) for details of the BWCI method.

56.2 Implementation of Hypothesis Testing

We illustrate the Müller and Schäfer method in this section through a worked example that includes the design of the trial, its adaptive re-design, and verification of its operating characteristics by simulation. Parameter estimation and p-value computation are presented separately in Section 56.3 since these capabilities are only available for one-sided tests.

56.2.1 Designing the Primary Trial

We begin with a one-sided, level 0.025, three-look group sequential design, with the LD(OF) spending function, for testing the difference of means, δ, in a two-arm randomized clinical trial with a normally distributed primary endpoint. The study is designed to have 90% power to detect δ = 15 at σ = 50. To design this study using EastAdapt, select as shown in the screen below. Change the Number of Looks to 3. You will see a new tab, Boundary Info, added. Before we go to this tab, change Input Method to Difference of Means, Diff. in Means to 15 and Std.Deviation to 50. Keep the other default selections unchanged. The input dialog box will now look as shown below. Click on the Boundary Info tab. Keep all the default selections in this tab unchanged. The inputs on this tab will look as shown below.
Click Compute and the outputs for the design will be displayed in the Output Preview window in a newly added row. Now you can add the design output to the library workbook by clicking on the icon. This action saves the design Des1 as a node under the workbook Wbk1 in the library. You can click on the icon in the library to get the output summary as shown below. We see that the study will achieve the desired power at a maximum sample size of 473 subjects. However, the values of δ and σ on which these calculations rest were selected after considerable discussion and disagreement amongst the investigators. There was a scarcity of reliable data from previous studies about the treatment arm, the patient population and the primary endpoint. Thus the sample size of 473 was selected as a compromise, with the understanding that this important design parameter would be re-assessed at the first interim look, using data from the trial and possibly other external information that might become available at that time.

56.2.2 Monitoring the Primary Trial

To monitor this trial click on the 'Create Interim Monitoring' icon to invoke the interim monitoring worksheet. The parts of this sheet are shown, for visual clarity, in separate screen shots displayed below. The top portion of the Interim Monitoring sheet is where the inputs for the interim looks will be entered and is displayed here again. In the IM sheet you are ready to enter values for Look 1.
Click on the button to see the Test Statistic Calculator dialog box displayed as shown below. Some default values for Look 1 are already estimated and displayed in the calculator. You are free to change these values depending on your actual data. Suppose the first interim look is taken when data are available on n1 = 158 subjects. Further, suppose that the observed difference of means is δ̂1 = 8 and the observed standard deviation is σ̂1 = 55. Enter the value 8 for the estimate of δ and enter √(4 × σ̂1²/158) = √(4 × 55²/158) = 8.751 as the standard error of the estimate δ̂1 into the appropriate cells of this calculator. (Note that you can type either a numerical value or a formula into the cells of any dialog box that accepts numerical values.) Click the Recalc button to see the test statistic value in the calculator. Now click on OK to post these entries into the interim monitoring worksheet. The current information fraction is t1 = 158/473 = 0.334. East populates the first row of the interim monitoring worksheet with the observed test statistic z1 = δ̂1/se(δ̂1) = 0.914, the corresponding efficacy stopping boundary, 3.706, the repeated 97.5% confidence interval limits for δ, and the repeated p-value.

56.2.3 Making Adaptive Changes to Primary Trial

The observed value of the Wald statistic at the first look is z1 = 0.914 whereas the critical value for rejecting H0 is b1 = 3.706. Thus a traditional group sequential trial would continue on to the next interim monitoring time point. Here, however, we have built in the flexibility to re-assess the adequacy of the sample size specified at the start of the trial. How should this be done?
There are two aspects to this question: logistical and scientific. We have already mentioned the logistical difficulties in Section 53.5 of Chapter 53 and will not discuss them further here. The scientific question is, how should one decide on the new sample size? The observed treatment difference δ̂1 = 8 is considerably smaller than the value δ = 15 at which the trial was powered. For an estimate of the conditional power, that is, the probability that at any future look the test statistic value will cross the stopping boundary, click on the icon. You will see the following Conditional Power Calculator, displaying the conditional power as 0.311. You can also see the conditional power value for different assumed values of δ from the conditional power chart as shown below. We shall shortly discuss the important role that conditional power plays in making an adaptive modification to the trial.

We would like to increase the sample size and thereby boost the conditional power. The computation of conditional power, however, requires us to input a value for δ. Now, of course, the true value of δ cannot be known. While considerable weight should be given to the point estimate δ̂1 = 8 obtained at the interim analysis, it might be wise to retain the flexibility to use this estimate in conjunction with other data from the trial, and other externally available data. The Müller and Schäfer method gives you this flexibility. You can revise the sample size in any manner that seems appropriate at the time of the interim analysis, without having to pre-specify a particular decision rule for determining the new sample size. (You may also decide that no sample size increase is warranted.)
We will assume that the trial investigators have taken advantage of this flexibility to review the interim data, as well as all relevant external data, and have finally determined that the clinically meaningful value at which to power the study should be revised downwards to δ = 10. Based on the observed data, the investigators continue to assume that σ = 50. Suppose therefore that they wish to increase the sample size so as to increase the conditional power to 90% at δ = 10 and σ = 50, while simultaneously preserving the type-1 error at 0.025 despite the data dependent change. The Müller and Schäfer method achieves this goal through a re-designed secondary trial as shown next.

The trial has so far only proceeded to the first interim look with 158 subjects enrolled, and the current value of the test statistic is z1 = 0.914. Its current status can be depicted graphically in East by clicking on the 'yellow up-arrow' at the top of the thumbnail chart titled Stopping Boundaries in the interim monitoring worksheet, and checking the Show Design check box in the expanded chart that appears. This chart displays the status quo. It shows us the current position of the test statistic in relation to the current and future stopping boundaries. Our objective is to re-design the continuation of this trial with appropriate changes to the sample size and stopping boundaries, and possibly also to the spending function, the number of remaining looks and their spacing. In effect, we wish to capitalize on having taken an unblinded look at the data from the 158 subjects already enrolled to re-design the trial so that it has a better chance of success and utilizes the data yet to be collected more efficiently.
At the same time we do not wish to ignore the data already obtained when we perform the final analysis, and we do not want this trial to lose its pivotal status by failing to preserve the overall type-1 error. The Müller and Schäfer method makes this possible.

We stated in Section 56.1 that the unconditional type-1 error, over all possible design modifications, is preserved provided that each time a design modification is made, the remainder of the trial preserves the CRP, ε0. In our example with a one-sided test, the first step is to compute ε0 using equation (56.1). In the above nominal critical point chart, z1 = 0.914, the boundary at sample size n2 = 315 is b2 = 2.513 and the boundary at sample size n3 = 473 is b3 = 1.993. Therefore

$$\epsilon_0 = P_0\{Z_2 \ge 2.513 \mid z_1 = 0.914\} + P_0\{Z_2 < 2.513 \text{ and } Z_3 \ge 1.993 \mid z_1 = 0.914\}.$$

The Müller and Schäfer calculator provided by EastAdapt can evaluate this CRP. With the cursor in any cell of the interim monitoring worksheet of Plan1, click on the icon. The following Müller and Schäfer calculator dialog box appears, revealing that at the first interim look the sample size is 158 and the observed value of the test statistic is z = 0.914. The calculator permits the user to enter values for δ and σ, or for δ/σ, and computes conditional power assuming no change in the future course of the current group sequential design having a maximum sample size of 473. The default values of δ and σ when this calculator is first invoked are the values that were entered into the interim monitoring worksheet through the test statistic calculator for the current look. In this case, the values were δ = 8 and σ = 55, resulting in δ/σ = 0.145. The conditional power if the trial proceeds without any design modification is shown to be 0.311.
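The two probabilities in the CRP expression for this example can also be approximated outside East. The sketch below is not East's algorithm: it replaces the recursive integration routine with a plain trapezoid rule, and it assumes the usual Brownian-motion approximation for a two-arm trial with equal allocation, under which the score statistic S = Z√n has independent increments with mean (δ/σ)m/2 and variance m over m additional subjects. The function name and arguments are made up for the example. Setting δ/σ = 0 gives the conditional rejection probability; setting δ/σ = 8/55 gives the conditional power.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def conditional_power(z1, n1, looks, theta=0.0, grid=4000, lower=-8.0):
    """Probability of crossing an efficacy boundary at one of two remaining looks.

    looks is [(n2, b2), (n3, b3)]: cumulative sample sizes and boundaries
    on the Wald (z) scale. theta is the assumed delta / sigma.
    """
    (n2, b2), (n3, b3) = looks
    s1 = z1 * math.sqrt(n1)                      # score statistic at the interim look
    sd2, sd3 = math.sqrt(n2 - n1), math.sqrt(n3 - n2)
    m2 = 0.5 * theta * (n2 - n1)                 # drift of the increment up to look 2
    m3 = 0.5 * theta * (n3 - n2)                 # drift of the increment up to look 3
    c2 = (b2 * math.sqrt(n2) - s1 - m2) / sd2    # standardized threshold at look 2

    def cross_at_look3(x):
        # x is the standardized increment to look 2 (no crossing when x < c2)
        s2 = s1 + m2 + sd2 * x
        c3 = (b3 * math.sqrt(n3) - s2 - m3) / sd3
        return _STD.pdf(x) * (1.0 - _STD.cdf(c3))

    # trapezoid rule over the continuation region at look 2
    step = (c2 - lower) / grid
    total = 0.5 * (cross_at_look3(lower) + cross_at_look3(c2))
    for i in range(1, grid):
        total += cross_at_look3(lower + i * step)
    return (1.0 - _STD.cdf(c2)) + total * step


looks = [(315, 2.513), (473, 1.993)]
crp = conditional_power(0.914, 158, looks, theta=0.0)       # under H0
cp = conditional_power(0.914, 158, looks, theta=8.0 / 55.0)  # at the interim estimate

if __name__ == "__main__":
    print("conditional rejection probability:", round(crp, 4))
    print("conditional power at delta/sigma = 8/55:", round(cp, 4))
```

Under these assumptions the two results should land close to the values East reports for this example; small differences can arise from rounding of the boundaries and of z1.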
To obtain the conditional type-1 error (or conditional rejection probability) we enter the value 0 in the δ/σ edit box and press the Recalc button.

The conditional type-1 error is seen to be ε0 = 0.038. We can make any desired modifications to the remainder of the trial, such as changing the remaining sample size, or the number of future interim looks and their locations, provided we preserve the conditional rejection probability. Accordingly, it is decided that the trial should continue for two further looks but should utilize the LD(PK) (Pocock) spending function instead of the current LD(OF) (O’Brien-Fleming) spending function for the stopping boundary, so as to increase the chance of early termination. Additionally, the sample size should be increased appropriately to make the conditional power, given z1 = 0.914, equal to 90% at δ = 10 and σ = 50. In keeping with the Müller and Schäfer principle, the new stopping boundaries should be such that if in fact δ = 0, the probability of crossing the boundary is 0.038. That is, we must preserve the conditional type-1 error of the original unmodified trial in the modified trial.

56.2.4 Implementing Adaptive Changes through Secondary Trial

At first sight, it appears complicated to modify the boundaries of the on-going trial in which z1 = 0.914 so as to fulfill the conditional type-1 error requirement, ε0 = 0.038, with a Pocock LD(PK) spending function for the boundary. The solution, however, is rather simple and can be accomplished very naturally within EastAdapt.
The approach, proposed by Müller and Schäfer, is to step away from the actual trial (hereafter referred to as the primary trial) at its look L = 1, where an adaptive change has been requested, and to instead design an independent secondary two-look trial that has 90% power to detect δ = 10 at σ = 50, and utilizes the Pocock LD(PK) spending function to generate the boundary, with α = 0.038. Note that this error probability is the only statistic that we are required to carry forward from the primary trial into the design of the secondary trial. The further progress of the primary trial may then be conveniently monitored by entering the observed values of the test statistic, computed only from incremental data generated after the trial modification, into the interim monitoring worksheet of this secondary trial. In particular, the value z1 = 0.914 from the primary trial plays no role in the interim monitoring of the secondary trial, since this value was already factored into the computation of ε0. We illustrate below.

To design the secondary trial, click on the Des1 node in the library, and then click on the icon. In the ensuing input dialog box, enter Number of Looks as 2, Test Type as 1-sided, Type I Error (α) as 0.038, Power as 0.9, and the Mean and SD values under the alternative as 10 and 50 respectively, as shown below.

Click on the Boundary Info tab and change the efficacy boundary to PK, as shown below.

Now click on Compute. A new row will be added in the Output Preview window. Click on the row and save it in the library as the Des2 node. Now select the Des1 and Des2 nodes by holding the Ctrl key and then click the Output Summary icon. You will see the following screen shot displaying the results for Des1 and Des2 side by side.
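As a rough plausibility check on the secondary design, one can compute the sample size that a single-look (fixed-sample) trial would need for the same inputs. This is only a benchmark under stated assumptions: a one-sided two-sample z-test with known σ and equal allocation, using the textbook formula n = 4σ²(z_α + z_β)²/δ². It is not the group sequential calculation that East performs, and a two-look design with a Pocock-type boundary will need somewhat more than this benchmark. The function name is illustrative.

```python
import math
from statistics import NormalDist

def fixed_sample_size(alpha, power, delta, sigma):
    """Total sample size (both arms, 1:1 allocation) for a one-sided
    two-sample z-test with known sigma and no interim looks."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)   # critical value for one-sided level alpha
    z_beta = z.inv_cdf(power)          # quantile corresponding to the target power
    return 4.0 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2

# Inputs carried over from the text: alpha = CRP = 0.038, 90% power, delta = 10, sigma = 50
n_fixed = math.ceil(fixed_sample_size(alpha=0.038, power=0.9, delta=10, sigma=50))
print(n_fixed)
```

The benchmark comes out somewhat above 900 subjects, which gives a sense of the order of magnitude to expect before allowing for the interim look.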
Des2 requires a maximum sample size of 1040 subjects and calls for two equally spaced looks, with an LD(PK) stopping boundary, as displayed below.

The further progress of the modified trial is now monitored on the interim monitoring worksheet of Plan2. Click on the tool to invoke the interim monitoring worksheet.

Suppose the data are monitored after 480 new subjects enter the trial. This corresponds to a total enrollment of 158 + 480 = 638 subjects in the primary and secondary trials combined. The secondary trial, however, only monitors the incremental data obtained from the 480 new subjects. Let us assume that these 480 new subjects provide the estimates δ̂ = 10 and σ̂ = 52, so that $se(\hat\delta) = \sqrt{4 \times 52^2/480} = 4.747$. Enter these values into the Plan2 interim monitoring worksheet in the usual manner as shown below.

Click on OK to post these numbers into the interim monitoring worksheet. Now the observed value of the test statistic is 2.107 whereas the upper stopping boundary to reject H0 is 2.011. Therefore you will be notified by East that the stopping boundary has been crossed. Click on the Stop button to terminate the trial; the Final Inference details are displayed as shown below.

Since the test statistic has crossed the upper boundary, the null hypothesis δ = 0 is rejected. The requirement that the CRP be preserved no matter how the original trial is modified ensures that the unconditional type-1 error of the primary trial, taken over all possible trial modifications within any family of modifications under consideration, will always be preserved.
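The standard error and Wald statistic quoted for the first look of the secondary trial can be reproduced with a few lines of arithmetic. This sketch assumes the two-arm, equal-allocation standard error se(δ̂) = √(4σ̂²/n) used in the text; the helper name `wald_statistic` is illustrative, and the boundary value 2.011 is simply copied from the worksheet rather than recomputed.

```python
import math

def wald_statistic(delta_hat, sigma_hat, n_total):
    """Return (z, se) for a difference of means with n_total subjects split 1:1."""
    se = math.sqrt(4.0 * sigma_hat**2 / n_total)
    return delta_hat / se, se

z, se = wald_statistic(delta_hat=10.0, sigma_hat=52.0, n_total=480)
boundary = 2.011  # upper LD(PK) boundary reported by the worksheet at this look
print(f"se = {se:.3f}, z = {z:.3f}, crossed = {z >= boundary}")
# se = 4.747, z = 2.107, crossed = True
```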
We shall verify this fact through simulation in Section 56.2.6. It should be noted that the confidence interval, point estimate and p-value displayed on the interim monitoring worksheet of the secondary trial are not valid for the overall trial. Those inferences must be made using the adaptive extension of the Müller and Schäfer (2001) procedure as described in Section 56.1.2. The implementation in EastAdapt is shown in Section 56.3.

56.2.5 Reconstructing a Combined Trial from the Primary and Secondary Trials

The secondary trial was terminated after a single look, taken at a sample size of 480 subjects. The interim monitoring worksheet of the secondary trial showed that the stopping boundary at this look was 2.011. Although not strictly necessary, it is instructive to transform the secondary-trial boundaries appropriately and attach them to the primary trial, thereby recreating the combined trial in one piece. The path traced out by the test statistic in the secondary trial can likewise be appropriately transformed and attached to the test statistic generated in the primary trial before the trial was modified. The reconstruction is helpful for clarifying that it is the combined trial and not the secondary trial that is actually being monitored after an adaptive change in the design. The secondary trial is an artificial construct: a convenient way to obtain new stopping boundaries satisfying the specification of the conditional rejection probability in the combined trial. Let us illustrate by reconstructing the combined trial for our example.

In the discussion that follows, we shall distinguish between data from the primary and secondary trials by labeling the test statistics, stopping boundaries and sample sizes with superscripts.
For example, the sample size at look 1 in the primary trial is denoted by $n_1^{(1)}$ while the sample size of the secondary trial at look 1 is denoted by $n_1^{(2)}$. Now recall that the primary trial only proceeded up to the first interim look with a sample size $n_1^{(1)} = 158$ and corresponding stopping boundary 3.706. The estimate of δ and its standard error at look 1 were $\hat\delta_1^{(1)} = 8$ and $se(\hat\delta_1^{(1)}) = 8.751$, leading to the Wald statistic $z_1^{(1)} = 0.914$. At this point we implemented an adaptive change in the primary trial with the following requirements:
– Conditional rejection probability at δ = 0 should be 0.038
– Conditional power at δ = 10, σ = 50 should be 90%
– Two equally spaced additional looks with the LD(PK) spending function spending the type-1 error according to the CRP.

These requirements were incorporated into a secondary trial displayed below as Des2.

Although Des2 was designed for two equally spaced looks, at sample sizes 520 and 1040 respectively, the first look was actually taken at a sample size of $n_1^{(2)} = 480$. The information fraction at this look was $t_1^{(2)} = 480/1040 = 0.462$. By spending the appropriate amount of error at this information fraction the stopping boundary was obtained as 2.011. We observed $\hat\delta_1^{(2)} = 10$ and $se(\hat\delta_1^{(2)}) = 4.747$, resulting in the Wald statistic $z_1^{(2)} = 2.107$.

We now show how to represent the stopping boundaries and test statistic values of the primary and secondary trials through a single combined trial. Suppose that a K-look primary trial is monitored up to and including look L < K, at which point an adaptive change takes effect. Suppose that all new data obtained after the adaptive change are monitored through a $K^{(2)}$-look secondary trial which terminates at some look $K' \le K^{(2)}$.
It is possible to prove that monitoring the primary and secondary trials separately, as was done above, is equivalent to monitoring a single combined trial consisting of $L + K'$ looks. The stopping boundaries and test statistic values for the first L looks of this combined trial are identical to the corresponding values of the primary trial. The value of the test statistic at look $L + j$, $j = 1, 2, \ldots, K'$, of the combined trial is

$$z_{L+j}^{(c)} = \frac{z_L^{(1)}\sqrt{n_L^{(1)}} + z_j^{(2)}\sqrt{n_j^{(2)}}}{\sqrt{n_L^{(1)} + n_j^{(2)}}}. \qquad (56.9)$$

The value of the stopping boundary at look $L + j$ of the combined trial is

$$b_{L+j}^{(c)} = \frac{z_L^{(1)}\sqrt{n_L^{(1)}} + b_j^{(2)}\sqrt{n_j^{(2)}}}{\sqrt{n_L^{(1)} + n_j^{(2)}}}. \qquad (56.10)$$

For more general settings, such as binomial or survival data, we would replace sample size by Fisher information in each of the above formulae.

Applying these formulae to the example under consideration we have L = 1 and $K' = 1$, so that the combined trial consists of $L + K' = 2$ looks. The boundaries and test statistics for the first look of the combined trial are identical to the corresponding values of the primary trial. The upper stopping boundary of the second look of the combined trial is obtained from equation (56.10) to be

$$b_2^{(c)} = \frac{0.914 \times \sqrt{158} + 2.011 \times \sqrt{480}}{\sqrt{158 + 480}} = 2.199.$$

The value of the test statistic at the second look of the combined trial is obtained from equation (56.9) to be

$$z_2^{(c)} = \frac{0.914 \times \sqrt{158} + 2.107 \times \sqrt{480}}{\sqrt{158 + 480}} = 2.282.$$

Since $z_2^{(c)} > b_2^{(c)}$, the combined trial is terminated.

56.2.6 Verifying Operating Characteristics by Simulation

Simulation is a very valuable tool for making adaptive decisions that suit the needs of the study.
For example, one might want to place an upper bound on the magnitude of the sample size increase following an adaptive look at the interim results, or one might want to place a lower bound on the estimated value of δ such that no sample size increase would be permitted should the estimate fall below the lower bound. These and other similar restrictions will affect the power of the study as well as the expected sample size in ways that might not be analytically tractable. One can, however, easily estimate power and expected sample size for various adaptive designs through simulation.

There is a second important reason for including a simulation tool in EastAdapt. We have made a major claim that by preserving the CRP after any type of adaptation, we will automatically preserve the unconditional type-1 error, taken over all possible adaptations, as well. A convincing way to demonstrate that this claim is correct is through simulation.

To illustrate how to use the simulation tool in EastAdapt, let us consider once again Des1 that we created in Section 56.2.1. With the cursor on the Des1 node in the library, click on the icon. You will see the following simulation input/output dialog box with three tabs: Simulation Parameters, Response Generation Info, and Simulation Control Info. These are the same tabs you would have come across in the earlier chapters of the manual.

Now click on the button Include Options and choose the item Sample Size Re-estimation. This will add a fourth tab, bearing the same name, to the dialog box.

Select the radio button against Müller and Schäfer. You will get the following dialog box.

Click on the button ‘Yes’.
In the resulting dialog box, specify Adapt at Look # as 2 and Max. Sample Size if Adapt (multiplier, total #) as 2. You will see that the maximum sample size is computed and displayed as 946. Keep the other specifications at their default values.

Now click on the button Specify Stage II Design. In the resulting dialog box, specify the Stage II Design details as described below.

The above dialog box has the following sections.

Number of Looks
Specify the number of looks as 2.

Specification of α for Stage II
This section of the dialog box asks you to specify how EastAdapt is to obtain the type-1 error for creating each simulated design of the Stage II or secondary trial. There are two choices. The default item is Conditional Type-1 Error from Stage-1. If you select this option, EastAdapt computes the conditional rejection probability ε0 from each simulation of the primary trial at its look L, where an adaptive change has been requested. The secondary trial is then designed so as to spend α = ε0. If you choose the User Specified item, then you will have to specify how much α you would like to spend for the secondary trial. Ordinarily the default option should be selected, as it ensures that the overall type-1 error of the adaptive trial will be preserved.

Specification of δ for Stage II
This query of the dialog box asks you to specify how EastAdapt is to obtain the value of δ at which to power each simulation of the Stage II (secondary) trial. If you choose the Estimated from Stage I item, EastAdapt will use the value of δ estimated from the primary trial at its look L, where an adaptive change has been requested. If you choose the User Specified radio button, you will have to specify the value of δ at which to power the secondary trial. We stated in Section 56.2.3 that the secondary trial will be powered at δ = 10. Therefore we select the User Specified radio button.
Specification of σ for Stage II
This query of the dialog box asks you to specify how EastAdapt is to obtain the value of σ for each simulation of the Stage II (secondary) trial. If you choose the Estimated from Stage I radio button, EastAdapt will use the value of σ estimated from the primary trial at its look L, where an adaptive change has been requested. If you choose the User Specified radio button, you will have to specify the value of σ in a subsequent dialog box. We stated in Section 56.2.3 that the secondary trial will be powered at σ = 50. Therefore we select the User Specified radio button.

The dialog boxes will look as shown below.

Click OK and you will get the following dialog box where the summary details of the Stage I and Stage II designs are displayed side by side.

Now we are ready to carry out our simulation of the trial. Let us call this ‘Experiment 1’.

Experiment 1: Explaining the Basics of the Simulation Tool

Suppose we specify simulation parameters as described below.

Data for each simulation of the primary trial will be generated from a normal population with a difference of means δ = 12 and population standard deviation σ = 50.

In each simulation, the primary trial will proceed through L = 2 looks, each look being equally spaced with 473/3 = 158 subjects. After look 2, there may be an adaptive change, depending on the simulated data obtained at look 2.

At the end of the second look, when the sample size is 316, the conditional power will be computed. This computation will utilize the estimates δ̂ and σ̂ obtained from the simulated data up to look L.
The value of the conditional power estimate in relation to the re-design criteria Min.CP and Max.CP will determine whether or not the primary trial should undergo an adaptive change. If the conditional power obtained under the current design falls between 30% and 90%, then an adaptive change will be made to the primary trial. Alternatively, you can specify the Promising Zone range in terms of Test Statistic or Estimated δ/σ, by making the choice in the drop down box.

If an adaptive change is decided upon:
– As explained previously, the adaptive change to the primary trial will be implemented indirectly by invoking a secondary trial whose plan details are shown under Specify Stage II Design on this screen.
– The secondary design is one-sided and spends α = ε0. The value assumed by this conditional rejection probability depends on the value of $z_L^{(1)}$ obtained in the primary trial.
– There will be two equally spaced looks in the secondary trial, with α being spent according to the LD(PK) (Pocock) spending function.
– The sample size of the secondary trial will be computed so that this trial can achieve (1 − β) = 0.9 power under the alternative hypothesis δ1 = 10 with σ = 50.
– This indirect approach corresponds to modifying the primary trial in such a way that the conditional power for the remainder of the trial, given the observed value of $z_L^{(1)}$, is 90%.

We have stated that EastAdapt will compute the sample size required in order for the secondary trial to achieve (1 − β) = 0.9 power at a significance level of α = ε0. Denote this sample size by $N_{max}^{(2)}$, and denote the combined sample size, to be utilized by both the primary and secondary trials, by $N_{max}^{(c)}$. In the present example, since L = 2, we must have $N_{max}^{(c)} = 316 + N_{max}^{(2)}$. More generally

$$N_{max}^{(c)} = n_L^{(1)} + N_{max}^{(2)}.$$

In the present example, the field titled Max. Sample Size if Adapt has been set to 946 through a multiplier of 2 on the primary trial maximum sample size, which is 473. This means that:
– If $N_{max}^{(c)} < 473$, EastAdapt will extend the combined sample size to $N_{max}^{(c)} = 473$. In that case the sample size of the secondary trial will be correspondingly increased to $N_{max}^{(2)} = 473 − 316 = 157$.
– If $N_{max}^{(c)} > 946$, EastAdapt will truncate the combined sample size to $N_{max}^{(c)} = 946$. In that case the sample size of the secondary trial will be correspondingly truncated to $N_{max}^{(2)} = 946 − 316 = 630$.

There is also a Conditional Power calculator available in this dialog box, which you can access by clicking on the calculator button. This calculator is useful for understanding the simulation parameters and their impact on the simulation results. The calculator has two functions, one for the Stage I Design and the other for the Stage II Design. By default, Stage I Design will appear selected as shown below.

You may enter any input values for δ, σ, and z and obtain the computed conditional power for the Stage I Design. In this example, the default values for δ/σ and z of 0.197 and 1.746 are displayed, corresponding to a conditional power of 0.6, which is the mid-value of the range specified for the promising zone (0.3 to 0.9). You may change the input values to see their impact on the computed conditional power in Stage I.

Now select the radio button against Stage II Design as shown below.

Since the computed Stage I conditional power of 0.6 is in the Promising Zone, adaptation takes place. Further, the computations show that, in order to achieve 90% power in Stage II, the maximum sample size for the Stage II design is 571 and that for the integrated trial is 886. The implication is that we could choose a multiplier less than 2 in the specification for maximum sample size (886/473 = 1.87), provided the assumed Stage I results hold.
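The floor and cap that Max. Sample Size if Adapt places on the combined sample size can be written as a short rule. This is a sketch of the logic as described in the text, not EastAdapt code; the function and parameter names are illustrative, and the look-2 sample size of 316 is taken from the description of Experiment 1.

```python
def combined_sample_size(n_look, n2_required, n_planned_max, n_cap):
    """Apply the floor (original maximum) and cap (Max. Sample Size if Adapt)
    to the combined sample size; return (combined, secondary)."""
    combined = n_look + n2_required                 # N(c) = n_L + N(2)
    combined = max(n_planned_max, min(combined, n_cap))
    return combined, combined - n_look

# Adaptation at look 2 (316 subjects), original maximum 473, cap 946
small = combined_sample_size(316, 100, 473, 946)   # requirement below the floor
mid = combined_sample_size(316, 500, 473, 946)     # requirement inside the range
large = combined_sample_size(316, 800, 473, 946)   # requirement above the cap
print(small, mid, large)
```

The third call illustrates the truncation case, where the combined sample size is held at 946 and the secondary trial is left with 946 − 316 = 630 subjects.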
Now click on the button ‘Details’. You will see the two Boundary Plots as shown below.

In the Boundary Plot for the integrated design, the first and second look boundaries correspond to those of the Stage I design. The point plotted below the second look boundary value corresponds to the z value estimated at that look. You may reach this point by different routes from the first look z value. For illustration, five different routes are shown, all joining the second look z value.

Next, choose Promising Zone Plots in the drop down box under Details. You will see two plots: the Required Sample Size plot and the Conditional Power plot.

The Required Sample Size plot shows that the required sample size increases to a maximum of 946 at the start of the promising zone, that is at CP = 0.3, and gradually reduces towards the end of the promising zone, that is at CP = 0.9. Outside the promising zone, the required sample size remains steady at the Stage I maximum sample size value of 473.

The Conditional Power plot shows the relationship between the conditional power without SSR and the conditional power with SSR, under a reference value of δ/σ. The conditional power with SSR attains its maximum values in the promising zone.

There will be 10000 simulated trials, the screen will be refreshed after every 1000 simulations, and the starting seed for the simulations will be 100.

To run 10000 simulations of this adaptive design click on the Simulate button. After the 10000 simulations are done click Close. East will add the results in a new row in the Output Preview window. Click on this row and add it to the library node under Des1. If you double-click on this node you will see the simulation results displayed in several small tables.
You can collapse or expand each of these tables by clicking on the down arrow or right arrow buttons at the top left hand side of each table.

First let us look at the tables on the far left side of the screen.

The above tables show the parameters of the Stage I design, the parameters for sample size re-estimation, and the parameters for the Stage II design.

Now let us look at the tables on the right side.

Zone-wise Averages
The first table on the right side is displayed below.

The above results show the classification of simulations by the criterion used under the Sample Size Re-estimation parameters section in the simulation sheet. Out of the 10000 simulations carried out, none was stopped for futility, as there was no futility rule in the design. In 3540 simulations, H0 was rejected at the first or second look itself (Efficacy). In 2261 simulations the CP was less than 0.30 (unfavorable zone) and in 1085 simulations the CP was greater than 0.90 (favorable zone); in both these cases, no adaptation was needed. In the remaining 3114 simulations, where the CP was between 0.3 and 0.9 (promising zone), adaptation might be needed.

You may also notice what eventually happened to the simulations under each of the three zones. Of the 2261 simulations that fell in the unfavorable zone, where no adaptation was made to the sample size, H0 was eventually rejected in 563 simulations (24.9%). Of the 1085 simulations that were classified into the favorable zone, H0 was rejected in 987 simulations (91.0%).
By comparison, of the 3114 simulations that were in the promising zone and where the sample size was adapted, H0 was rejected in as many as 2944 (94.5%). This result illustrates the positive impact that sample size adaptation can bring about in a trial.

Simulation Results for Integrated Trial
This table for the integrated trial shows, look by look, information on the average sample size, the number of simulations in which the boundary for efficacy was crossed, and the total number of simulations.

The last row shows that the power attained in the integrated trial is 80.34%.

Zone-Wise Percentiles
The table below shows the distribution of sample sizes in each zone in terms of percentiles.

Simulation Results for Stage II Trial
The table shown below gives the simulation results for Stage II alone, look by look.

Simulation Boundaries for Stage I Design
The last table shown below gives the details of the simulation boundaries for the Stage I Design.

Experiment 2: No Sample Size Increase

In Experiment 1, we permitted a sample size increase up to a maximum of 946 subjects. This sample size increase enabled the clinical trial to recover power even though the simulations were performed with δ = 12 whereas the primary trial was actually designed to detect δ = 15 with 90% power. If we were to impose the restriction that there should be no sample size increase in these simulations, we would expect to lose power. To see this, re-run the simulations with the Simulation Parameters as shown below.

Notice that the sample size is not permitted to change from the initially specified value of 473 in these simulations.
Click on the Simulate button and observe the results shown below.

This time only 7338 of the 10000 simulations of the combined trial rejected H0, yielding 73.38% unconditional power. Of the 10000 simulated trials, 3181 required a sample size increase and therefore activated the secondary trial. However, since no sample size increase was forthcoming, only 2282 of these trials were able to reject H0, resulting in 71.74% conditional power.

Experiment 3: Preserving the Unconditional Type-1 Error

The statistical validity of the Müller and Schäfer adaptive procedure hinges on the claim that, despite making data dependent changes to the primary trial, the unconditional type-1 error is always preserved so long as the conditional rejection probability in effect at the time of the adaptive re-design is preserved. To verify this claim, edit Des1 by changing the type-1 error from 0.025 to 0.05, and by changing the spending function from LD(OF) (O’Brien-Fleming) to LD(PK) (Pocock), and save the edited design as Des3. These changes will exaggerate any possible inflation of type-1 error, and will thereby provide stronger empirical evidence for the validity of the Müller and Schäfer procedure. Des3 is displayed below as a group sequential design with three equally spaced looks and a maximum sample size of 442 subjects.

Suppose we decided to convert Des3 into an adaptive design in the following manner:
1. Proceed with the primary group sequential trial up to and including look 2.
2. Let $z_2^{(1)}$ denote the observed value of the Wald statistic $Z_2^{(1)}$ at look 2 of the primary trial, and let $\hat\delta_2^{(1)}$ be the estimate of δ at look 2 of the primary trial.
3. Compute the sample size $N_{max}^{(2)}$ for the secondary trial (i.e., for the remainder of the combined trial) so as to make the conditional power, given the observed value of $z_2^{(1)}$, equal to 90% under the assumption that $\delta = \hat\delta_2^{(1)}$.

Now let us simulate this adaptive design 10,000 times in two ways. First we will simulate the design without preserving the conditional rejection probability, ε0, obtained at the end of look 2 of the primary trial. We will, instead, run each simulation of the secondary trial at the 0.1 level.

Fill in the ensuing dialog boxes as shown below.

By selecting the User Specified radio button for the Specification of alpha, we have informed EastAdapt not to use the conditional rejection probability for the secondary trial. By selecting Estimated from Stage-I for δ and σ, we have asked EastAdapt to make data dependent sample size changes to the trial based on estimates of these parameters obtained from the primary trial. Click on the Simulate button to generate 10,000 simulations of this adaptive design. Save the results in the library node. By double-clicking on the node or by clicking on the ‘Details’ button, you will see the following results along with other results.

The Simulation Results for Integrated Trial panel shows that 626 of the 10,000 simulations rejected the null hypothesis. Thus the type-1 error rate was 0.0626, which is excessive, even allowing for Monte Carlo error. We conclude that in these simulations the type-1 error was inflated.

Next we will simulate the adaptive design while also preserving the conditional rejection probability, ε0, obtained at the end of look 2 of the primary trial.
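The judgment that a rejection rate of 0.0626 is excessive "even allowing for Monte Carlo error" can be made concrete with a binomial standard error. The sketch below assumes the 10,000 simulated trials are independent and compares the observed rate with the nominal level of 0.05 set for Des3; the helper name `mc_z_score` is illustrative.

```python
import math

def mc_z_score(rejections, n_sims, alpha):
    """Observed rejection rate, its standard error under the nominal level,
    and the number of standard errors by which the rate exceeds that level."""
    p_hat = rejections / n_sims
    se = math.sqrt(alpha * (1.0 - alpha) / n_sims)
    return p_hat, se, (p_hat - alpha) / se

p_hat, se, z = mc_z_score(rejections=626, n_sims=10_000, alpha=0.05)
print(f"rate = {p_hat:.4f}, se = {se:.4f}, excess = {z:.1f} standard errors")
```

With a standard error of roughly 0.002, an observed rate of 0.0626 lies more than five standard errors above 0.05, which is far outside what simulation noise alone would plausibly produce.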
Keeping the cursor on the last simulation node, click on the ‘Edit’ button and make the selection for alpha as shown below.

By selecting the Conditional Type-I Error from Stage-I radio button in the Specification of alpha from Stage-I, we are asking EastAdapt to use the conditional rejection probabilities obtained at the time of the adaptive change for re-designing the trial. Therefore we would expect the unconditional type-1 error to be preserved. To see this, click on OK and then on Simulate to run 10,000 simulations. Now you will see the results as shown below.

The Simulation Results for Integrated Trial table shows that only 524 of the 10,000 simulations rejected H0, for an overall unconditional type-1 error rate of 0.0524. This demonstrates that the type-1 error of 0.05 was preserved up to Monte Carlo accuracy.

56.3 Implementation of Parameter Estimation

In this section we show how the generalization of the Müller and Schäfer (2001) method to the problem of parameter estimation has been implemented in EastAdapt. Results are presented for both the RCI method developed by Mehta, Bauer, Posch and Brannath (2007) and the BWCI (Backward Image Confidence Interval) method developed by Gao, Liu and Mehta (2013).

We shall see that the BWCI method has some advantages over the RCI method. The BWCI method produces confidence intervals with exact coverage, whereas the RCI method produces conservative coverage. The procedure is based on mapping the final test statistic obtained in the modified trial into a corresponding backward image in the original trial. The BWCI method produces a median unbiased point estimate, whereas the RCI method provides point estimates that can be severely negatively biased.
56.3.1 Parkinson’s Disease Example

To illustrate how parameter estimation has been implemented, we consider a slight modification of an example discussed in Müller and Schäfer (2001). Müller and Schäfer consider a clinical trial comparing deep brain stimulation to conventional treatment for Parkinson’s disease. The main outcome variable was the quality of life as measured by the 39-item Parkinson’s Disease Questionnaire (the PDQ-39). Since no prior PDQ-39 data on deep brain stimulation were available, the study was planned based on the data from the pallidotomy trial of Martinez-Martin (2000). This led to the assumption of an improvement by δ = 6 points in PDQ-39 for the treatment arm relative to the control arm. The standard deviation, also subject to considerable uncertainty, was assumed to be 17.

We shall assume here that the trial was initially planned as a three-look group sequential design at the one-sided 0.05 level to test H0: δ = 0. A sample size of 282 subjects was selected with equally spaced interim monitoring after $n_1^{(1)} = 94$, $n_2^{(1)} = 188$, and $n_3^{(1)} = 282$ subjects, using the γ(−4) error spending function of Hwang, Shih, and DeCani (1990). Upon entering these parameters into East we obtain the following design (Plan1) with slightly over 90% power to detect δ = 6, and Wald stopping boundaries given by $b_1^{(1)} = 2.794$, $b_2^{(1)} = 2.289$, and $b_3^{(1)} = 1.680$.

To illustrate our estimation procedure we implement a hypothetical (but realistic) scenario in which the first interim analysis is followed by an adaptive change to the design. Suppose that at the first interim analysis, when 94 subjects have been evaluated, the estimate of δ is $\hat{\delta}^{(1)} = 4.5$ with estimated standard deviation σ̂ = 20. We invoke the interim monitoring worksheet by clicking the interim monitoring icon.
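The arithmetic behind this first interim look can be sketched in a few lines. This is an illustration rather than East's own computation: the function name `wald_statistic` is hypothetical, and the sketch assumes equal allocation to the two arms, so that the standard error of the difference of means is sqrt(4σ̂²/n) for a total sample size n.

```python
from math import sqrt


def wald_statistic(delta_hat, sigma_hat, n_total):
    """Standard error and Wald statistic for a difference of two means.

    Assumes n_total subjects split equally between the two arms, giving
    se = sqrt(4 * sigma^2 / n_total).
    """
    se = sqrt(4 * sigma_hat ** 2 / n_total)
    return se, delta_hat / se


# Interim values quoted in the text: delta_hat = 4.5, sigma_hat = 20, n = 94.
se1, z1 = wald_statistic(4.5, 20.0, 94)
crossed_first_boundary = z1 >= 2.794  # first-look boundary of the primary trial

print(f"se = {se1:.4f}, z = {z1:.4f}, boundary crossed: {crossed_first_boundary}")
```

The resulting statistic is well below the first-look boundary of 2.794, so the trial continues past look 1.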
Next click 'Enter Interim Data' to bring up the test statistic calculator. Keep the cumulative sample size as 94. Enter δ̂ = 4.5 and $se(\hat{\delta}) = \sqrt{4 \times 20^2/94}$ into the test statistic calculator. Next, hit the Recalc button. Now click OK. This completes the data entry for the first interim look.

At this point it is decided to increase the sample size since, if in truth δ = 4.5 and σ = 20, the conditional power is only about 60%, whereas we would prefer to proceed with at least 80% conditional power. The conditional rejection probability for the remainder of the trial is 0.1033. This can be seen by invoking the conditional power calculator icon CP, setting δ/σ = 0, and clicking the 'Recalc' button. You can then click 'Close' to close the calculator. We may construct any suitable secondary trial to take over from the primary trial at the present look, as long as the significance level of the secondary trial is 0.1033.

How should the secondary trial be designed? The real benefit of an adaptive trial lies in the fact that all aspects of the original design can be re-visited at an interim look. All the observed efficacy and safety data, rather than just the summary statistics δ̂ and σ̂, could be reviewed alongside any new external information that may also become available. Suitable design changes can then be made to the primary trial. In the present case we will assume that as a result of this type of review the investigators have determined that δ = 5 rather than δ = 6 would still constitute a clinically meaningful treatment benefit.
Suppose then that the sponsor decides to re-design the study under the now more accurate assumption that δ = 5 and σ = 20. To this end they decide to adopt a three-look secondary trial with the γ(−2) spending function and 80% power. The γ(−2) spending function was selected because, under the new alternative hypothesis δ = 5, it provides a reasonable chance of terminating for efficacy at the first or second interim looks. In keeping with the Müller and Schäfer principle the α for the secondary trial must be 0.1033. This secondary trial is constructed as shown below and displayed as Des2.

Enter α = 0.1033, 3 looks, 0.8 power, δ = 5 and σ = 20 into the first dialog box of the design wizard. Enter the γ(−2) spending function into the second dialog box of the design wizard. Click the Compute button to complete the design of the secondary trial. Save the design output into the library. Select the Des1 and Des2 nodes and click the output summary icon.

We see that the secondary trial requires a total sample size of 296 subjects, over three equally spaced looks with $n_1^{(2)} = 99$, $n_2^{(2)} = 197$ and $n_3^{(2)} = 296$. (Note that this is over and above the 94 subjects already enrolled prior to the adaptive change.) To monitor the secondary trial, with the cursor on the Des2 node, click the interim monitoring icon. Suppose the following data are observed at the first and second interim looks, leading to termination of the trial at the second look.
Look   SampSize   δ̂     σ̂      se(δ̂)    Z = δ̂/se(δ̂)
1      100        5.8   20.5   4.1      1.4146
2      200        6.1   19.5   2.7577   2.212

After these values are entered into the interim monitoring worksheet, it looks as shown below.

The stopping boundary at the second look was crossed and statistical significance has been achieved. The point and interval estimates of δ and the p-value displayed at the bottom right corner of the interim monitoring worksheet are, however, only valid for the secondary trial and not for the overall trial that combines the data from the first and second stages.

56.3.2 Evaluating the BWCI and RCI Methods by Simulation

In the previous section we designed and monitored a clinical trial comparing deep brain stimulation and conventional therapy for Parkinson’s disease. EastAdapt provides a simulation tool for evaluating the properties of the two methods of estimation, RCI and BWCI. This tool can be invoked for any one-sided design. We shall demonstrate its utility by applying it to the Parkinson’s disease example.

Return to the Des1 design that was created in the previous section for the Parkinson’s disease trial. With the cursor on the Des1 node, click the Simulate icon. In the resulting simulation input dialog box, click 'Include Options' and select 'Sample size re-estimation'. This will add an additional tab with the same name. In this tab select the Müller and Schäfer option. The tab will now look as shown below.

Set the max. sample size multiplier to 2 and the promising zone max. CP to 0.9. For the Estimation Method, select RCI. Accept the default value of 0.95 for the Confidence Coefficient.
Now click 'Specify Stage-II Design'. In the resulting dialog box, select the number of looks as 3 and specify the power as 0.8. Accept the other default choices. The dialog box will appear as shown below.

Now click the Boundary Info tab, and specify the spending function as Gamma with parameter −2. Click 'OK'. The simulation dialog box will now appear as shown below.

With the choices made, the design parameters of the secondary trial in each simulation will be estimated from the data generated from the primary trial at the time of the adaptation. The significance level (α) for this trial will be determined from the data of the primary trial, in keeping with the Müller and Schäfer principle. The sample size will be determined by the values of δ and σ that are estimated from the data of the primary trial. Click the 'Simulate' button. After 10,000 simulations are carried out, click 'Close' to get the results in the Output Preview and add them to a library node. With the cursor on this node, click the 'Details' icon. At the bottom of the resulting output you will see the results for the RCI estimation method as shown below.

You can choose the item 'Distribution of Confidence Bounds' from the plot icon menu to get the following histogram.

With the cursor on the MSSim1 node, click the Edit button. In the resulting dialog box, choose BWCI as the Estimation method and check the 'Compute MUE' box. Now click the 'Simulate' button. After 10,000 simulations are done, carry out the required steps to save the simulation in a library node.
With the cursor on this node, get the detailed output and see the BWCI Estimation results as shown below.

You can choose the item 'Distribution of Confidence Bounds' from the plot icon menu to get the following histogram.

The results obtained so far help to evaluate the properties of the BWCI and RCI methods with respect to coverage. Similarly we can carry out simulations to evaluate bias, by specifying a confidence coefficient of 0.5 for both methods in the simulation parameters. Under the BWCI method, the option 'Compute MUE' can also be chosen. These simulation results are summarized in Table 56.1.

Table 56.1: Comparison of BWCI and RCI methods for Parkinson’s Disease example with 3-look γ(−4) boundary for primary trial, adaptation at first look, and 3-look γ(−2) boundary for secondary trial.

          Actual Coverage of 95% CI    Median of δ_0.5
True δ    BWCI      RCI                BWCI      RCI
6         0.949     0.995              5.939     1.929
3         0.95      0.985              3.028     0.438
0         0.948     0.95               0.021     -3.336

The results in the above table show that the BWCI method attains coverage close to the nominal 95% while the RCI method is conservative for δ = 6 and δ = 3, and that the bias in estimation is markedly greater for the RCI method than for the BWCI method.

We know from the design surv-01 that a hazard ratio of 0.7 will yield 90% power. But what if the true hazard ratio was 0.77? The resultant deterioration in power can be evaluated by simulation. Accordingly we shall alter the Treatment cell, containing the hazard 0.0607, by replacing it with 0.77 × 0.0866 = 0.0667.
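The replacement value for the Treatment cell is simply the hazard ratio times a baseline hazard. The sketch below reproduces that arithmetic. It assumes that 0.0866 is the control-arm hazard (the text does not say so explicitly, but 0.7 × 0.0866 matches the original cell value), and the median calculation additionally assumes exponential survival; `treatment_hazard` and `median_survival` are hypothetical helper names.

```python
from math import log

CONTROL_HAZARD = 0.0866  # assumed control-arm hazard, inferred from the arithmetic in the text


def treatment_hazard(hazard_ratio, control_hazard=CONTROL_HAZARD):
    """Treatment-arm hazard implied by a hazard ratio under proportional hazards."""
    return hazard_ratio * control_hazard


def median_survival(hazard):
    """Median of an exponential survival distribution with the given hazard."""
    return log(2) / hazard


hazard_design = treatment_hazard(0.70)  # value originally in the Treatment cell
hazard_altered = treatment_hazard(0.77)  # value entered for the sensitivity simulation

print(f"HR 0.70 -> hazard {hazard_design:.4f}, median {median_survival(hazard_design):.2f}")
print(f"HR 0.77 -> hazard {hazard_altered:.4f}, median {median_survival(hazard_altered):.2f}")
```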
The “Sample Size Re-Estimation” Tab

The impact of an adaptive increase in the number of events and sample size on power and study duration can be evaluated by simulation. Click the Sample Size Re-estimation tab. This tab contains the input parameters for performing the adaptive simulations and sample size re-estimation in the on-going trial. Select the Muller and Schafer button in the dialog box. You will see the following message on the screen:

Click Yes. Now you see the dialog box shown below.

The Sample Size Re-estimation tab is the main location from which you will be using East to design adaptive time-to-event trials.

Input Parameters for Sample Size Re-estimation

This window consists of 10 input fields into which one may enter various design parameters. For a given set of design parameters, East will run a number of simulated trials as specified in the Simulation Control Info tab. On running the simulations, an entry for the simulation output is added to the Output Preview pane and the detailed output can be seen in the Output Summary of Simulations. The input quantities in the Sample Size Re-estimation tab are described below in detail.

1. Adaptation at: For a K-look group sequential design, one can decide the time at which the conditions for adaptation are to be checked and the actual adaptation is to be carried out. This can be done either at some intermediate look, or after accumulating data on a specified number of events, or after some specified information fraction. The value of this parameter depends upon the choice of the user. If it is Look no.
then this parameter can be any integer from 1 to K − 1. If the adaptation is to be carried out after observing a specified number of events then this parameter can be an integer in the range [4, No. of events at design stage], and so on. The default choice in East is to use the look number to decide the time of adaptation.

2. Max Number of Events if Adapt: This quantity is a multiplier with value ≥ 1 for specifying the upper limit (or cap) on the increase in the number of events, should an adaptive increase be called for based on the target conditional power. Notice that, in keeping with the FDA Guidance on Adaptive Clinical Trials (2010), East does not permit an adaptive decrease in the number of events. Therefore multipliers less than 1 are not accepted in this cell. For example, if you use the multiplier 1.5 and if adaptation takes place, the modified number of events is capped at 501. The 501-event cap becomes effective only if the increased number of events (as calculated by the criteria of cells 4, 5 and 6) exceeds 501.

3. Max Subjects if Adapt: This quantity is a multiplier with value ≥ 1 for specifying the upper limit (or cap) on the number of subjects to be enrolled in the study. Although the power of the trial is determined by the number of events and not the number of subjects, the number of subjects plays a role in determining how long it will take to observe the required number of events, and hence in determining the study duration. The number of subjects may only be increased, never decreased. Therefore multipliers less than 1 are not accepted in this cell. For example, if you use the multiplier 1.5 and if adaptation takes place, the modified number of subjects is capped at 724 subjects. The trial will continue to enroll subjects until either the required number of events is reached or the cap on the number of subjects is reached.

4.
Upper Limit on Study Duration: An event driven trial ordinarily continues until the required number of events arrive. This input parameter is provided merely as a safety factor in order to prevent the trial from being prolonged excessively should the required number of events be very large or their rate of arrival be very slow. Its default value is set at three times the expected study duration obtained from the initial design of the trial. Consequently, if the scenarios being simulated are realistic, the required number of events will almost always be attained well before this upper limit parameter becomes operational. It is recommended to leave this parameter unchanged at least for the initial set of simulation experiments since it would interfere with the operating characteristics of the study if it were to become operational.

5. Target Conditional Power for Re-estimating Events: This parameter ranges between 0 and 1 and is the target conditional power desired at the end of the study. Suppose, for example, that the Target CP is set at 0.9. Let the value of the test statistic obtained in the current simulation be $z_L$ at look L, where an adaptive increase in the number of events is being considered. Then, by setting the left hand side of equation (54.21) to 0.9 we have:

$$0.9 = 1 - \Phi\left\{ b_K \sqrt{1 + \frac{D_L}{D_K - D_L}} - z_L \sqrt{\frac{D_L}{D_K - D_L}} - \delta \sqrt{r(1-r)} \sqrt{D_K^* - D_L} \right\}. \qquad (56.11)$$

Upon solving equation (56.11) for $D_K^*$ we obtain the increased number of events that are needed to achieve the target conditional power of 0.9 in this simulation. Let us illustrate with Des 1. In Des 1, K = 2, L = 1, r = 0.5 and the critical value for declaring statistical significance at the end of the trial is $b_2 = -1.9687$, as can be seen by examining the stopping boundaries displayed in the Simulation Parameters tab.
The interim analysis is performed when $D_1 = 167$ events are obtained. In the absence of any adaptive change, the trial will terminate when $D_2 = 334$ events are obtained. Suppose the current simulation generates a value $z_1 = 1.5$ for the logrank statistic at look 1. Since the target conditional power is 0.9, equation (56.11) takes the form

$$0.9 = 1 - \Phi\left\{ -1.9687 \sqrt{1 + \frac{167}{334 - 167}} - 1.5 \sqrt{\frac{167}{334 - 167}} - 0.5\,\delta \sqrt{D_2^* - 167} \right\}. \qquad (56.12)$$

In order to evaluate $D_2^*$, however, it is necessary to specify a value for the log hazard ratio δ in equation (56.12). This parameter is of course unknown. East gives you the option to perform simulations with either the current estimate $\hat{\delta}_1$ or the value of δ specified under the alternative hypothesis at the design stage. The choice can be made by selecting Estimated HR or Design HR from the drop-down list of the quantity CP Computation Based on in the Sample Size Re-estimation tab. The default value is Estimated HR (or equivalently $\hat{\delta}_1 = \ln \widehat{HR}_1$) and we recommend using this default until you have gained some experience with the simulation output and can judge for yourselves which option provides better operating characteristics for your studies. East uses the formula

$$\hat{\delta}_1 = \frac{z_1}{\sqrt{r(1-r) D_1}}$$

to obtain the current estimate of δ. Upon substituting $z_1 = 1.5$, $D_1 = 167$ and $r = 0.5$ in the above expression we obtain $\hat{\delta}_1 = 0.232$, or equivalently a hazard ratio estimate of exp(0.232) = 1.2611. Substituting the estimate $\hat{\delta}_1$ into equation (56.12) and solving for $D_2^*$ yields $D_2^* = 656$. Since the maximum number of events has been capped at 501, this simulation will terminate the trial when the number of events reaches 501 instead of going all the way to 656 events. In this case the desired target conditional power of 0.9 will not be met.
Indeed in this case the conditional power (with $\hat{\delta}_1$ being used in place of the unknown true δ) is only

$$1 - \Phi\left\{ 1.9687 \sqrt{1 + \frac{167}{333 - 167}} - 1.5 \sqrt{\frac{167}{333 - 167}} - 0.5\,\delta \sqrt{500 - 167} \right\} = 0.798.$$

For a more detailed discussion of conditional power, including the use of a special conditional power calculator that computes conditional power accurately without relying on the approximate assumption that the next look will be the last one, see Chapter 57.

6. Promising Zone Scale: The promising zone is defined so that the number of events will only be increased if the interim result falls in this zone. East asks you to select the scale on which the promising zone is to be defined. It can be defined based on the conditional power, the test statistic or the estimated effect size, and is specified by entering the minimum and maximum of the chosen quantity. Let us go ahead with the default option, which is Conditional Power.

7. Promising Zone – Min CP: In this cell you specify the minimum conditional power (in the absence of any adaptive change) at which you will entertain an increase in the number of events. That is, you specify the lower limit of the promising zone.

8. Promising Zone – Max CP: In this cell you specify the maximum conditional power (in the absence of any adaptive change) at which you will entertain an increase in the number of events. That is, you specify the upper limit of the promising zone. Suppose, for example, that the number of events is only increased in a promising zone specified by the range 0.45 ≤ CP < 0.8, and suppose that in that case the number of events is re-estimated so as to achieve a target conditional power of 0.99. Then the Input Parameters Table will contain the entries shown below. The zone to the left of the promising zone (CP < 0.45) is known as the unfavorable zone.
The zone to the right of the promising zone (CP ≥ 0.8) is known as the favorable zone. In a group sequential design that includes early stopping boundaries for futility and efficacy, the unfavorable zone contains within it an even more extreme region for early futility stopping and the favorable zone contains within it an even more extreme region for early efficacy stopping.

9. HR Used in CP Computations: In this cell you specify whether the simulations should utilize conditional power based on $\hat{\delta}_L$ estimated at the time of the interim analysis or should utilize the value of δ specified under the alternative hypothesis, in equations (54.21) and (56.11). The adaptive design will have rather different operating characteristics in each case. The default is to use the estimated value $\hat{\delta}_L$.

10. Accrual Rate After Adaptation: East gives you the option to alter the rate of enrollment after an adaptive increase in the number of events. This feature would be useful, for example, to evaluate the extent to which the follow-up time and hence the total study duration can be shortened if the rate of enrollment is increased after the adaptive change is implemented.

11. Estimation Method: East gives you the choice of None, RCI, or BWCI methods for parameter estimation.

12. Specify Stage II Design: Clicking this button will bring up the following dialog box. Here you specify the desired choices for the Stage-II design. The specification of alpha for the Stage-II design is the most important component of the Muller and Schafer method. We will keep the default choice. The other choice is User Specified. Keep all other default choices in this dialog box and click OK.

The dialog box will now look as shown below.
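The event re-estimation described under item 5, together with the zone definitions of items 6 to 8, can be sketched as follows. This is an illustration of equation (56.11), not East's implementation, and all function names are hypothetical. One assumption needs flagging: the sketch plugs in the magnitude 1.9687 of the critical value, as the capped conditional-power expression in the text does, rather than the signed value −1.9687. With that convention it reproduces the re-estimated 656 events, and gives a capped conditional power of about 0.80, in line with the 0.798 quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: cdf is Phi, inv_cdf its inverse


def estimated_delta(z, r, d_look):
    """Interim estimate delta_hat = z / sqrt(r (1 - r) D_L), as given in the text."""
    return z / sqrt(r * (1 - r) * d_look)


def conditional_power(b, z, delta, r, d_look, d_planned, d_final):
    """Right-hand side of (56.11) evaluated with D_K* = d_final."""
    ratio = d_look / (d_planned - d_look)
    arg = (b * sqrt(1 + ratio) - z * sqrt(ratio)
           - delta * sqrt(r * (1 - r)) * sqrt(d_final - d_look))
    return 1 - STD_NORMAL.cdf(arg)


def events_for_target_cp(target, b, z, delta, r, d_look, d_planned):
    """Solve (56.11) for D_K*. Assumes delta > 0 and a non-negative root."""
    ratio = d_look / (d_planned - d_look)
    root = (b * sqrt(1 + ratio) - z * sqrt(ratio)
            + STD_NORMAL.inv_cdf(target)) / (delta * sqrt(r * (1 - r)))
    return d_look + root ** 2


def classify_zone(cp, min_cp=0.45, max_cp=0.8):
    """Zone labels for the example range 0.45 <= CP < 0.8."""
    if cp < min_cp:
        return "unfavorable"
    return "promising" if cp < max_cp else "favorable"


# Values from the Des 1 illustration; B2 is the magnitude of the critical value.
B2, Z1, R, D1, D2, EVENT_CAP = 1.9687, 1.5, 0.5, 167, 334, 501

d_hat = estimated_delta(Z1, R, D1)
d_star = events_for_target_cp(0.9, B2, Z1, d_hat, R, D1, D2)
d_used = min(ceil(d_star), EVENT_CAP)  # the cap overrides the re-estimated value
cp_capped = conditional_power(B2, Z1, d_hat, R, D1, D2, d_used)

print(f"delta_hat = {d_hat:.3f}, re-estimated events = {ceil(d_star)}")
print(f"events used = {d_used}, conditional power at cap = {cp_capped:.3f}")
```

Solving for the events in closed form works here because (56.11) is linear in the square root of the additional events; a root finder would be an equally reasonable choice.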
Click the Simulate button. Store the simulation results in the library and see the details as shown below.

The interpretation of these results is very similar to what was described in the CHW chapter, section 54.5. Please also see the example of parameter estimation by the BWCI and RCI methods given in section 56.3.1.

56.4 Survival Endpoint: Pancreatic Cancer Trial

A multi-center, double-blind, placebo-controlled randomized clinical trial is planned for subjects with advanced pancreatic cancer with the goal of comparing the current standard of care (gemcitabine + nab-paclitaxel) to an experimental regimen containing the two standard of care drugs plus a recombinant human enzyme. The primary endpoint is Overall Survival (OS). The study is required to have one-sided α = 0.025, and 90% power to detect an improvement in median survival from 8.5 months on the control arm to 12.744 months on the experimental arm, which corresponds to a hazard ratio of 0.667. The average enrollment is expected to be 15 subjects/month. We shall first create a two-look group sequential design for this study in East, and shall then show how the design may be improved by permitting an increase in the number of events and sample size at the time of the interim analysis.

56.4.1 Base Design

The base design is a two-look group sequential design with a Lan and DeMets O’Brien-Fleming LD(OF) efficacy boundary, and a futility boundary for terminating if the estimated hazard ratio exceeds 1.0. To enter these design parameters into East select the design option for the Logrank Test Given Accrual Duration and Accrual Rates from the Tab on the menu bar as shown below.

Enter the design inputs as shown in the Test Parameters tab.
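As a rough plausibility check on these design inputs, the hazard ratio implied by the two medians and an approximate number of events can be computed by hand. The sketch below assumes exponential survival (so that the hazard ratio equals the ratio of medians) and uses Schoenfeld's fixed-sample approximation for the logrank test with 1:1 allocation. It is not the computation East performs: a group sequential design with an interim look will typically call for somewhat more events than this single-look figure. The function names are hypothetical.

```python
from math import log
from statistics import NormalDist


def hazard_ratio_from_medians(median_control, median_treatment):
    """Hazard ratio (treatment vs control) under exponential survival."""
    return median_control / median_treatment


def schoenfeld_events(hazard_ratio, alpha, power, r=0.5):
    """Approximate events for a one-sided, single-look logrank test.

    r is the fraction of subjects allocated to the treatment arm.
    """
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(power)) ** 2 / (r * (1 - r) * log(hazard_ratio) ** 2)


hr = hazard_ratio_from_medians(8.5, 12.744)
events_fixed = schoenfeld_events(hr, alpha=0.025, power=0.90)

print(f"hazard ratio = {hr:.3f}")
print(f"fixed-sample events (Schoenfeld approximation) = {events_fixed:.1f}")
```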
Specify the efficacy and futility boundaries in the Boundary tab.

Specify the accrual rate of 15/month in the Accrual/Dropouts tab, and display the Study Duration vs. Accrual chart by clicking its icon.

After examining this chart it is decided to enroll 360 subjects over 24 months, resulting in a total study duration of about 34 months. Click the button to compute and store this design temporarily in the Output Preview window, and then save the design permanently in the Library by clicking the save button. Rename the saved design as Base. Also rename the workbook, currently named Wbk1, as Pancreatic and save it on your computer. The library should now look as shown.

You may view a summary of this design by clicking the summary icon. Alternatively, you may view this design in greater detail by clicking the details icon.

You may also examine the various charts associated with this design by activating them from the plot icon. For example, it is interesting to examine the Power versus Treatment Effect chart on the HR scale. Notice that if the actual hazard ratio is 0.72 instead of 0.67, then the power deteriorates from 90% to 74.8%.

56.4.2 Simulate without Adaptation

Click the simulation icon. You will be taken to the simulation window, which contains four tabs.

The Test Parameters tab

Do not make any changes to the entries in this tab.
The Response Generation tab

In order to study the operating characteristics of the adaptive design we will simulate the design with a hazard ratio of 0.72. Therefore please change the value for the hazard ratio from 0.67 to 0.72.

The Accrual/Dropouts tab

Do not make any changes to these entries.

The Simulation Controls tab

Change the number of simulations to 100000 for greater Monte Carlo accuracy.

Thus far the only change that we have made to the original design is to increase the hazard ratio from 0.67 to 0.72 for the simulations. We can simulate the design with the increased hazard ratio by clicking the Simulate button. Notice that the power, based on 100000 simulated trials with HR=0.72, is only 74.47%.

Press the Close button, then move the simulated design from the Output Preview window to the Library and save the simulated design in the library. Rename it as Sim-0.72. You may examine various other operating characteristics in the more detailed simulation output, which is available by clicking the details icon.

56.4.3 Adaptive Simulation

There is some uncertainty about the hazard ratio at which to power this study. A hazard ratio of 0.72 is still clinically meaningful. But as we just showed, the power at that hazard ratio is only about 75%. We can recover the lost power by implementing an adaptive increase to the number of events and sample size at the interim analysis time point. We shall do this by simulation using the Müller and Schäfer method. Return to the simulation input window.
The easiest way to do this is to click the Input icon located on the task bar at the bottom of the current window. This action will always open the input window that was most recently used. Alternatively, you can open the input window by selecting Sim-0.72 in the library and clicking the Edit Simulation icon. Either way you will be taken back to the Simulation Inputs window with the four tabs. At the far right corner of this window is the Include Options button. Click this button and select Sample Size Re-estimation from the drop-down list. An additional tab labelled Sample Size Re-Estimation is created. Select that tab and choose the Muller and Schafer radio button. You’ll be taken to the Sample Size Re-estimation window.

Let us examine this window carefully, for it conveys a large amount of information. It is convenient to separate this window into two panels: a Left Panel and a Right Panel. The Left Panel, displayed below, is primarily for specifying the criteria that will be used to determine whether or not the current design should be adapted.

The Right Panel, displayed below, is for specifying how the original design will be adapted if indeed the criteria for adaptation that have been entered into the Left Panel are satisfied.

At present only the original design (i.e., the Base design saved in the library) is shown in the Right Panel. For simulation purposes it is referred to as the Stage I design. If, in any simulation round, the adaptation criteria specified in the Left Panel are not met, only the Stage I design will be simulated. Now we have stipulated on the top line of the Left Panel that the Stage I design would be adapted at look 1.
Therefore, if the adaptation criteria specified in the Left Panel are met, then the remainder of the Stage I design beyond look 1 will be adapted in accordance with specifications that will be provided through the creation of a Stage II design. We shall explain how the Stage II design is created shortly.

Left Panel: Specification of Criteria for Adaptation

We first enter the inputs into the Left Panel. At the top of the Left Panel we specify when the adaptation will take place. This may be specified in terms of either the Look # or the Information Fraction. For this example, we will make the adaptation after completing Look 1.

We next specify the maximum allowable number of events and the maximum allowable sample size should the trial be adapted. This is achieved by specifying an appropriate multiplier in the Max. # of Events and Max. Sample Size edit boxes. We will use a multiplier of 1.5 for both the events and the sample size.

By this specification we have placed a cap on the magnitude of the increase in events and sample size. For example, if in any simulation round the decision rule used to re-estimate events produces the value 400, the re-estimated number of events will nevertheless be truncated to 390.

The next specification is the Upper Limit on Study Duration. By default it is three times the maximum study duration of the Stage I design. It is provided as a precaution against excessive prolongation of a simulated trial in case of very slow arrival of events, and its default value is typically not altered. The Stage I design displayed on the Right Panel shows a maximum study duration of 32.925 months. Therefore the Upper Limit on Study Duration is 3 × 32.925 = 98.776.

The next three entries describe the criteria for trial adaptation.
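The capping rules just described amount to a few lines of arithmetic. The sketch below is illustrative only and the helper name `capped_events` is hypothetical. It assumes that the Base design calls for 260 events, a figure that is not stated in the text but is implied by the 390-event cap under a multiplier of 1.5, and it encodes the rule stated earlier in this chapter that events may be increased but never decreased.

```python
BASE_EVENTS = 260      # assumption: implied by the 390-event cap with multiplier 1.5
BASE_DURATION = 32.925  # maximum study duration of the Stage I design, in months


def capped_events(re_estimated, base_events, multiplier):
    """Apply the no-decrease rule and the multiplier cap to a re-estimated event count."""
    if multiplier < 1:
        raise ValueError("multipliers less than 1 are not accepted")
    cap = multiplier * base_events
    return min(max(re_estimated, base_events), cap)


events_when_over_cap = capped_events(400, BASE_EVENTS, 1.5)   # truncated to the cap
events_when_under_cap = capped_events(300, BASE_EVENTS, 1.5)  # left unchanged
upper_limit_duration = 3 * BASE_DURATION  # default upper limit on study duration

print(events_when_over_cap, events_when_under_cap, round(upper_limit_duration, 3))
```

The product 3 × 32.925 differs from the displayed 98.776 in the last digit, presumably because the displayed duration is itself rounded.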
The adaptation criteria in East are based on the promising zone design proposed by Mehta and Pocock (2011). The interim analysis results are partitioned into three zones: Unfavorable, Promising and Favorable. If the interim results fall in the unfavorable or favorable zones, there is no adaptation. But if they fall in the promising zone, the trial is adapted. The Müller and Schäfer method permits adaptations that go beyond mere sample size re-estimation. One may, in addition, increase the number of future looks, change their spacing, and alter the spending function.

The partitioning of the interim sample space into zones can be based on three different scales: Conditional Power, Test Statistic or Estimated HR. One can pick the desired scale from a drop-down list as shown below. The three scales are in one-to-one correspondence, so that the selection of the scale is simply a matter of choosing the one that is easiest to interpret in a given situation.

In the current example the Conditional Power scale has been chosen. Accordingly the promising zone is defined as the region of the interim analysis sample space in which the conditional power is between 0.3 and 0.9. To see this same zone on the hazard ratio scale, choose Estimated HR from the drop-down choice of scales for the promising zone. It is seen that on the estimated HR scale the same promising zone corresponds to the interim estimate of the hazard ratio lying between 0.8202 and 0.7001. On the test statistic (or Wald statistic, or Z-statistic) scale the promising zone corresponds to −2.0328 ≤ Z ≤ −1.1298.

If the promising zone is defined in terms of conditional power, one needs to specify what hazard ratio will be assumed for computing conditional power.
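The correspondence between the three scales can be checked numerically. The sketch below is not East's implementation. It assumes the usual Brownian-motion approximation for a two-look design, conditional power evaluated at the interim estimate of the effect, the final critical value of −1.9686 and the look at 130 of 260 events quoted elsewhere in this section, and Schoenfeld's relationship Z = ln(HR)·sqrt(D·r·(1 − r)). The function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

# Assumed constants, taken from the numbers quoted in this section.
C_FINAL = -1.9686        # final critical value of the Stage I design
T = 130 / 260            # information fraction at look 1
D1, R = 130, 0.5         # events at look 1, randomization fraction

_norm = NormalDist()


def z_for_cp(cp):
    """Look-1 Z at which conditional power (estimated effect) equals cp.

    Under the approximation, CP = Phi((C_FINAL - z / sqrt(T)) / sqrt(1 - T)),
    which is solved here for z.
    """
    return (C_FINAL - sqrt(1 - T) * _norm.inv_cdf(cp)) * sqrt(T)


def hr_for_z(z):
    """Estimated hazard ratio matching Z via Schoenfeld's relationship."""
    return exp(z / sqrt(D1 * R * (1 - R)))


z_low, z_high = z_for_cp(0.3), z_for_cp(0.9)
hr_low, hr_high = hr_for_z(z_low), hr_for_z(z_high)
```

Under these assumptions the conditional power limits 0.3 and 0.9 map to the Z and HR limits reported above, to the four decimals shown.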
The conditional power calculations may be performed either with the interim estimate of the hazard ratio or with the value of the hazard ratio that was used to create the base design (i.e., 0.667). The default choice is Estimated HR. We will use the default specification.

The next entry is used to specify the rate of accrual after the adaptation. We shall assume that there will be no change in the accrual rate.

The final entry in the Left Panel is a specification of the method for estimating the hazard ratio at the final analysis that adjusts for the fact that an adaptive group sequential design was used. There are three choices: None, RCI method and BWCI method. The purpose of this input is to verify by simulation the properties of the RCI method (Mehta, Bauer, Posch, Brannath, Statistics in Medicine, 2007) and the BWCI method (Gao, Liu, Mehta, Statistics in Medicine, 2013) for computing point estimates and confidence intervals that adjust for having used an adaptive group sequential design. We shall use the None option for the present since the RCI and BWCI options are intended as tools for methodological research rather than for the actual design of a trial.

Right Panel: Specification of the Stage II Design

Next we consider the Right Panel. At present it displays the Stage I or Base design in summary form. The Stage I design has 2 looks and we have indicated that it will be altered after look 1 if the adaptation criterion of being in the promising zone is met. We must, however, specify to East precisely how the remainder of the trial beyond look 1 will be adapted if the adaptation criterion is satisfied.
As we have explained in Section 56.1.1, although we are dealing with a single trial that is adapted at an interim analysis, it is more convenient to specify the portion of this trial that is implemented after the adaptation as a separate Stage II trial having a type-1 error equal to the conditional type-1 error of the Stage I trial obtained at the time of the adaptation. This is the essence of the Müller and Schäfer method of adapting an on-going study while preserving the overall type-1 error.

Accordingly, click on the button. You are now taken to the following dialog box where you must specify the design parameters of the Stage II design. The complete design specification consists of type-1 error, power, number of looks, hazard ratio and efficacy/futility boundaries. East will then compute the number of events that are needed to attain the specified power. Because this dialog box is used for simulation, only power and number of looks are specified explicitly. All other quantities depend on the data that are obtained from the Stage I trial at the time of the adaptive look, and therefore vary from simulation to simulation.

Let us illustrate by simulating an adaptive trial in which we will adapt at look 1 of the Stage I design. The adaptation will consist of an increase in the number of events and sample size, and one additional interim look, resulting in a two-look Stage II design with a Pocock error spending function for the efficacy boundary and a hazard ratio of 1 for the futility boundary. We enter the appropriate inputs as follows: Specify that the Stage II design has two looks. This specification causes a Boundary tab to appear.
Enter the Pocock efficacy boundary and the HR=1 futility boundary in the Boundary tab as shown. The actual amount of α available for spending and the actual efficacy boundary cannot be displayed. These design parameters depend on the conditional type-1 error of the Stage I trial at the time of the adaptation and will therefore vary from simulation to simulation in accordance with the Müller and Schäfer method. Return to the Test Parameters tab. This input dialog box requires the following specifications:

1. Specification of alpha: There are two choices. For a Müller and Schäfer design the correct choice is Cond. Type 1 Error From Stage-I. Only this choice will ensure that the overall type-1 error of the adaptive design is preserved. The alternative choice, User Specified, has been included simply for illustrative purposes, to demonstrate that if a fixed type-1 error is specified in an adaptive trial, the overall type-1 error would not be preserved. Thus select Cond. Type 1 Error From Stage-I and note that it will vary from simulation to simulation depending on the value of the test statistic obtained in the Stage I trial at look 1.

2. Specification of HR: Here too there are two choices. In this case the choice depends on the user’s preference. If Estimated from Stage-I is selected, the sample size will be computed by assuming that the hazard ratio that was obtained at Stage I at the time of the adaptive interim look is the true hazard ratio. Therefore it will vary from simulation to simulation. Alternatively one might desire to simulate the adaptive design with a fixed hazard ratio, say the HR that was specified for the original design.
Here we will select the Estimated from Stage-I option, thereby letting the data from the first stage determine the value of HR from simulation to simulation.

3. Power: This is the desired power for the Stage II trial. However, this power may not be attainable in every simulated trial. Depending on the type-1 error and hazard ratio that have been estimated from Stage I, East will compute the number of events, say $D_r$, that are required to attain the desired power in the Stage II trial. Now you have already specified in the Left Panel of the Sample Size Re-estimation tab the maximum number of events if the trial is adapted, in this case 390 events. Therefore if, in any simulation, $D_r > 390$, East will only generate 390 events, and the desired power will not be attained. More generally let $D_{\max}$ denote the maximum allowable number of events specified in the Sample Size Re-estimation tab. Then $D_a$, the actual number of events that East will generate in any simulation, is given by

$$D_a = \min(D_r, D_{\max})$$

Let $N_{\max}$ denote the maximum sample size if the trial is adapted, in this case 540. East will generate patient arrivals until either the $D_a$ events have arrived or $N_{\max}$ subjects have arrived. In the latter case East will follow the $N_{\max}$ subjects until $D_a$ events have arrived.

For the current simulation experiment enter the value 0.999 into the edit box for power. By selecting such a high value for power we are assured that in every simulated trial the required number of events, $D_r$, will hit the cap $D_{\max} = 390$; that is, $D_a = D_{\max} = 390$ in every simulation. Thus this choice for power is an implicit way of specifying that there will be a one-time 50% increase in the number of events if the Stage I results fall in the promising zone at the time of the interim analysis. The inputs in the Test Parameters tab for the specification of the Stage II design now look as follows.
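To see why a requested power of 0.999 pushes the required events past the cap, here is a rough numerical sketch. It is not East's calculation. It uses Schoenfeld's fixed-sample, single-look approximation for the required number of events, so it understates what a two-look Stage II design with a futility boundary would need. The inputs (a conditional type-1 error of 0.1029 and an estimated HR of 0.7661) are the values from Example 1 later in this section, the Stage II cap of 260 is 390 minus the 130 events already observed, and the function names are hypothetical.

```python
from math import log
from statistics import NormalDist


def required_events(alpha, power, hr, r=0.5):
    """Fixed-sample Schoenfeld approximation to the events needed
    for a one-sided level-alpha logrank test (an assumption here,
    not East's group sequential computation)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha) + z(power)) ** 2 / (r * (1 - r) * log(hr) ** 2)


def actual_events(d_required, d_max):
    """D_a = min(D_r, D_max): the cap always wins when D_r is larger."""
    return min(d_required, d_max)


# Example 1 inputs quoted later in this section.
d_r = required_events(alpha=0.1029, power=0.999, hr=0.7661)
d_a = actual_events(d_r, d_max=260)  # Stage II cap: 390 - 130 events
```

Even under this optimistic single-look approximation the required events are several times the Stage II cap, so the simulated trial simply runs to the cap.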
To complete the specification, press the OK button. East will return you to the Sample Size Re-estimation tab and will display both the Stage I and Stage II designs side by side in the Right Panel.

Displaying Stage I and Stage II as Single Integrated Design

Although we are dealing with a single integrated design we have regarded the remainder of the trial after the adaptive look at Stage I as a separate Stage II trial whose type-1 error is equal to the conditional type-1 error of the Stage I design. This is extremely convenient for design purposes because we can use all the functionality that already exists in East to design a separate Stage II trial without considering how it will be integrated with the existing Stage I design. On the other hand, this artificial separation of a single design into two separate designs makes it difficult to visualize the stopping boundaries of the integrated design or the trajectory traced by the test statistic during the interim monitoring phase of the study. Thus, although it is technically correct to monitor the Stage II design independently, with the test statistic starting out at the value zero, it is not very intuitive to do so.

To gain a better understanding of how the Stage I and Stage II designs are integrated into a single design we have provided the CP+- button to the right of Cond. Power. Clicking on this button will open up a conditional power calculator. By default, the calculator will open with the radio button for the Stage I design selected, and with the HR to be Used in Conditional Power Computation set to be estimated from the data rather than specified separately by the user.
With this choice the observed value of the test statistic Z and the estimate of HR, say $\widehat{HR}$, at the time of the adaptive look in the Stage I trial are in one-to-one correspondence through the relationship due to Schoenfeld (Biometrika, 1981)

$$Z = \ln(\widehat{HR})\,\sqrt{D\,r\,(1 - r)}$$

where D is the number of events at the time of the adaptive look (here D = 130) and r is the randomization fraction for allocating subjects to the two treatment arms (here r = 0.5). We can specify either $\widehat{HR}$ or Z in the appropriate edit box and the calculator will output the corresponding value of conditional power. If the conditional power falls in the promising zone, the trial will be adapted through the creation of a Stage II design. Otherwise the trial will continue as planned. Below we provide three examples to show how different values of Z result in different Stage II designs and how the two Stages may be viewed as a single integrated adaptive design.

Example 1: Z = −1.5187

In the view below, if Z = −1.5187, then HR = 0.7661 (by Schoenfeld’s formula above) and the conditional power is computed as 0.6. This would imply that the interim result has fallen in the promising zone (CP between 0.3 and 0.9) and hence the remainder of the Stage I trial should be adapted.

To obtain the conditional type-1 error (also referred to as the Conditional Error Rate (CER) or the Conditional Rejection Probability (CRP)) of the remainder of the Stage I trial, select the Arbitrary HR radio button. Now HR and Z are no longer in one-to-one correspondence, so that we can set Value of HR to 1 and separately set Value of z to -1.5187 as shown below. The calculator reveals that the conditional type-1 error at HR = 1 and Z = −1.5187 is 0.1029.
Since HR = 1 corresponds to the null hypothesis, 0.1029 is the conditional type-1 error of the Stage I trial if Z = −1.5187 is observed at look 1. Thus the amount of α we would use for the Stage II design is 0.1029.

Now set the radio button for HR back to Estimated HR and set Z = −1.5187. Once again, Z and HR are in one-to-one correspondence via Schoenfeld’s formula so that HR = 0.7661. Now select the Stage II Design radio button. The following output is obtained:

Conditional Type I Error = 0.1029. This is the amount of α or conditional error rate that will be available for the Stage II design.

CP(Stage I) = 0.6. This is the conditional power of the Stage I design if the observed value at look 1 is Z = −1.5187 and HR = 0.7661. This puts the look 1 result in the promising zone so that the trial may be adapted.

CP(Stage II) = 0.7639; Events(Stage II) = 260; Events(Integrated) = 390.

These outputs show that the interim analysis of the Stage I trial is in the promising zone and therefore the Stage II design is invoked. Although the Stage II design is intended to achieve 99.9% power at the estimated value of HR = 0.7661, it cannot do so because of the cap of 390 events on the integrated design (or 260 events on the Stage II design). Because the number of events cannot be increased further, the power of the Stage II design is 0.7639 and not 0.999.

If we click on the Details button we get more insight into the Stage II and integrated designs. Each of the charts in the right panel can be magnified by clicking on its icon.
The top panel shows the integrated design in which the Stage I design was adapted after 130 events were observed. The green dot is the observed value of the test statistic at look 1, Z = −1.519. It falls in the promising zone. The lower panel shows the two-look Stage II design with the Pocock efficacy boundary and the HR=1 futility boundary. The type-1 error of the Stage II design is 0.1029, which corresponds to the conditional type-1 error from Stage I.

It is instructive to view the adaptation rule and its impact on conditional power graphically. This can be achieved by switching from Boundary Plots to Promising Zone Plots from the drop-down list as shown below. These are the familiar Promising Zone Plots that have been well documented in Chapter 54.

The top plot displays the promising zone (CP between 0.3 and 0.9) on the X-axis and the number of events for the integrated design on the Y-axis. Outside the promising zone the number of events is 260, rising to 390 in the promising zone. The X-axis of the bottom plot is the same as for the top plot and shows the conditional power based on the current value of the test statistic and the current estimate of the hazard ratio.
The Y-axis shows the conditional power if the number of events is increased in accordance with the rule implied by the top plot, and under the hazard ratio specified in the Reference HR edit box.

Example 2: Z = −1.1

For additional insight enter the value Z = −1.1 into the CP calculator and press the Recalc button. This time the result from the Stage I design is not in the promising zone. Therefore the integrated design and the Stage I design are identical, with a total of 260 events. The Stage II design is simply the continuation of the Stage I design for an additional 130 events and has the same final critical value of -1.9686.

Example 3: Z = −1.9

Finally, enter the value Z = −1.9 (corresponding to HR = 0.7166) into the calculator. Now CP = 0.8452, which is in the promising zone. Thus the trial is adapted. The conditional type-1 error from Stage I that is utilized in the Stage II design is 0.1883. Unlike Example 1, the Stage II design achieves the full 90% power with 260 events. The pre-specified cap of 390 events for the integrated design (or 260 events for the Stage II design) was not exceeded in the computation of events required to obtain 90% power for the Stage II design.

To further your understanding of the adaptive design you might find it helpful to enter additional values of Z or HR into the conditional power calculator and view the resulting numerical and graphical outputs.
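The Stage I quantities reported in the three examples can be reproduced with a short script. This is a sketch under stated assumptions, not East's calculator: it uses the standard Brownian-motion approximation for a lower-tailed two-look test, with the final critical value −1.9686 and the look at 130 of 260 events taken from the text. The function names are hypothetical, and the Stage II outputs (for example CP(Stage II)) are not attempted here.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf
C_FINAL = -1.9686          # final critical value of the Stage I design (from the text)
T = 130 / 260              # information fraction at look 1
CP_LOW, CP_HIGH = 0.3, 0.9  # promising zone limits on the CP scale


def conditional_error(z1):
    """Conditional type-1 error: P(final Z <= C_FINAL | Z1 = z1) when HR = 1."""
    return PHI((C_FINAL - sqrt(T) * z1) / sqrt(1 - T))


def conditional_power(z1):
    """Conditional power of Stage I when the effect is set to its interim estimate."""
    drift = z1 / sqrt(T)  # estimated drift of the Brownian motion
    return PHI((C_FINAL - sqrt(T) * z1 - drift * (1 - T)) / sqrt(1 - T))


def in_promising_zone(z1):
    return CP_LOW <= conditional_power(z1) <= CP_HIGH


# Evaluate the three examples discussed above.
results = {
    z1: (conditional_error(z1), conditional_power(z1), in_promising_zone(z1))
    for z1 in (-1.5187, -1.1, -1.9)
}
for z1, (cer, cp, promising) in results.items():
    print(f"Z={z1:8.4f}  CER={cer:.4f}  CP={cp:.4f}  promising={promising}")
```

Under these assumptions Examples 1 and 3 fall in the promising zone, with conditional errors and conditional powers agreeing with the values reported above, while Example 2 falls below the lower limit of 0.3.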
When you are done with exploring the properties of the integrated and Stage II design in this manner, press the Close button to return to the Sample Size Re-estimation tab of the simulation inputs window.

Simulation Results

Now that you have completed the specification of the adaptive design through the Stage I and Stage II specification in the Sample Size Re-estimation tab, let us evaluate its operating characteristics by simulation. The design is shown below. This two-stage design will be simulated 100,000 times. In each simulation, look 1 of the Stage I design will be taken after 130 events and if the resulting conditional power based on the estimated hazard ratio lies in the promising zone (conditional power of the Stage I design between 30% and 90%), the remainder of the trial will be adapted in accordance with the Stage II design.

To simulate this design press the Simulate button at the bottom right of the screen. East will generate 100,000 simulated trials with a hazard ratio of 0.72, the value that we entered in the Response Generation tab. The simulation results may be viewed in the temporary tables shown below for the Integrated Trial and the Stage II Trial. Press the Close button to move the simulation results to the Output Preview. Then press the button to move the simulation results to the library where you can view the detailed output. Name the saved library node as MSSim-2look-PK. Open MSSim-2look-PK with the tool and examine the output.

Notice that the overall power is 79.57% while the power in the promising zone is 88.86%. The cost in terms of average study duration is 30.304 months for all trials and 30.292 months in the promising zone.
The average sample size in the promising zone, however, is 478 compared to 388 for all trials.

It would be interesting to compare this performance with that of the Base design when the true hazard ratio is 0.72. To make this comparison, change the multipliers for sample size and events to 1 in the Sample Size Re-estimation tab. Then specify that the Stage II design will be a single-look design with 90% power subject to the cap on events and sample size in the Sample Size Re-estimation tab. Now press the Simulate button. Save the new design in the library with the name MSSim-1look-noSSR. Examine the simulation details of MSSim-1look-noSSR.

Notice that the power in the Promising Zone is only 74.71%. The average cost in terms of study duration is 32.139 months in the promising zone and 30.001 months for all trials, about the same as for MSSim-2look-PK. On the other hand the average sample size in the promising zone is only 360 subjects for MSSim-1look-noSSR, compared to 478 subjects for MSSim-2look-PK. This is the cost associated with increasing the power from 74.58% to 88.86% and it is only incurred if the interim results are promising.

It would be interesting to compare the adaptive design obtained by the Müller and Schäfer method with the adaptive design obtained by the CHW method. The main limitation of the CHW method is that the only adaptation permitted is an increase in events and sample size. There is no flexibility to alter the number or spacing of the future looks after the adaptation.
To run the CHW design make the changes shown below in the Sample Size Re-estimation tab. (Notice that Target CP for Re-estimating # of Events is set to 0.999 so as to ensure a one-time increase in number of events by 50%, as was used for the MSSim-2look-PK design.) Simulate this design and save it in the library with the name CHWSim-1look.

The CHWSim-1look design has about 1% more power than the MSSim-2look-PK design in the promising zone. On the other hand it has an average study duration that is 4.6 months longer in the promising zone. The Müller and Schäfer design has paid a price in terms of a 1% loss of power and in turn has benefitted by a shorter study duration due to the potential for early stopping in the Stage II design. We might therefore expect that if we were to construct a Müller and Schäfer design with only one look for the Stage II portion, it would have similar operating characteristics to the CHW design. To verify this conjecture create such a design by using the following inputs. Simulate the design and save it in the library with the name MSSim-1look.
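One way to see why the conjecture is plausible is to compare the two decision rules directly when Stage II has a single look. The sketch below is an illustration under assumptions, not East's code: it takes the CHW statistic to be the weighted sum of the look-1 statistic and the incremental Stage II statistic with the pre-specified weights sqrt(t) and sqrt(1 − t), and it takes the Müller and Schäfer rule to be a level-CER test on the Stage II data alone. The final critical value −1.9686 and t = 130/260 come from the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf
C_FINAL = -1.9686   # final critical value of the Stage I design (from the text)
T = 130 / 260       # pre-specified information fraction at look 1


def chw_rejects(z1, z2_incr):
    """CHW rule (assumed form): weighted statistic with the original weights."""
    return sqrt(T) * z1 + sqrt(1 - T) * z2_incr <= C_FINAL


def ms_rejects(z1, z2_incr):
    """Mueller-Schaefer rule with one Stage II look: compare the Stage II
    p-value with the conditional type-1 error carried over from Stage I."""
    cond_error = PHI((C_FINAL - sqrt(T) * z1) / sqrt(1 - T))
    return PHI(z2_incr) <= cond_error


# Compare the two rules on a grid of look-1 and incremental Stage II statistics.
grid_z1 = [-2.5 + 0.25 * i for i in range(13)]   # -2.5 ... 0.5
grid_z2 = [-3.0 + 0.1 * j for j in range(41)]    # -3.0 ... 1.0
all_agree = all(
    chw_rejects(z1, z2) == ms_rejects(z1, z2)
    for z1 in grid_z1
    for z2 in grid_z2
)
```

On this grid the two rules make the same decision everywhere, because the inequality defining one is a monotone transformation of the other. This equivalence does not carry over once Stage II is given more than one look.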
As we anticipated, the CHWSim-1look and MSSim-1look designs have similar operating characteristics: about the same power and the same study duration in all zones. This confirms the claim by Mehta and Liu (2016) that for the special case of a single future look following an interim analysis the CHW and Müller and Schäfer methods are equivalent. These examples show that the Müller and Schäfer method has greater flexibility for trading off power versus study duration in an adaptive setting than the CHW method; it permits more complex adaptations for the Stage II design, without sacrificing power or study duration in the special case of a single-look adaptation.

56.4.4 Interim Monitoring

We will now use East to monitor the Base design. With the cursor on Base in the library window, click on the interim monitoring icon. You will be taken to the interim monitoring worksheet.

The First Interim Look

In order to populate this worksheet with interim data you must click on the button on the tool bar at the top of the worksheet. A form titled Test Statistic Calculator then appears, and you are requested to enter the interim number of events, the interim estimate of δ and its standard error into this form. You now have two options.

Option 1: Enter the requested quantities directly into the Test Statistic Calculator for the current look and click the OK button at the bottom. East will then perform the necessary calculations and post the interim results in the worksheet as shown below.

Option 2: If the actual patient level data are saved as a file in one of the acceptable file formats, you can import the file into East through File > Import. In this example there is a file titled Pancreatic-Look1.csv in the sub-folder Samples in your East installation folder containing the data required for the interim analysis at look 1.
When you select this file with File > Import > ∼Pancreatic-Look1.csv you will be asked to select the appropriate Delimiter from the Import File Format Option dialog box. Choose the default Comma delimiter and attach the file to the Pancreatic workbook. East will then display the data in its Data Editor.

This file must contain, at a minimum, the above six variables, Subject ID, ArrivalTime, TreatmentID, TimeOnStudy, CensorInd, Status, although it may contain many other variables as well. The names of these variables may differ in your data set from the ones given in this example, but they must carry the same meaning as above. The variable names used in this example are mostly self-explanatory. However there is a distinction between CensorInd and Status. CensorInd assumes the value 1 if the event (in this case death) has occurred, and assumes the value 0 if the observation is administratively censored while the subject is still in follow-up. Status assumes the value 1 if the event (in this case death) has occurred, assumes the value 0 if the observation is administratively censored, and assumes the value -1 if the subject has dropped out of the study. In this example CensorInd and Status are the same, because there are no drop-outs.

Before East can populate the interim analysis worksheet it is necessary to create an Analysis Node from this data set. Accordingly select the Analysis > Two Sample > Logrank options from the top-level menu and complete the entries in the ensuing form as shown. Upon clicking the OK button East will create the following Analysis of Time to Event Response node. Four tables are created.
The Summary of Observed Data table shows that 130 events have been observed from 263 enrolled subjects. The Parameter Estimates from Cox Model table displays the current estimate of hazard ratio (HR=0.7579), and other related output from the Cox model including the estimate of δ (-0.2772), its standard error (0.1761), and the corresponding Wald statistic (-1.5741).

This information can now be utilized to populate the interim analysis worksheet. Select the interim monitoring node from the library and click the edit button from the library menu to retrieve the worksheet. If any row of the worksheet is already populated, clear away the entries by selecting that row and clicking on the Delete Look button. Now click on the button and select the Read from Analysis Node radio button. Make sure that the appropriate Workbook and Analysis Node are selected in this dialog box. Then click the Recalc button. Upon clicking OK, Look 1 of the Interim Analysis Worksheet gets populated with the results of the first interim analysis.

We observe that the test statistic, -1.5741, has not crossed the efficacy boundary, -2.9626. However, the conditional power, 0.6421, is in the promising zone. Thus an adaptive increase in number of events and sample size is indicated. This can be confirmed by simulation as we show next.

The Predictive Interval Plot

(Note: The Predictive Interval Plots (PIPs) are introduced and fully described with examples in Chapter 65.)
Clicking on the button on the menu bar will enable you to simulate the future course of the trial conditional on the current data. You will be requested to fill in the names of required variables from the Pancreatic-Look1 data set. Select the appropriate variables from the respective drop-down menus as shown. Click on the button so that East can estimate the individual hazards and hazard ratio and can input the sample size and number of events from the Pancreatic-Look1 data set.

If you now click on the Simulate button you will simulate the future course of the trial from the current data in Pancreatic-Look1 and obtain 1000 repeated confidence intervals (RCIs), each representing a possible final analysis for the trial. These RCIs are sorted and stacked on top of one another to provide an intuitive plot called a Predicted Interval Plot (see, for example, Li, Evans, Uno and Wei, Statistics in Biopharmaceutical Research, 2009). The black dot at the center of each RCI is the estimate of hazard ratio for that simulation. The X-axis displays a range of possible hazard ratios with a vertical cursor positioned by default at HR=1. The vertical cursor can be dragged to the left or right or be moved to a specific location by entering a value in the Treatment Effect edit box at the top right of the window. It is seen that 63.8% of these RCIs have their upper bounds to the left of HR=1, in close agreement with the estimated conditional power of 64.2%.
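The two quantities being compared here can be sketched as follows. This is an illustration only, not East's computation. The conditional power uses the Brownian-motion approximation with the final critical value −1.9686 and information fraction 130/260 quoted earlier in this chapter, applied to the Cox model output for look 1 (δ = −0.2772, standard error 0.1761). The helper `fraction_left_of` mimics what the vertical cursor reports, and the five upper bounds passed to it are made-up numbers for demonstration, not East output.

```python
from math import exp, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf
C_FINAL = -1.9686   # final critical value of the Base design (from the text)
T = 130 / 260       # information fraction at look 1


def wald_z(delta, se):
    """Wald statistic from the Cox model estimate and its standard error."""
    return delta / se


def conditional_power(z1):
    """Conditional power at the interim estimate of the effect (lower-tailed test)."""
    return PHI((C_FINAL - z1 / sqrt(T)) / sqrt(1 - T))


def fraction_left_of(upper_bounds, cursor=1.0):
    """Share of simulated RCIs whose upper bound lies to the left of the cursor."""
    return sum(u < cursor for u in upper_bounds) / len(upper_bounds)


# Look-1 values reported in the Parameter Estimates from Cox Model table.
z1 = wald_z(-0.2772, 0.1761)
hr = exp(-0.2772)
cp = conditional_power(z1)

# Hypothetical RCI upper bounds, for demonstration only.
demo_upper_bounds = [0.85, 0.92, 0.97, 1.04, 1.10]
demo_fraction = fraction_left_of(demo_upper_bounds)
```

Under these assumptions the script reproduces the Wald statistic, hazard ratio and conditional power reported for look 1. The PIP proportion is a simulation-based counterpart of the same conditional power, which is why the two figures are close but not identical.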
The color coded vertical bar on the right of the graph is a heat plot representing the distribution of the 1000 hazard ratios. Each color represents 5% of the observed hazard ratios. For example, the lowest 5% of hazard ratios have values less than or equal to 0.658, the lowest 25% of hazard ratios have values less than 0.716, and so on.

The PIP plot is more informative than a conditional power calculation. To see this let us suppose that only hazard ratios that are smaller than 0.72 are considered to be clinically meaningful. If we move the vertical cursor to 0.72, we find that only 1.5% of the 1000 simulated future trials have RCI upper bounds to the left of 0.72, that is, demonstrate a clinically meaningful hazard ratio.

To save this PIP plot in the library for future use click on the Save in Workbook icon at the top of the window. A snapshot of the current entries in the interim monitoring worksheet is saved in the library along with the PIP plot. One can examine the contents of these newly created nodes by double-clicking or by selecting and then clicking on the Details tool in the library tool bar.

Adaptive Increase in Events and Number of Looks

Since the conditional power (64%) is in the promising zone we decide to make an adaptive change. In keeping with the simulations that were performed at the design stage, let us alter the future course of the trial in two ways:

1. Increase the total number of events by 50%. Thus the total number of events will be increased from 260 to 390.

2. Increase the number of future looks from 1 to 2 and alter the efficacy spending function from LD(OF) to LD(PK).

In order to make these changes click on the Adapt button on the tool bar at the top of the interim monitoring worksheet.
Change the Number of Looks to 2 and the Incremental Number of Events to 260 as shown. (Note: to change the events rather than the power, you will have to switch the selection of the radio button.) Press the Recalc button when done.

Next go to the Boundary tab and change the error spending function for efficacy from LD(OF) to LD(PK). Also change the HR for early stopping for futility to 1.0. Press Recalc when done.

Finally examine the Accrual/Dropouts tab, leave these entries unchanged, and click OK.

With these adaptations, the study will enroll an additional 209 subjects for a total of 472 subjects, and will follow them until 390 total events are obtained. Two additional interim looks are planned, one at 260 events and one at 390, with a Pocock spending function efficacy boundary and an HR=1 futility boundary. The interim monitoring worksheet has been modified to reflect these changes. Below is a screenshot of the integrated 3-look adaptive design.

The Second Interim Look

Import the data set for the second interim analysis into East with the File>Import>∼Pancreatic-Look2.csv commands. Create an analysis node for the Pancreatic-Look2 data set in the same manner as was done for Pancreatic-Look1. Return to the interim monitoring worksheet by clicking on followed by in the library. Select Row 1 of the interim monitoring worksheet.
Click on the PIP button and complete the entries as shown below to simulate the future course of this adaptive trial. As before, you will select the Pancreatic-Look1.cydx data set from the Select Subject Data drop-down box. Click Simulate and obtain 1000 one-sided repeated confidence intervals adjusted for the adaptive design by the published method of Mehta, Bauer, Posch and Brannath (2008). With the adaptive change in the total number of events, the conditional power has improved considerably and is now 86.0%.

Return to the interim monitoring worksheet once more and select Row 2. Read the Pancreatic-Look2 data into East by clicking on and then selecting Pancreatic-Look2.cydx from the Select Analysis Node drop-down box. Click on the Recalc button to complete the entries in the Test Statistic Calculator. Finally, click on the OK button.

East tells us that the efficacy boundary has been crossed. Click on the Stop button to complete the trial. The final inference is displayed in the following table.

Statistical significance has been achieved. The stage-wise adjusted p-value after accounting for the adaptation is 0.0043. The 95% confidence interval for the hazard ratio is (0.5598, 0.9168) and the point estimate is 0.7153. Examine the various charts on the interim monitoring worksheet.
Chart: Stopping Boundaries (Integrated Design)

Chart: Confidence Intervals for HR

Chart: Error Spending Function (Stage II Design)

57 Conditional Power for Decision Making

In the course of conducting an adaptive clinical trial, many decisions, such as determining the sample size, stopping the trial for futility, and whether, when and how to adapt the trial design, depend primarily on the values of ‘power estimates’. East provides facilities to compute or use different types of ‘power estimates’ while designing, simulating, or monitoring a trial. This chapter describes the special conditional power calculators that EastAdapt and EastSurvAdapt provide for computing conditional power either in the interim monitoring worksheets or in the simulation worksheets.

Informally, conditional power is the probability, given the current data, that the trial will ultimately achieve statistical significance. For a more formal definition refer to Chapter 54, Section 54.1.3. Conditional power calculations depend on assumptions that you make about the unknown parameters δ and σ. The conditional power calculators in East accept as inputs either user-specified values of δ and σ, or estimates of δ and σ obtained at the interim analysis. This will be illustrated through several examples in this chapter.
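To make the informal definition concrete, the sketch below computes conditional power in a deliberately simplified setting. It is not the calculation performed by the East calculators, which work with CHW weighted statistics and group sequential boundaries. The assumptions here are: a normally distributed test statistic, a single remaining final analysis with one-sided critical value z_crit, two equally sized arms, and a cumulative interim z statistic observed at information fraction info_frac. All function and parameter names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def drift_from_effect(delta_over_sigma, total_n):
    """Expected final z statistic for two equal arms with total sample size total_n.

    The standard error of the difference of means is sigma * sqrt(4 / N),
    so the expected z is (delta / sigma) * sqrt(N) / 2.
    """
    return delta_over_sigma * math.sqrt(total_n) / 2.0


def conditional_power(z_interim, info_frac, drift, z_crit=1.96):
    """Simplified conditional power with one final analysis and no further looks."""
    if not 0.0 < info_frac < 1.0:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    b_value = z_interim * math.sqrt(info_frac)  # score-scale statistic at the interim
    remaining = 1.0 - info_frac                 # information still to be collected
    shortfall = z_crit - b_value - drift * remaining
    return 1.0 - normal_cdf(shortfall / math.sqrt(remaining))


# Illustration: the same interim data evaluated under two assumed values of delta/sigma.
cp_design = conditional_power(1.0, 0.5, drift_from_effect(0.2, 1075))
cp_smaller = conditional_power(1.0, 0.5, drift_from_effect(0.1, 1075))
print(cp_design > cp_smaller)
```

The sketch shows why the assumed value of δ/σ matters: for fixed interim data, a smaller assumed effect always yields a smaller conditional power.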
This chapter is arranged into the following sections:

CP Calculator-CHW: Interim Monitoring
– Normal Endpoint
– Binomial Endpoint
– Time to Event Endpoint

CP Calculator-CHW: Simulation
– Normal Endpoint
– Binomial Endpoint
– Time to Event Endpoint

57.1 CP Calculator CHW: Interim Monitoring

This section explains the use of the conditional power calculator while performing interim monitoring.

57.1.1 Normal Endpoint

Consider a two-arm trial to determine if there is an efficacy gain for an experimental drug relative to the industry standard treatment for negative symptoms of schizophrenia. The primary endpoint is the improvement from baseline to week 26 in the Negative Symptoms Assessment (NSA), a 16-item clinician-rated instrument for measuring the negative symptomatology of schizophrenia. The trial is designed for the one-sided alternative hypothesis that δ > 0. From limited data on related studies, the difference of means is expected to be 10 with a standard deviation of 50. Create a design worksheet as shown below.

We will now monitor the trial. Invoke the CHW interim monitoring worksheet by clicking the icon; the worksheet will appear as displayed below.

Click on the icon and in the ensuing Test Statistic Calculator, enter the sample size as 215, δ as 8, and SE as 7.2291. Click OK. The incremental test statistic will be computed as 1.1066 (δ̂/SE = 8/7.2291 = 1.1066).

Similarly, for the second look, click on the icon and then enter the estimates of the incremental accrual, δ and SE as 220, 9.2 and 7.4162 respectively in the Test Statistic Calculator. Click OK. The computed values will be posted in the interim monitoring sheet as shown below.
After any interim look, based on the observed values of δ and σ, you will be able to use the conditional power calculator to estimate either conditional power or sample size using appropriate inputs as shown in Table 57.1. Let us examine the use of the conditional power calculator with a few examples. From the interim monitoring sheet, click on the Conditional Power Calculator icon to invoke the conditional power calculator as displayed below.

Table 57.1: Conditional Power Calculator Use

Estimate             Input
conditional power    observed δ/σ, design sample size
conditional power    observed δ/σ, user specified sample size
conditional power    user specified δ/σ, design sample size
conditional power    user specified δ/σ, user specified sample size
sample size          observed δ/σ, desired conditional power
sample size          user specified δ/σ, desired conditional power

The calculator is divided into two parts. The first part is the input part. The values for the cells in this part are automatically filled using the interim monitoring sheet values. The calculator indicates that the second interim look has been taken, the cumulative sample size is 435 and the weighted z statistic after the second interim look is 1.66.

The second part is the input/output part. Here you may decide to estimate either conditional power or sample size by clicking on the appropriate radio button, and then specifying the required input as detailed in Table 57.1. By default, the calculator is showing the value of δ/σ as 0.2, which is the estimate obtained from the incremental data of the second look. The interpretation is that if the hypothesized value of δ/σ is 0.2, then the conditional power to reach significance at any future look with a maximum sample size of 1075 is 0.905.
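The incremental test statistics for the two looks (δ̂/SE with inputs 8 and 7.2291, then 9.2 and 7.4162) and their combination into a weighted statistic can be sketched as follows. This is a hedged illustration rather than East's code. In the CHW method the stage weights are fixed by the design; here they are assumed to be proportional to the square roots of the incremental sample sizes entered (215 and 220). The function names are hypothetical.

```python
import math


def wald_z(delta_hat, se):
    """Incremental test statistic: estimate divided by its standard error."""
    return delta_hat / se


def weighted_z(stages):
    """Combine incremental z statistics with weights sqrt(n_i / sum of n).

    stages is a list of (incremental_n, incremental_z) pairs. The choice of
    weights is an assumption made for this sketch.
    """
    total_n = sum(n for n, _z in stages)
    return sum(math.sqrt(n / total_n) * z for n, z in stages)


z1 = wald_z(8.0, 7.2291)   # first look, incremental n = 215
z2 = wald_z(9.2, 7.4162)   # second look, incremental n = 220
z_combined = weighted_z([(215, z1), (220, z2)])

print(round(z1, 4), round(z2, 4), round(z_combined, 2))
```

Under these assumed weights the combined statistic rounds to 1.66, the value displayed by the calculator after the second look.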
Computing Conditional Power for Specified Sample Size

Now suppose you estimate, using cumulated data, that δ/σ is likely to be 0.1593. Enter this value in the calculator and click on the Recalc button. The calculator will display the new estimate for conditional power, which is 0.789.

If you want to enter another set of estimates, say δ = 7.5 and σ = 45.0, and want to enter each value separately, you can do so by first clicking on the top check box and then entering the values. Then click on the Recalc button to get the new estimate of conditional power as 0.815.

To view a plot of conditional power vs. delta, click on the Plot button and a plot will appear as shown below.

In the above plot, if you click on the radio button against sample size, you will get the conditional power vs. sample size plot displayed below.

Computing Sample Size for Desired Conditional Power

With the values of δ = 7.5, σ = 45.0, and sample size = 1075, we obtained the conditional power estimate as 0.815. Now, keeping the same values for δ and σ, if you want to estimate the sample size for a desired conditional power of 0.90, proceed as follows: click on the radio button against the sample size input box, enter the value of 0.90 for conditional power and then click on the Recalc button. You will get the estimate of sample size to be 1334 as displayed in the screen shot below.

With the above setting in the conditional power calculator, you can click on the Plot button to get the sample size vs. delta and sample size vs. conditional power plots as displayed below.

57.1.2 Binomial Endpoint

Consider a two-arm, placebo controlled randomized clinical trial for subjects with acute cardiovascular disease undergoing percutaneous coronary intervention (PCI). The primary endpoint is a composite of death, myocardial infarction or ischemia-driven revascularization during the first 48 hours after randomization. We assume on the basis of prior knowledge that the event rate for the placebo arm is 8.7%. The investigational drug is expected to reduce the event rate by at least 20%. The investigators are planning to randomize a total of 8000 subjects in equal proportions to the two arms of the study.

Let us use East to create a 3-look group sequential design to detect a 20% risk reduction with a one-sided level-0.025 test of significance (with 0.087 on the control arm and 0.8 × 0.087 = 0.0696 on the treatment arm). It is also decided that two interim looks will be taken, one after 4000 subjects are enrolled (50% of total information) and the second after 5600 subjects are enrolled (70% of total information). Early stopping efficacy boundaries are derived from the Lan and DeMets (1983) O’Brien-Fleming type error spending function. With the above specifications, create a plan in East as shown below.

We will now monitor the trial. Select the icon to invoke the CHW interim monitoring sheet. You will then be taken to the interim monitoring worksheet displayed below.

The first interim look was taken after accruing 4000 patients, 2000 per treatment arm.
We input this number as the incremental accrual in the row corresponding to the first look. To calculate the incremental statistic, we utilize the test statistic calculator. There are 174 events in the control arm and 147 events in the treatment arm. Based on these data the estimate of δ is (147/2000) − (174/2000) = −0.0135 and the estimate of SE is 0.0086. So the value of the test statistic is (estimate of δ)/SE = −1.5718. These values are entered in the test statistic calculator as shown below.

Click on OK and the values of incremental accrual and incremental test statistic will appear in the IM sheet.

Conditional Power Calculator

After each look during interim monitoring the decision to alter the sample size can be made using the conditional power calculator. After any interim look, based on the observed data, you will be able to use the conditional power calculator to estimate any one of three quantities (conditional power, sample size, or πt) given the estimates of the other two and any specified value of πc.

Computing Power for a pre specified sample size

Click on the icon from the IM toolbar to invoke the conditional power calculator as shown below.

The calculator is divided into 2 parts. The first part displays the inputs that are used in the interim monitoring sheet up to the current look. The cumulative accrual is 4000 and the current weighted test statistic is −1.5718.

The second part helps the user to estimate the value of a desired parameter by selecting the radio button against the parameter, entering the values for the other parameters, and clicking on the Recalc button.
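The first-look figures for this binomial example (174 control events and 147 treatment events, 2000 subjects per arm) can be checked with a short Python sketch. The standard error is computed here with the unpooled variance formula, which is an assumption made for this illustration; it reproduces the rounded values quoted in this section. The function name is hypothetical.

```python
import math


def risk_difference_z(events_t, n_t, events_c, n_c):
    """Estimate of delta (treatment minus control), its unpooled SE, and delta / SE."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    delta_hat = p_t - p_c
    # Unpooled variance: each arm contributes p(1 - p) / n (assumed for this sketch).
    se = math.sqrt(p_t * (1.0 - p_t) / n_t + p_c * (1.0 - p_c) / n_c)
    return delta_hat, se, delta_hat / se


delta_hat, se, z = risk_difference_z(147, 2000, 174, 2000)
print(round(delta_hat, 4), round(se, 4), round(z, 4))
```

The test statistic is the estimate divided by its standard error, so a treatment arm with fewer events than control gives a negative value.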
In the current scenario, for a final overall sample size of 8000 and the hypothesized values of πc = 0.087 and πt = 0.0735, the conditional power is estimated to be 0.629. Now, keeping the values of πc and πt the same, if you want to estimate the conditional power for an increased sample size of 10,000, enter this value and click on the Recalc button to see the conditional power estimate of 0.752.

Now you may click on the Plot button and choose the x-axis to represent sample size, to see the graph of conditional power vs. sample size, assuming πt = 0.0735 and πc = 0.087.

Re-estimating Sample Size for a desired power

If a final overall sample size is to be estimated for a desired value of conditional power, the user can do so by selecting the sample size radio button in the calculator. Suppose the user wants to estimate the increase required in the final sample size for a desired conditional power of 80%. The user can select the radio button next to the Final Sample Size input box, enter the value of 0.8 for conditional power, and then click on the Recalc button. The result in the conditional power calculator will appear as shown below.

The calculator estimates a final sample size of 11058 for a desired conditional power of 0.8 for the values of πt = 0.0735 and πc = 0.087. After clicking on the Plot button, the user can view the plot of sample size vs. conditional power as shown below.
57.1.3 Time to Event Endpoint

A two-arm multi-center randomized clinical trial is planned for subjects with advanced metastatic non-small cell lung cancer with the goal of comparing the current standard second line therapy (docetaxel+cisplatin) to a new docetaxel containing combination regimen. The primary endpoint is overall survival (OS). The study is required to have one-sided α = 0.025, and 90% power to detect an improvement in median survival, from 8 months on the control arm to 11.4 months on the experimental arm, which corresponds to a hazard ratio of 0.7. Accrual duration is 24 months and the study duration is 30 months. We shall first create a three look group sequential design for this study in East as shown below.

We will now monitor the trial. Invoke the CHW interim monitoring sheet. At the first look, enter the cumulative events as 110 and the cumulative test statistic, using the test statistic calculator, as −1.220 (δ̂/SE = −0.288/0.236 = −1.220). At the second look, enter the incremental accrual as 200 and again use the test statistic calculator to enter the test statistic (δ̂/SE = −0.324/0.195 = −1.662). Now the interim monitoring sheet will appear as displayed below.

After any interim look, based on the observed values of δ and its SE, you will be able to use the conditional power calculator to estimate either conditional power or number of events using appropriate inputs. Let us examine the use of the conditional power calculator with a few examples. From the interim monitoring sheet, click on the icon to invoke the conditional power calculator as displayed below.

The calculator is divided into two parts. The first part is the input part.
The values for the cells in this part are automatically filled using the interim monitoring sheet values. The calculator indicates that the second interim look has been taken, the cumulative number of events is 200 and the weighted z statistic after the second interim look is −1.683. The second part is the input/output part. Here you may decide to estimate any of three quantities (required HR, conditional power, or number of events) by clicking on the appropriate radio button, and then specifying the input for the other two quantities.

Computing Conditional Power for Specified Number of Events

Now suppose you estimate that with the available budget you can extend the study to cover 400 events. In that scenario you may want to know the effect on the conditional power. With the radio button selected to compute conditional power, enter the value of 400 as the number of events and click on the Recalc button. The calculator will display the new estimate for conditional power, which is 0.9239.

To view a plot of conditional power vs. number of events, click on the Plot button, select number of events as the x-axis variable and a plot will appear as shown below.

In the above plot, if you click on the radio button against HR, you will get the conditional power vs. HR plot displayed below.

Computing Sample Size for Desired Conditional Power

If you would like to estimate the number of events required for a specified conditional power, say 0.80, you can click on the radio button against # of events, enter 0.80 as conditional power, and then click on the Recalc button.
The calculator will display the estimate for number of events, which is 317. To view a plot of number of events vs. conditional power, click on the Plot button, select conditional power as the x-axis variable and a plot will appear as shown below.

57.2 CP Calculator CHW: Simulation

This section explains the use of the conditional power calculator while performing adaptive simulations. Simulation capabilities can be useful in verifying the operating characteristics of the design.

57.2.1 Normal Endpoint

Let us use the design for the normal endpoint that we discussed in Section 57.1.1, which is shown below.

Save this design in the library and then click on the icon to get the simulation worksheet. In this sheet, in the Include Options button, choose Sample Size Re-estimation. You will get a simulation worksheet. Click on the tab Sample Size Re-estimation to see the simulation sheet with this tab opened.

Keep the simulation parameters displayed in the other tabs without any change. The default values for simulation specify a difference of means of 10.0 and a standard deviation of 50. The maximum sample size to be used till look L=4 is Nmax = 1075. Let us change this maximum value to 2150 by modifying the multiplier value from 1 to 2. Also, the criterion for when to adapt the sample size is specified by a range of conditional power values from 0.3 to 0.9. Thus after the second look, if the computed conditional power lies between 0.3 and 0.9, the simulation will increase the sample size to a maximum of 2150, so that the conditional power can rise to the desired level of 0.9.
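The adaptation rule just described can be written down as a small decision function. This is a sketch of the rule as stated in the text, not East's simulation code. The function name and arguments are hypothetical, n_required stands for whatever sample size the conditional power calculator reports as necessary for the target conditional power, and treating the end points of the conditional power range as inclusive is an assumption.

```python
def adapted_sample_size(cp, n_required, n_planned, multiplier,
                        cp_min=0.3, cp_max=0.9):
    """Sample size after the adaptation decision at the interim look.

    If the interim conditional power lies in [cp_min, cp_max], the sample size
    is raised towards n_required but capped at multiplier * n_planned.
    Otherwise the planned sample size is kept unchanged.
    """
    n_cap = multiplier * n_planned
    if cp_min <= cp <= cp_max:
        return min(max(n_required, n_planned), n_cap)
    return n_planned


# Planned maximum of 1075 with a multiplier of 2 gives a cap of 2150.
print(adapted_sample_size(0.60, 1334, 1075, 2.0))    # within the range, below the cap
print(adapted_sample_size(0.60, 2730.8, 1075, 2.0))  # within the range, capped
print(adapted_sample_size(0.95, 2730.8, 1075, 2.0))  # outside the range, unchanged
```

Writing the rule this way makes clear that the multiplier limits the increase: when the required sample size exceeds the cap, the target conditional power is not attained.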
In order to assess and observe the effect of varying the values of these simulation parameters, we use the conditional power calculator. Based on the results we get from the conditional power calculator, we decide on a set of values for the simulation parameters and then carry out the simulation. Open the conditional power calculator by clicking on the icon and the calculator will appear as shown below.

The conditional power calculator is divided into two parts. The first part lists the inputs and gives the current look position and the sample size at the current look. The second part is used to compute either conditional power or sample size given the other quantity and appropriate values among δ/σ, z, δ, and σ, depending on the choice made between the options Arbitrary and Estimated.

For example, if you want to estimate the conditional power for the estimated values of δ = 8 and σ = 70 and for a maximum sample size of 2150, enter these values in the calculator and click on the Recalc button. The calculator will compute the values of z and the conditional power as shown below.

The estimated conditional power of 0.8058 indicates that even with a maximum sample size of 2150, the conditional power cannot reach the desired level of 0.90 if the estimated values of δ = 8 and σ = 70 represent the true values of the population. The quantity δ/σ can be based either on estimated values or on design values, selected using the drop-down box against CP Computation Based on:. For this simulation, we will use the estimated value of δ/σ and Z. Select the option Estimated δ/σ, Z.

Computing overall Sample Size

Suppose we wish to compute the overall sample size required for a conditional power of 0.9.
Select the radio button next to the overall sample size and enter the value of 0.9 for conditional power. Press Recalc to obtain the result as shown below.

The calculator shows that the overall sample size for the desired conditional power of 0.90 is 2730.8. You may enter this sample size as the Max. Usable sample size by specifying the multiplier as 2730.8/1075 = 2.5403 in the simulation sheet along with appropriate values for other simulation parameters and then carry out the simulation.

57.2.2 Binomial Endpoint

This section looks at the use of the conditional power calculator during the adaptive simulation of trials with binomial endpoints. Let us use the design for the binomial endpoint that we discussed in Section 57.1.2, which is shown below.

We will now simulate this plan using adaptive simulation in East. Click on the icon to get the simulation worksheet. In this sheet, in the Include Options button, choose Sample Size Re-estimation. You will get a simulation worksheet. Click on the tab Sample Size Re-estimation to see the simulation sheet with this tab opened.

Keep the simulation parameters displayed in the other tabs without any change. The default values specify a proportion of response of 0.087 for the control arm and 0.0696 for the treatment arm. The criterion for when to adapt the sample size is specified by a range of conditional power values from 0.3 to 0.82. Change the multiplier value from 1 to 2, so that the maximum sample size if adapt becomes 16000.
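The simulation sheets express the maximum usable sample size as a multiplier of the planned maximum, so converting between the two is simple arithmetic. The sketch below, with hypothetical helper names, reproduces the two conversions used in this section: the multiplier 2730.8/1075 for the normal endpoint and the cap obtained by doubling the planned 8000 subjects for the binomial endpoint.

```python
def multiplier_for(target_n, planned_n):
    """Multiplier that turns the planned maximum into the target maximum."""
    if planned_n <= 0:
        raise ValueError("planned_n must be positive")
    return target_n / planned_n


def max_if_adapt(planned_n, multiplier):
    """Maximum usable sample size implied by a multiplier."""
    return planned_n * multiplier


print(round(multiplier_for(2730.8, 1075), 4))  # normal endpoint example
print(max_if_adapt(8000, 2))                   # binomial endpoint example
```

The first line prints the multiplier 2.5403 quoted in the text and the second prints 16000.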
Thus, after the second look, if the estimated conditional power lies between 0.3 and 0.82, then the simulation process will increase the sample size to a maximum of 16000, so as to raise the conditional power to the desired level of 0.82.

In order to assess and observe the effect of varying the values of these simulation parameters, you may use the conditional power calculator. Based on the results you get from the conditional power calculator, you may decide on a set of values for the simulation parameters and then carry out the simulation. Open the conditional power calculator by clicking on the icon and the calculator will appear as shown below.

The conditional power calculator is divided into two parts. The first part lists the inputs and gives the current look position and the current sample size. The second part is used to compute either conditional power or sample size given the other quantity and appropriate values among πc, πt, and z, depending on the choice made between the options Arbitrary and Estimated.

Computing conditional power for a specified sample size

For example, if you want to estimate the conditional power for the estimated values of πc = 0.085 and πt = 0.074 and for a maximum sample size of 16,000, enter these values in the calculator and click on the Recalc button. The calculator will compute the values of z and the conditional power as shown below.

The computed conditional power of 0.7715 indicates that the maximum usable sample size of 16,000 may have to be increased in order to attain the desired conditional power of 0.82.
Computing overall Sample Size for a desired conditional power

Now suppose we wish to compute the overall sample size required for a conditional power of 0.82. Select the radio button next to the overall sample size and enter the value of 0.82 for Computed conditional power. Press the Recalc button to obtain the result as shown below.

East computes that the required overall sample size for the desired conditional power is 17793.7. Now you may enter this value for Max. Sample Size in the simulation sheet and then carry out the simulation.

57.2.3 Time to Event Endpoint

Let us use the design for the survival endpoint that we discussed in Section 57.1.3, which is shown below.

We will now simulate this plan using adaptive simulation in East. Click on the icon to get the simulation worksheet. In this sheet, in the Include Options button, choose Sample Size Re-estimation. You will get a simulation worksheet. Click on the tab Sample Size Re-estimation to see the simulation sheet with this tab opened.

Keep the simulation parameters displayed in the other tabs without any change. The default values for simulation specify hazard rates of 0.0866 and 0.0607 for the control and treatment arms respectively, with a resulting hazard ratio of 0.70. The maximum number of events to be used till look L=2 is MaxEvents = 340, with the multiplier at the default value of 1.0. Also, the criterion for when to adapt the number of events is specified by a range of conditional power values from 0.3 to 0.9. The target CP is at the default value of 0.90.
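The default hazard rates shown in the simulation sheet follow from the design inputs of Section 57.1.3 if survival times are assumed to be exponentially distributed, in which case the hazard rate equals ln 2 divided by the median. The sketch below is an illustration under that assumption, with a hypothetical function name.

```python
import math


def hazard_from_median(median_months):
    """Hazard rate of an exponential distribution with the given median."""
    return math.log(2.0) / median_months


control_hazard = hazard_from_median(8.0)   # control median of 8 months
treatment_hazard = 0.70 * control_hazard   # design hazard ratio of 0.70
ratio_of_medians = 8.0 / 11.4              # hazard ratio implied by the two medians

print(round(control_hazard, 4), round(treatment_hazard, 4))
print(round(ratio_of_medians, 3))
```

The first line prints 0.0866 and 0.0607. The ratio of the two medians is 0.702, which the design rounds to a hazard ratio of 0.7; applying the rounded hazard ratio to the control hazard gives the treatment hazard of 0.0607.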
In order to assess and observe the effect of varying the values of these simulation parameters, we use the conditional power calculator. Based on the results we get from the conditional power calculator, we decide on a set of values for the simulation parameters and then carry out the simulation. Open the conditional power calculator by clicking on the icon and the calculator will appear as shown below.

The conditional power calculator is divided into two parts. The first part lists the inputs and gives the current look position and the number of events at the current look. The second part is used to compute either conditional power or number of events given the other quantity and appropriate values of HR and z, depending on the choice made between the options Arbitrary and Estimated.

For example, if you want to estimate the conditional power for the estimated value of HR = 0.8 and for a maximum number of events of 500, enter these values in the calculator and click on the Recalc button. The calculator will compute the values of z and the conditional power as shown below.

The estimated conditional power of 0.779 indicates that even with the number of events at a maximum of 500, the conditional power cannot reach the desired level of 0.90 if the estimated hazard ratio of 0.80 represents the true value of the population. The hazard ratio can be defined by the user, estimated, or taken from the design values. For this simulation, we will use the estimated value of the hazard ratio. Select the radio button next to Estimated (HR, Z).

Computing Number of Events (Overall)

Suppose we wish to compute the number of events (overall) required for a conditional power of 0.9.
Select the radio button next to the # of Events (Overall) and enter the value 0.9 for conditional power. Press Recalc to obtain the result as shown below. The calculator shows that the overall number of events for the desired conditional power of 0.90 is 673. You may specify this number as the Max. Events if Adapt (multiplier, total #) by entering the multiplier as 673/340 = 1.9794 in the simulation sheet. You may then specify appropriate values for the other simulation parameters and carry out the simulation.

Volume 8 Special Topics

58 Introduction to Volume 8
59 Design and Monitoring of Maximum Information Studies
60 Design and Interim Monitoring with General Endpoints
61 Early Stopping for Futility
62 Flexible Stopping Boundaries in East
63 Confidence Interval Based Design
64 Simulation in East
65 Predictive Interval Plots
66 Enrollment/Events Prediction - At Design Stage (By Simulation)
67 Conditional Simulation
68 Enrollment/Events Prediction - Analysis
69 Interfacing with East PROCs

58 Introduction to Volume 8

This volume contains Chapters 58 through 69. These chapters describe special design and monitoring tools that, rather than being endpoint specific, cut across all types of group sequential designs. Chapter 59 deals with the design and monitoring of trials on an information scale rather than on a sample size scale. By fixing the maximum information but allowing the sample size to float, one can ensure that a study will be adequately powered despite poor initial guesses about nuisance parameters like σ². Chapter 60 describes how one can convert any fixed sample design into a group sequential design.
Suppose, for example, that you wish to run a three-period cross-over study as a group sequential design with interim looks for early stopping for efficacy and futility. Since East does not at present support this type of design, you may first obtain the necessary sample size for a single-look design on your own, perhaps with other commercial software. This sample size would be input to East, and the single-look design would then be converted into a group sequential design with stopping boundaries and a corresponding inflated sample size. Chapter 61 discusses early stopping for futility. Chapter 62 describes all the different types of stopping boundary families that are available in East, such as Haybittle-Peto, Wang-Tsiatis, Lan-DeMets etc. Chapter 63 illustrates through several examples how East may be used to obtain sample sizes that are based on the desired width of a confidence interval for the parameter of interest rather than on the desired power of a hypothesis test. Chapter 64 discusses the various types of simulation tools provided by East. Chapter 65 explains the concept of predicting the future course of a trial with Predictive Interval Plots. Chapter 66 discusses the Enrollment/Events Prediction At Design Stage. Chapter 67 discusses the Enrollment/Events Prediction At Interim Monitoring Stage using conditional simulations. Chapter 69 discusses the interaction of East 6 with East PROCs.

58.1 Settings

Click the icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus. All these numerical quantities are grouped in different categories depending upon their usage.
For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". To view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value between 0 and 9.

The General tab lets you adjust the paths for storing workbooks, files, and temporary files. These paths will persist through the current and future sessions, even after East is closed. This is also where the installation directory of the R software must be specified in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design.

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type, and test type, and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large. You can set the maximum time available for any exact test in minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in the global settings, then any MCP will use half of the alpha from the settings as its default, since MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm.
In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, the input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation. If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data. If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings option allows defaults to be set for quantities on East 6 charts. We suggest that you do not alter the defaults until you are quite familiar with the software.

59 Design and Monitoring of Maximum Information Studies

This chapter discusses the use of a general tool for designing and monitoring studies on the "information" scale rather than on the "sample size" scale. It is based on the work of Lan and Zucker (1993), Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997), and Mehta and Tsiatis (2001). It permits a general methodology for group-sequential inference, applicable to any data-generating process with or without covariates. Suppose we wish to detect an effect of magnitude δ with power 1 − β using a two-sided level-α, K-look group sequential test.
The parameter δ may be a binomial probability, a mean from a normal distribution, a difference of two means, a difference of two binomial probabilities, an odds ratio, a hazard ratio, a ratio of Poisson rates, the coefficient of interest in a regression model, or any other univariate "effect size" parameter of interest. The fundamental idea is that no matter what parameter δ we wish to make inferences about, the maximum amount of statistical information, $I_{\max}$, needed to make the inference is always obtained in the same manner. It is computed by the formula

$$I_{\max} = \left[\frac{z_{\alpha/2} + z_{\beta}}{\delta}\right]^{2} \times \mathrm{IF}(\alpha, \beta, K, \text{boundary}) \tag{59.1}$$

where IF(·) is an inflation factor that depends on α, β, K and the stopping boundary, but does not depend on δ. Equation (59.1) tells us, at the design stage, how much information about δ we need in order to achieve 1 − β power. It is applicable in all types of designs, ranging from simple 1-sample normal or binomial designs to more complicated designs based on generalized linear models for discrete categorical or continuous data, parametric survival models, proportional hazard models, mixed effects models, and semi-parametric models for longitudinal data.

However, once the trial is underway we need to know how much information about δ has already been accumulated, so as to determine if it is time to terminate the trial. If $\hat\delta_j$ is an estimate of δ at the j-th interim analysis, the information about δ is estimated by the relationship

$$I_j \approx \left[\operatorname{var}(\hat\delta_j)\right]^{-1} \tag{59.2}$$

One could therefore adopt a common design and monitoring strategy for all types of group sequential trials, regardless of the endpoint or the model generating the data.

1. Use equation (59.1) to determine the maximum required information, $I_{\max}$.
2. At the j-th interim look, estimate $I_j$, the amount of information currently available about δ, using equation (59.2).
3. If either $I_j \ge I_{\max}$ or the stopping boundary is crossed at information fraction $t_j = I_j / I_{\max}$, terminate the trial. Otherwise continue.
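Steps 2 and 3 of this strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `information_look`, its arguments, and the numbers in the usage lines are hypothetical, and the boundary value at the current information fraction is assumed to be supplied by the caller (in practice East computes it from the chosen spending function).

```python
def information_look(estimate, std_error, i_max, boundary):
    """One interim look on the information scale (steps 2 and 3).

    estimate  : current estimate of delta
    std_error : standard error of that estimate
    i_max     : maximum information from equation (59.1)
    boundary  : two-sided critical value at the current information
                fraction (assumed to be supplied by the caller)
    """
    information = 1.0 / std_error ** 2      # equation (59.2)
    fraction = information / i_max          # t_j = I_j / I_max
    wald = estimate / std_error             # Wald test statistic
    crossed = abs(wald) >= boundary
    return {
        "information": information,
        "fraction": fraction,
        "wald": wald,
        "stop": crossed or information >= i_max,
    }


# Hypothetical numbers, chosen only to exercise the function.
look = information_look(estimate=0.5, std_error=0.25, i_max=40.0, boundary=3.0)
print(look)
```

With these illustrative inputs the accrued information is 16 units (40% of the maximum) and the statistic of 2.0 is inside the boundary, so the sketch reports that the trial continues.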
This strategy is appealing both because of its general applicability and because it does not require a priori specification of unknown nuisance parameters. In practice, however, we would be obliged to provide at least an initial estimate of the maximum sample size so that the sponsor of the clinical trial could have some idea of the resources to be committed up-front. For example, suppose that $X_t \sim N(\mu_t, \sigma^2)$, $X_c \sim N(\mu_c, \sigma^2)$ and $\delta = \mu_t - \mu_c$. Then $\hat\delta_K = \bar{X}_t - \bar{X}_c$ and $\operatorname{var}(\hat\delta_K) = 4\sigma^2 / n_{\max}$, so that finally,

$$n_{\max} = 4\sigma^2 I_{\max}. \tag{59.3}$$

If we were designing the study on the basis of maximum information there would not be any nuisance parameters, whereas if we design the study on the basis of maximum sample size, we would need to know the value of σ². One possibility would be to fix a tentative value for $n_{\max}$ at the design stage, based on our best initial guess at the value of σ². In the previous chapters the group-sequential approach has been utilized exclusively to monitor a study with a view to early stopping. It would seem reasonable, however, to take advantage of the data available at each interim monitoring time-point also to revise our initial estimate of σ² and thereby improve the study design adaptively.

Here we will illustrate the procedure with three examples: 1) comparing two binomial distributions where the control response rate is unknown, 2) comparing two normal distributions where the variance is unknown, and 3) comparing two Poisson rates. These examples are intended to demonstrate that the sample size of a study may be revised as data for estimating nuisance parameters become available at the interim monitoring time-points.
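For readers who want the intermediate steps behind equation (59.3), the derivation can be written out as follows. It assumes equal allocation, with $n_{\max}/2$ subjects per arm, which is what the variance expression $4\sigma^2/n_{\max}$ implies.

```latex
\begin{align*}
\operatorname{var}(\hat\delta_K)
  &= \frac{\sigma^2}{n_{\max}/2} + \frac{\sigma^2}{n_{\max}/2}
   = \frac{4\sigma^2}{n_{\max}}, \\
I_{\max}
  &= \left[\operatorname{var}(\hat\delta_K)\right]^{-1}
   = \frac{n_{\max}}{4\sigma^2}
  \quad\text{(equation (59.2) at the final look)}, \\
n_{\max}
  &= 4\sigma^2 I_{\max}.
\end{align*}
```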
59.1 Two Binomials with Unknown Control Response

Consider the information based design and monitoring of a randomized clinical trial comparing an experimental therapy with a control therapy based on a dichotomous outcome and equal treatment allocation. Let $\pi_c$ be the response rate for the control arm, $\pi_t$ be the response rate for the experimental arm, and $\delta = \pi_t - \pi_c$. We will now design and monitor this study on the information scale.

59.1.1 Information Based Design

Consider a phase III group sequential clinical trial for evaluating the effect of a new drug for prevention of myocardial infarction in patients undergoing coronary artery bypass graft surgery. The study is designed to test the null hypothesis $H_0: \delta = 0$ against the alternative hypothesis $H_1: \delta < 0$ using a two-sided test at significance level α = 0.05. We plan the study to detect a 15% reduction in incidence compared to placebo with 90% power. At the time of designing the study we do not have any reliable estimate of the incidence of myocardial infarction on placebo. Therefore, we prefer an information based design, which does not rely on the incidence rate of myocardial infarction on placebo.

Single look study

Click Other on the Design tab, and then click Information Based as shown below. A new input window will appear. We will design a study without any interim look. Leave the Number of Looks as 1. Select 2-Sided for Test Type and enter the values of Type I Error (α) and Power (1-β) as 0.05 and 0.9, respectively. Change Treatment Effect to −0.15. Click Compute.
The output is shown as a row in the Output Preview located in the lower pane, with the computed maximum information displayed. East tells us that the total information required to achieve the operating characteristics of the above study with a fixed sample design is 467 units. This quantity, denoted by $I_1$ (see Appendix B, Section B.3 for details), was computed by the equation

$$I_1 = \left[\frac{z_{\alpha/2} + z_{\beta}}{\delta_1}\right]^{2}. \tag{59.4}$$

The subscript '1' indicates that $I_1$ is the required information for a single look study. Information is approximately equal to the inverse of the squared standard error of the estimate of δ. Thus, in a fixed sample trial, the desired power can be achieved if we go on accruing patients until $[\mathrm{se}(\hat\delta)]^{-2} = 466.996$. This design has the default name Des 1. Save this design in the current workbook by selecting the row corresponding to Des 1 in Output Preview and clicking the corresponding icon on the Output Preview toolbar.

Multi look study

Suppose we actually intend to monitor the study four times. In order to do this, create a new design by selecting Des 1 in the Library and clicking the corresponding icon on the Library toolbar. First, change the Number of Looks from 1 to 4, to generate a study with three interim looks and a final analysis. Click the Boundary Info tab. Suppose you have decided on a design with 4 looks that allows rejection of $H_0$ for efficacy. In order to do this, select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in the Efficacy box. Select None for Boundary Family in the Futility box. Click Compute. A new row will be added in the Output Preview.
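The single-look information of about 467 units can be reproduced from equation (59.4) with the Python standard library. The sketch below is only a check of the arithmetic, not a description of how East computes the value; the function name `fixed_sample_information` is hypothetical.

```python
from statistics import NormalDist


def fixed_sample_information(delta1, alpha=0.05, power=0.9):
    """Equation (59.4): information for a single-look, two-sided test."""
    z = NormalDist().inv_cdf               # standard normal quantile function
    z_alpha_2 = z(1 - alpha / 2)           # z_{alpha/2}
    z_beta = z(power)                      # z_{beta}, with power = 1 - beta
    return ((z_alpha_2 + z_beta) / delta1) ** 2


i1 = fixed_sample_information(delta1=-0.15)
target_se = i1 ** -0.5                     # standard error at which I_1 is reached
print(f"I_1 = {i1:.2f}, target se = {target_se:.4f}")
```

The second quantity is the standard error that the estimate of δ must reach for the fixed-sample study to have accrued its full information.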
Save this design in the current workbook by selecting the row corresponding to Des 2 in Output Preview and clicking the corresponding icon on the Output Preview toolbar. East has inflated the maximum information of the single-look study by an appropriate inflation factor to compensate for the power loss of monitoring four times instead of once. The new maximum information, $I_K$, for a K-look study is shown in Appendix B, Section B.3 to be

$$I_K = I_1 \times \mathrm{IF}(\alpha, \beta, K, \text{boundary})$$

where IF(α, β, K, boundary) is an inflation factor that depends on α, β, K and the type of stopping boundary used. The new maximum information is 475.5 units, instead of 467 units.

The monitoring strategy for the above sequential trial calls for accruing subjects onto the study until the total information, as measured by $[\mathrm{se}(\hat\delta)]^{-2}$, equals 475.5 units or until a stopping boundary is crossed, whichever comes first. Now it is difficult to know how long to accrue subjects when the accrual goals are expressed in units of inverse squared standard error instead of in terms of a physical quantity like sample size. We need to translate units of information into sample size units. This is easy to do since the variance of $\hat\delta$ is a simple function of $\pi_c$, $\delta_1$, and the total sample size, $n_K$. Thus

$$I_K \approx [\mathrm{se}(\hat\delta)]^{-2} = \left[\frac{\pi_c(1-\pi_c)}{n_K/2} + \frac{(\pi_c+\delta_1)(1-\pi_c-\delta_1)}{n_K/2}\right]^{-1}.$$

Now since East has already computed $I_K = 475.533$ for K = 4, we obtain

$$n_K = 2 \times 475.533 \times \left[(\pi_c+\delta_1)\bigl(1-(\pi_c+\delta_1)\bigr) + \pi_c(1-\pi_c)\right]. \tag{59.5}$$

East provides you with a convenient sample size calculator for converting the 475.533 units of Fisher information into a sample size, based on equation (59.5). To invoke this calculator, right click on Des 2 in the Library and select Sample Size Calculator from the list.
Then select Difference of Proportions from the dropdown of Translate Information From. The calculator appears as shown below.

You can alter the control binomial probability in the top cell of the dialog box, and East will compute the corresponding maximum sample size based on the maximum Fisher information of 475.533 units. For example, if the baseline response probability is 0.25, the 475.533 units translate into a maximum sample size of 264 subjects (both treatments combined). Based on historical data we assume that the control response rate is 0.3. When you enter 0.3 into the top cell of the dialog box and press the Recalc button, East reveals that the maximum sample size needed for this sequential study is 321.

Thus, on the assumption that the control response rate is 0.3, we require an up-front commitment of 321 subjects to meet the operating characteristics of this study. (We can verify this independently by designing a 4-look binomial study using the unpooled estimate of standard error as shown below. See Section 23.1 for further details.) Of course, if the assumption that the control response rate is 0.30 is incorrect, 321 subjects will not produce the desired operating characteristics. Depending on the actual value of the control response rate, we might have either an under-powered or over-powered study.
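The two sample sizes quoted above (264 and 321) follow from equation (59.5). The sketch below reproduces them; it assumes that the result is rounded up to the next integer, and the function name `sample_size_from_information` is hypothetical rather than part of East.

```python
import math


def sample_size_from_information(i_max, pi_c, delta1):
    """Equation (59.5): total sample size (both arms, equal allocation)."""
    pi_t = pi_c + delta1                               # treatment response rate
    variance_terms = pi_t * (1 - pi_t) + pi_c * (1 - pi_c)
    return math.ceil(2.0 * i_max * variance_terms)     # assumed: round up


for pi_c in (0.25, 0.30):
    n = sample_size_from_information(i_max=475.533, pi_c=pi_c, delta1=-0.15)
    print(f"control rate {pi_c:.2f}: maximum sample size {n}")
```

Running the function over a grid of control rates is a quick way to see how sensitive the up-front sample size commitment is to the nuisance parameter while the information target stays fixed.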
We shall show in the next section that one of the major advantages of the information based approach is that we can use all the data accrued at any interim monitoring time point to re-estimate the control response rate and, if it differs from what was assumed initially, recalculate the sample size.

59.1.2 Information Based Monitoring

Select Des 2 in the Library, and click the corresponding icon on the Library toolbar. This will open an interim monitoring dashboard.

If we monitor the data at any chronological time τ, an efficient estimator of δ is

$$\hat\delta(\tau) = \hat\pi_t(\tau) - \hat\pi_c(\tau)$$

and the standard error of this estimator is

$$\mathrm{se}(\hat\delta(\tau)) = \left[\frac{\hat\pi_t(\tau)\bigl(1-\hat\pi_t(\tau)\bigr)}{n_t(\tau)} + \frac{\hat\pi_c(\tau)\bigl(1-\hat\pi_c(\tau)\bigr)}{n_c(\tau)}\right]^{1/2},$$

where $\hat\pi_i(\tau)$ is the sample proportion responding to treatment i among the $n_i(\tau)$ individuals assigned to treatment i by time τ, i = t, c. The information accrued at this time point is

$$I(\tau) = \bigl(\mathrm{se}(\hat\delta(\tau))\bigr)^{-2}$$

and the value of the Wald test statistic is

$$T(\tau) = \hat\delta(\tau) / \mathrm{se}(\hat\delta(\tau)).$$

The information fraction at chronological time τ is t(τ) = I(τ)/475.5327. We will stop the study if the test statistic crosses the LD(OF) stopping boundary at this information fraction. For future reference, we will also refer to the information fraction as "process time". In contrast, the time τ will also be referred to as "calendar time".

Results at the First Interim Monitoring Time Point

Suppose that at the first interim monitoring time point, $\tau_1$, we observe 15/60 responders on placebo and 14/60 responders on treatment. Then $\hat\delta(\tau_1) = -0.017$ and $\mathrm{se}(\hat\delta(\tau_1)) = 0.0781$. To pass these values to East, click the corresponding icon on the toolbar to invoke the Test Statistic Calculator. Enter the information above, and click Recalc. A message will appear; saying 'Yes' to this message will update the current information to 163.945 units and the current value of the test statistic to -0.218. Now click OK to continue.

East displays the information fraction, $t(\tau_1) = 163.945/475.533 = 0.345$, and computes the appropriate stopping boundary at that process time. The value of the stopping boundary is ±3.643. Since our test statistic did not exceed this boundary, we continue to the next interim monitoring time point.

We have accrued 120 subjects out of the 321 required under the design assumption that the nuisance parameter is $\pi_c = 0.30$. The information fraction under this design assumption is thus 120/321 = 0.374, while the actual information fraction is 0.345. Thus the information appears to be coming in a little slower than anticipated, but this difference does not seem serious enough to alter the sample size requirements of the study.

Results at the Second Interim Monitoring Time Point

Suppose that at the second interim monitoring time point, $\tau_2$, we observe 29/120 responders on treatment and 41/120 responders on placebo. Therefore, the estimate $\hat\delta$ is −0.1 with a standard error of 0.058. Click the icon to bring up the Test Statistic Calculator. Enter −0.1 for Estimate of δ and 0.058 for Standard Error of Estimate of δ. Click Recalc, and then click Yes. The information accrued at this time point is 297.265 and the observed value of the test statistic is $T(\tau_2) = -1.724$. Upon pressing the OK button, these values are pasted into the interim monitoring dashboard.

The information fraction is 0.625. The required stopping boundary is 2.609.
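The quantities entered at the first two looks can be checked with a short Python sketch. It follows the formulas for the estimate, its unpooled standard error, the information and the Wald statistic given in this section; the function names are hypothetical. Note that the information and test statistic reported in the text are computed from the rounded estimate and standard error that are typed into the Test Statistic Calculator, so the sketch does the same.

```python
import math


def two_binomial_estimate(x_t, n_t, x_c, n_c):
    """Difference of proportions and its unpooled standard error."""
    p_t, p_c = x_t / n_t, x_c / n_c
    delta_hat = p_t - p_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return delta_hat, se


def information_and_wald(delta_hat, se):
    """Information I = se^-2 and Wald statistic T = delta_hat / se."""
    return se ** -2, delta_hat / se


I_MAX = 475.533  # maximum information of the 4-look design

# Look 1: 14/60 on treatment, 15/60 on placebo.
d1, se1 = two_binomial_estimate(14, 60, 15, 60)
info1, t1 = information_and_wald(round(d1, 3), round(se1, 4))   # -0.017, 0.0781

# Look 2: 29/120 on treatment, 41/120 on placebo.
d2, se2 = two_binomial_estimate(29, 120, 41, 120)
info2, t2 = information_and_wald(round(d2, 3), round(se2, 3))   # -0.1, 0.058

for label, info, stat in (("look 1", info1, t1), ("look 2", info2, t2)):
    print(f"{label}: information {info:.3f}, fraction {info / I_MAX:.3f}, T {stat:.3f}")
```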
Since the absolute value of the test statistic is smaller than 2.609, the stopping boundary is not crossed and, once more, the study continues.

This time the anticipated information fraction under the assumption that $\pi_c = 0.30$ is 240/321 = 0.748, which is considerably larger than the actual information fraction 0.625. Thus, there is considerable evidence that the information is coming in slower than anticipated. In fact, the data suggest that the value of $\pi_c$ is close to 0.34: the placebo estimate at the second look is 41/120 = 0.342 (it was 15/60 = 0.25 at the first look). It might therefore be prudent to re-estimate the sample size of the study. The new maximum sample size can be obtained from the relationship

$$\frac{n(\tau_2)}{n_{\max}} = \frac{I(\tau_2)}{I_{\max}}.$$

Thus the maximum sample size (rounded up to the nearest integer) is

$$n_{\max} = n(\tau_2) \times \frac{I_{\max}}{I(\tau_2)} = 240 \times \frac{475.533}{297.265} = 389.$$

Therefore we need to commit 389 subjects to the study, not 321 as originally estimated. We see that the original design with 321 subjects would have led to a seriously under-powered study.

Results at the Third Interim Monitoring Time-Point

We continue to accrue subjects beyond the 321 in the original design, and reach the third interim monitoring time point at time $\tau_3$ with 61/180 responders on placebo and 41/180 responders on treatment. Therefore, the estimate $\hat\delta$ is −0.111 with a standard error of 0.047. Click on the icon, and enter −0.111 for Estimate of δ and 0.047 for Standard Error of Estimate of δ. Click Recalc, and then click OK. The information accrued at this time point is 452.69 and the observed value of the test statistic is $T(\tau_3) = -2.362$. Now click OK to update the charts and tables in the dashboard.
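The re-estimation rule $n_{\max} = n(\tau_2) \times I_{\max} / I(\tau_2)$ is simple to script. The sketch below uses a hypothetical helper, `reestimated_max_n`, and evaluates it twice: once with the information computed from the raw second-look counts, and once with the rounded value 297.265 that the Test Statistic Calculator reports. As far as the arithmetic goes, the first evaluation gives 389 and the second gives 384, so the figure of 389 appears to correspond to the unrounded information (about 293.98). This is an observation about the arithmetic only, not a statement about East's internal computation.

```python
import math


def reestimated_max_n(n_now, info_now, i_max):
    """Scale the current sample size by I_max / I(tau), rounding up."""
    return math.ceil(n_now * i_max / info_now)


I_MAX = 475.533

# Information from the raw counts at look 2 (29/120 treatment, 41/120 placebo).
p_t, p_c = 29 / 120, 41 / 120
info_unrounded = 1.0 / (p_t * (1 - p_t) / 120 + p_c * (1 - p_c) / 120)

n_from_counts = reestimated_max_n(240, info_unrounded, I_MAX)
n_from_rounded = reestimated_max_n(240, 297.265, I_MAX)

print(f"information from counts: {info_unrounded:.3f} -> n_max = {n_from_counts}")
print(f"information as reported: 297.265 -> n_max = {n_from_rounded}")
```

Either way the conclusion of the example is unchanged: the revised commitment is well above the 321 subjects assumed at the design stage.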
Now the stopping boundary is crossed and the following dialog box appears. Click Stop.

At this look, the total information accrued is 452.694 and the observed value of the test statistic is $T(\tau_3) = -2.362$. Since the absolute value of the test statistic exceeds the corresponding stopping boundary, 2.05, the stopping boundary is crossed and the study terminates with a statistically significant outcome. You see the IM sheet results as shown below. The adjusted p-value is 0.023, with a final adjusted estimate of the difference of −0.109.

This example highlights the fundamental difference between information based sequential monitoring and conventional sequential monitoring. Had the study been monitored by the conventional method, the maximum sample size would have been fixed from the start at 321 subjects and there would have been no flexibility to change the level of this physical resource over the course of the study. But in an information based approach the maximum information is fixed, not the maximum amount of a physical resource. Thus, the maximum sample size could be altered over the course of the study from 321 subjects to 389 subjects, while the maximum information stayed constant. Without this flexibility, the power of the study would be severely compromised.

59.2 Two Normals with Unknown Variance

In this section we will consider the PRIMO study (Pritchett et al., 2011). This was a multinational, multicenter randomized controlled trial to assess the effects of paricalcitol (a selective vitamin D receptor activator) on mild to moderate left ventricular hypertrophy in patients with chronic kidney disease.
The primary endpoint was change in left ventricular mass (LVM) index. Let $\mu_t$ and $\mu_c$ be the change in LVM index on paricalcitol and placebo, respectively, and let $\delta = \mu_t - \mu_c$ denote the difference in change in LVM index on paricalcitol compared to placebo. We want to test the hypothesis $H_0: \delta = 0$ against $H_1: \delta < 0$. A mean difference of 2.7g/m in LVM index change was considered clinically meaningful. Therefore, we will design a study to detect $\delta_1 = -2.7$ with 90% power.

59.2.1 Information Based Design

There is no reliable estimate available for the standard deviation (σ). Therefore, an information based design, which does not rely on the standard deviation, would be preferable in this case. An unblinded interim analysis was conducted for early termination and to make an informed decision with respect to sample size adjustment. The interim analysis was planned for when 90% of subjects are enrolled.

First, click Other on the Design tab, and then click Information Based as shown below. Change the Number of Looks to 2. This will add a tab with the label Boundary Info. We will come back to this tab later. In the Design Parameters tab, select Test Type as 2-Sided and enter Type I Error (α) and Power (1-β) as 0.05 and 0.9, respectively. Change Effect Size to −2.7. The Design Parameters tab should appear as below.

Now click Boundary Info. Select Spending Functions for Boundary Family, Gamma Family for Spending Function and −8 for Parameter (γ) in the Efficacy box. In the Futility box, select None for Boundary Family. Since we want to have an interim look at 90% of the sample size, specify 0.9 for 'Info. Fraction at Interim Look'. The Boundary Info tab should appear as below. Click Compute.
The output is shown as a row in the Output Preview located in the lower pane. The computed maximum information is highlighted in yellow. East tells us that the total information required to achieve the operating characteristics of the above study is 1.448 units. The monitoring strategy for the above 2-look sequential trial calls for accruing subjects onto the study until the total information, as measured by $[\mathrm{se}(\hat\delta)]^{-2}$, equals 1.448 units or until a stopping boundary is crossed, whichever comes first. Now we can translate this information into sample size using the following relationship:

$$n_{\max} = 4\sigma^2 I_{\max}.$$

In the PRIMO study the initial estimate of σ is assumed to be 6.39. Save this design in the current workbook by selecting the row corresponding to Des 1 in Output Preview and clicking the corresponding icon on the Output Preview toolbar. Right click on the design node to invoke the Sample Size Calculator. When this value is entered in the calculator, the 1.448 units translate into a maximum sample size of 237 (total sample size).

If we design the study for a maximum sample size of 237 patients, we will achieve 90% power so long as our estimate of σ, 6.39, is correct. On the other hand, we gain more flexibility by designing the study for a maximum information of 1.448 units. This design parameter remains the same whether the standard deviation is 6.39 or something different. As the data accumulate during the interim monitoring phase, we will obtain more accurate estimates of the standard deviation and can revise the sample size on that basis.

We shall show in the next section that one of the major advantages of the information based approach is that we can use all the data accrued at any interim monitoring time point to re-estimate σ and, if it differs from what was assumed initially, recalculate the sample size.
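The conversion $n_{\max} = 4\sigma^2 I_{\max}$ can be sketched as below. The function name `max_n_two_normals` is hypothetical, rounding up to the next integer is an assumption, and the values 5.0 and 8.0 are illustrative standard deviations that do not come from the PRIMO study; they are included only to show how strongly the sample size depends on σ while the information target stays at 1.448 units.

```python
import math


def max_n_two_normals(sigma, i_max):
    """Total sample size for two equal arms: n_max = 4 * sigma^2 * I_max."""
    return math.ceil(4.0 * sigma ** 2 * i_max)   # assumed: round up


I_MAX = 1.448  # maximum information of the 2-look design

for sigma in (5.0, 6.39, 8.0):   # 5.0 and 8.0 are hypothetical values
    print(f"sigma = {sigma}: maximum sample size {max_n_two_normals(sigma, I_MAX)}")
```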
59.2.2 Information Based Monitoring

We will monitor the study on the information scale. Select Des 1 in the Library, and click the Interim Monitoring icon on the Library toolbar. This will open an interim monitoring dashboard.

Results at the First Interim Monitoring Time-Point

Recall that the study is planned to have an interim look when 90% of the subjects have been accrued. Therefore an interim look can be planned when 237 × 0.9, or 214, subjects have been evaluated. Suppose that at the first interim monitoring time-point there were 107 subjects on the placebo arm and 107 subjects on the treatment arm, with $\hat\delta = -2.85$, $s_c = 7.5$ and $s_t = 7.4$. Based on these sample standard deviations, the pooled standard deviation is 7.45 and $se(\hat\delta) = 1.019$.

Click the Test Statistic Calculator icon on the dashboard toolbar to invoke the Test Statistic Calculator. Enter −2.85 for Estimate of δ and 1.019 for Standard Error of Estimate of δ. Click Recalc, and then click Yes. The information accrued at this time point is 0.963 and the observed value of the test statistic is $T(\tau_1) = -2.797$. Press OK to update the IM dashboard.

The information fraction is 0.665. The required stopping boundary is 2.927. Since the absolute value of the test statistic is smaller than 2.927, the stopping boundary is not crossed and the study continues. The anticipated information fraction under the assumption that σ = 6.39 is 214/237 = 0.903, which is considerably larger than the actual information fraction of 0.665. Thus, there is considerable evidence that the information is coming in more slowly than anticipated. In fact, the data suggest that the value of σ is close to 7.45. It might therefore be prudent to re-estimate the sample size of the study. The new maximum sample size can be obtained from the relationship

$$ \frac{n(\tau_1)}{n_{\max}} = \frac{I(\tau_1)}{I_{\max}} . $$

Thus the maximum sample size (rounded up to the nearest integer) is

$$ n_{\max} = n(\tau_1) \times \frac{I_{\max}}{I(\tau_1)} = 214 \times \frac{0.903}{0.665} = 291 . $$

Therefore we need to commit 291 subjects to the study, not 237 as originally estimated. It is clear that unless we increase patient accrual from the initial specification of 237, we will have a seriously underpowered study. Let us assume then that the investigators agree at this stage to increase the sample size to 291 patients.

Results at the Final Look

Suppose that at the final look we have accrued 291 patients, of whom 145 are allocated to placebo and 146 are allocated to the new drug, paricalcitol. Based on these subjects, $\hat\delta = -2.93$, $s_c = 7.43$ and $s_t = 7.41$. Thus, the pooled standard deviation is 7.42 and $se(\hat\delta) = 0.870$.

Open the Test Statistic Calculator again. In the Test Statistic Calculator, tick the checkbox Set Current Look as Last. Enter −2.93 for Estimate of δ and 0.870 for Standard Error of Estimate of δ. Click Recalc, and then click Yes. The information accrued at this time point is 1.321 and the observed value of the test statistic is $T(\tau_2) = -3.368$. The Test Statistic Calculator should look as below.

Upon pressing the OK button, a pop-up window will appear notifying you that H0 is rejected as the test statistic exceeds the critical boundary. Click Stop. This time East tells us that the stopping boundary of 1.962 has been crossed, and the study terminates with the conclusion that paricalcitol does indeed lower the change in LVM index relative to placebo. The adjusted estimate of the difference is −2.752 and the adjusted p-value is 0.004. The 95% adjusted confidence interval for the reduction is [−4.519, −0.919].
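The quantities typed into the Test Statistic Calculator at each look can be derived from the arm-level summaries. The sketch below assumes the usual pooled-variance formula for two independent normal samples; the helper names `pooled_sd` and `look_summary` are hypothetical, and the results agree with the values in the text only up to rounding of the standard error.

```python
import math


def pooled_sd(n_c: int, s_c: float, n_t: int, s_t: float) -> float:
    """Pooled standard deviation of two independent samples."""
    num = (n_c - 1) * s_c ** 2 + (n_t - 1) * s_t ** 2
    return math.sqrt(num / (n_c + n_t - 2))


def look_summary(delta_hat: float, n_c: int, s_c: float, n_t: int, s_t: float) -> dict:
    """Standard error, information and Wald statistic at one look."""
    sd = pooled_sd(n_c, s_c, n_t, s_t)
    se = sd * math.sqrt(1.0 / n_c + 1.0 / n_t)
    return {
        "pooled_sd": sd,
        "se": se,
        "information": se ** -2,  # information is [se(delta_hat)]^-2
        "z": delta_hat / se,
    }


I_MAX = 1.448  # maximum information from the design stage

first = look_summary(-2.85, n_c=107, s_c=7.5, n_t=107, s_t=7.4)
final = look_summary(-2.93, n_c=145, s_c=7.43, n_t=146, s_t=7.41)

for name, look in (("first look", first), ("final look", final)):
    frac = look["information"] / I_MAX
    print(f"{name}: se={look['se']:.3f}, info={look['information']:.3f}, "
          f"info fraction={frac:.3f}, z={look['z']:.3f}")
```

Small differences in the third decimal place relative to East's output are expected, because the text enters the standard error rounded to three decimals before computing the information and test statistic.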
59.3 Equality of Two Poisson Rates

We will use an information-based approach to design a stroke prevention study that is discussed in detail in Chapter 60, Section 60.1.1. The goal is to design a balanced two-arm randomized clinical trial for high-risk patients with atrial fibrillation in which the standard treatment (adjusted-dose warfarin) has a Poisson event rate of 1.8% per year (i.e., 1.8 ischaemic stroke events per 100 people per year). If the experimental treatment (low-dose warfarin plus aspirin) has a Poisson event rate in excess of 3% per year, we wish to detect this with 90% power using a one-sided test conducted at the 5% level of significance. Let $\lambda_c$ and $\lambda_t$ denote the Poisson event rates for the control and treatment arms, respectively, and define the risk ratio

$$ \gamma = \frac{\lambda_t}{\lambda_c} . $$

59.3.1 Trial Design

We wish to test the null hypothesis that γ = 1 against the one-sided alternative hypothesis that γ > 1 using a test at significance level α = 0.05. The test is required to have power 1 − β = 0.9 at the alternative γ = 3/1.8 = 1.667. In Section 60.1.1, we design and monitor this study using traditional large-sample methods of unconditional inference. In the present section, we will use an alternative conditional method of inference for comparison purposes. Although there have been no formal studies comparing the conditional and unconditional approaches for Poisson data, it is generally believed that the conditional approach has greater accuracy. For example, Breslow and Day (1987) utilize the conditional approach in their monograph on cohort studies. Suppose that $X_c$ is the number of events observed on the control arm, $X_t$ is the number of events observed on the treatment arm, and $N = X_c + X_t$.
Then it is well known that the conditional distribution of $X_t$ given N is binomial with parameters (π, N), where

$$ \pi = \frac{n_t \gamma}{n_c + n_t \gamma} \qquad (59.6) $$

and $n_c$ is the number of person years of follow-up on the control arm and $n_t$ is the number of person years of follow-up on the treatment arm. The present study was designed for equal amounts of follow-up on each arm. Thus, at the design stage we may assume that $n_c = n_t$. The protocol specifies that γ = 1 under the null hypothesis, and γ = 1.667 under the alternative hypothesis. Therefore, by equation (59.6), the null and alternative hypotheses may be stated as

$$ H_0: \pi = 0.5 \quad \text{versus} \quad H_1: \pi = 0.625 . $$

The design has now been formulated in terms of testing the mean of a binomial random variable, with N, the total number of events, playing the role of sample size and δ = π − 0.5 playing the role of effect size. The null and alternative hypotheses can now be specified in terms of δ as

$$ H_0: \delta = 0 \quad \text{versus} \quad H_1: \delta = 0.125 . $$

The maximum value of N for a K-look group sequential design is thus

$$ N_{\max} = \pi(1 - \pi) I_{\max} \qquad (59.7) $$

where $I_{\max}$ is computed by equation (59.1) and can be obtained from East.

Click Other on the Design tab, and then click Information Based. In the ensuing input dialog box, in the Design Parameters tab, select 1-Sided for Test Type. Specify Type I Error (α) as 0.05 and Power (1-β) as 0.9. Change Treatment Effect to 0.125. Click Compute. The output is shown as a row in the Output Preview located in the lower pane.

This design has default name Des 1. Save this design in the current workbook by selecting the row corresponding to Des 1 in Output Preview and clicking the save icon on the Output Preview toolbar. For Des 1, $I_{\max} = 548.09$. Equation (59.7) converts $I_{\max}$ to $N_{\max}$.
In applying this equation, we must specify the value of π at which 90% power is desired. With π = 0.625, we have $N_{\max} = 0.625 \times 0.375 \times 548.09 = 128$ events. This is somewhat lower than the 135 events computed in the general design of Chapter 60, and suggests that the conditional approach used here is more efficient than the unconditional approach.

Suppose we wish to take two interim looks and a final look at the accruing data and utilize the usual Lan and DeMets (1983) α-spending function LD(OF). Now create a new design by right-clicking Des 1 in the Library and editing it. Change the Number of Looks from 1 to 3. In the Boundary Info tab, select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in the Efficacy box. In the Futility box, select None for Boundary Family. Click Compute to generate output for this design. A new row will be added in the Output Preview.

The maximum information is inflated to $I_{\max} = 558.36$ and the corresponding maximum number of events is inflated to $N_{\max} = 131$. Save this design in the current workbook by selecting the row corresponding to Des 2 in Output Preview and clicking the save icon on the Output Preview toolbar. Observe that although the maximum information is slightly inflated, the expected information under H1 is only 432.696. If H1 is true then π = 0.625, so that the corresponding expected number of events is 0.625 × 0.375 × 432.696 = 101, a considerable saving over the single-look design.

59.3.2 Interim Monitoring

Let us monitor this study using the interim monitoring data published in JAMA (vol 279, No. 16, Table 2). According to this report, the study was monitored after N = 55 events were observed. There were $X_c = 11$ events on the control arm over $n_c = 581$ person years of observation, and there were $X_t = 44$ events on the treatment arm over $n_t = 558$ person years of observation. We can estimate γ from the data as

$$ \hat\gamma = \frac{X_t / n_t}{X_c / n_c} = \frac{44/558}{11/581} = 4.1649 , $$

whereupon the estimate of π is

$$ \hat\pi = \frac{558 \times 4.1649}{581 + 558 \times 4.1649} = 0.8 , $$

the estimate of effect size is

$$ \hat\delta = 0.8 - 0.5 = 0.3 , $$

and its standard error is

$$ se(\hat\delta) = \sqrt{\frac{\hat\pi (1 - \hat\pi)}{N}} = 0.054 . $$

The current information is thus $I = 0.053936^{-2} = 343.75$. We enter this value into the interim monitoring worksheet as described below.

Select Des 2 in the Library, and click the Interim Monitoring icon on the Library toolbar. This will open an interim monitoring dashboard. Click the Test Statistic Calculator icon to invoke the Test Statistic Calculator. Enter 0.3 for Estimate of δ and 0.054 for Standard Error of Estimate of δ. Click Recalc, and then click Yes. The information accrued at this time point is 342.936 and the observed value of the test statistic is $T(\tau_1) = 5.556$. Finally, click OK to paste this information in the monitoring dashboard.

Now the stopping boundary is crossed, and a dialog box appears. Click Stop. Since the test statistic, Z = 0.3/0.054 = 5.556, exceeds the upper stopping boundary, the trial is terminated. A table for Final Inference will appear in the dashboard.

The lower confidence bound of the adjusted confidence interval for δ is 0.211, implying that π is at least 0.5 + 0.211 = 0.711 with 95% confidence. Thus the risk ratio γ is estimated to be at least π/(1 − π) = 0.711/0.289 = 2.46. The risk of stroke is at least 2.46 times greater on the treatment arm than on the control arm. If the event rate on the control arm is 1.8% per year, then the corresponding event rate on the treatment arm is at least 2.46 × 1.8, or 4.428% per year.
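The arithmetic of the conditional approach in Section 59.3 can be reproduced in a few lines. This is a sketch under stated assumptions: the single-look maximum information is taken to be the standard fixed-sample value $((z_\alpha + z_\beta)/\delta)^2$, which matches the 548.09 reported for Des 1 but is an assumption about equation (59.1), and event counts are rounded to the nearest integer as in the text. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_information(alpha: float, power: float, delta: float) -> float:
    """One-sided, single-look information ((z_alpha + z_beta) / delta)^2 (assumed form)."""
    z = NormalDist()
    return ((z.inv_cdf(1.0 - alpha) + z.inv_cdf(power)) / delta) ** 2


def max_events(pi: float, information: float) -> int:
    """Equation (59.7): N = pi * (1 - pi) * I, rounded to the nearest event."""
    return round(pi * (1.0 - pi) * information)


def conditional_summary(x_c: int, n_c: float, x_t: int, n_t: float) -> dict:
    """Estimates used for monitoring on the binomial (conditional) scale."""
    gamma_hat = (x_t / n_t) / (x_c / n_c)            # estimated risk ratio
    pi_hat = n_t * gamma_hat / (n_c + n_t * gamma_hat)  # equation (59.6)
    delta_hat = pi_hat - 0.5
    se = math.sqrt(pi_hat * (1.0 - pi_hat) / (x_c + x_t))
    return {"gamma": gamma_hat, "pi": pi_hat, "delta": delta_hat,
            "se": se, "information": se ** -2}


i_max = fixed_sample_information(alpha=0.05, power=0.9, delta=0.125)
print(round(i_max, 2), max_events(0.625, i_max))

# Interim data quoted in the text: 11 events / 581 person years on control,
# 44 events / 558 person years on treatment.
print(conditional_summary(x_c=11, n_c=581, x_t=44, n_t=558))
```

Note that $\hat\pi$ reduces to $X_t / N = 44/55$, which is why the estimate comes out at exactly 0.8 regardless of the follow-up times.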
59.4 Some Non-Statistical Concerns

This chapter demonstrated an approach to monitoring clinical trials that is statistically sound and ensures that the trials will be adequately powered despite inaccurate initial estimates of nuisance parameters that crucially affect the sample size. Provided we are prepared to remain flexible about the final sample size, we can learn as we go, and make appropriate sample size adjustments along the way. The pay-off for adopting this approach is high, both ethically and economically. Many industry trials are over-powered in order to compensate for ignorance about the variability of the data, thereby raising the cost of the trial unnecessarily. Some trials are underpowered because of overly optimistic initial estimates of variability. A promising new therapy might remain undetected despite incurring the high cost of running the trial. The information-based approach ensures that we will randomize neither too many subjects nor too few, but just the right number to meet the goals of the trial.

A number of factors, unrelated to the statistical methodology, will determine whether or not this idea is adopted in practice. Here is a list of unresolved issues that must be addressed:

• The time between the intermediate database lock and the performance of the interim analysis must be shortened, so as to minimize the number of patients being enrolled while the decision to continue or terminate accrual is being made.

• Institutional Review Boards must be educated on the benefits of these trials. They need to understand that an information-based design with a flexible sample size is, in some situations, more ethical than a design that fixes the sample size up-front, despite considerable uncertainty about its adequacy to achieve the desired power.

• When the sample size is a random variable, the sponsor may face logistical challenges related to ensuring that sites have sufficient quantities of the drugs or biologics on hand.

• The sponsor will have to re-think the manner in which the budget is prepared for a trial. Rather than having a fixed budget for each individual trial, it might be necessary to envisage a fixed overall budget for a portfolio of trials which can be allocated to the individual trials in a flexible manner.

• These information-based trials might be subject to additional regulatory scrutiny. The burden will be on the sponsor to demonstrate that the statistical methodology is sound and that, by the manner in which the trial was conducted, the interim results were not prematurely unblinded.

60 Design and Interim Monitoring with General Endpoints

In the previous chapters, we have shown how to use East to design and monitor group-sequential studies with normal, binomial and survival endpoints. In this chapter, we show how to extend East to design and monitor studies with any general endpoint, including longitudinal studies, equivalence studies, and studies where the endpoint is specified as one of the covariates in a generalized linear regression model. In all these settings, we use East in conjunction with some other design package that is capable of computing the sample size for the endpoint in question when there is no interim monitoring. The fixed sample size thus obtained is then used as an input to the General Design module provided by East. East inflates this fixed sample size appropriately based on the planned number of interim analyses, the type of stopping boundary, the desired type-1 error and the desired power. The derivation of the appropriate inflation factor for this purpose is discussed in Appendix B, Section B.3. The resulting group-sequential design may then be monitored flexibly using East's interim monitoring dashboard. We illustrate below with an example involving Poisson data.
60.1 Poisson Model

In the Stroke Prevention in Atrial Fibrillation study, investigators conducted a two-arm randomized clinical trial of adjusted-dose warfarin versus low-intensity fixed-dose warfarin plus aspirin for high-risk patients with atrial fibrillation (AF). (See Lancet, 1996, 348(9028):633-8, for details.) Adjusted-dose warfarin is known to be highly efficacious for prevention of ischaemic stroke in AF patients, with an event rate of only 1.8% per year. This treatment, however, carries a risk of bleeding and requires frequent monitoring. The objective of the study was to determine if low-intensity fixed-dose warfarin plus aspirin, which is safer and easier to administer, might be substituted for adjusted-dose warfarin without resulting in an unacceptably high relative risk of stroke. An event rate in excess of 3% per year with the low-intensity warfarin would be considered unacceptable. We will use East to design and monitor a group-sequential study with two interim looks and one final look.

60.1.1 Design of Stroke Prevention Study

The goal is to design a balanced two-arm randomized clinical trial for high-risk patients with AF in which the standard treatment (adjusted-dose warfarin) has a Poisson event rate of 1.8% per year. If the experimental treatment (low-dose warfarin plus aspirin) has a Poisson event rate in excess of 3% per year, we wish to detect this with 90% power using a one-sided test conducted at the 5% level of significance. One can use a standard sample-size package like Egret Siz to determine the total number of events of ischaemic stroke that one must observe in order to detect a difference in Poisson rates of 1.8% per year versus 3% per year (i.e., a risk ratio of 3/1.8 = 1.667) with 90% power using a one-sided fixed-sample Wald test conducted at the 5% significance level. The desired number works out to be 135 events. More direct methods of determining the required number of events, rather than relying on output from a statistical software package, are available through the information based approach discussed in Chapter 59, Section 59.3.

The above requirement of 135 events assumed that there would be no interim monitoring for early stopping. This study, however, was intended to be monitored twice during execution, and a third time at the end, each look being taken after equal increments of information. The group-sequential strategies implemented in East are applicable to this problem. East can determine the amount by which the required number of events for the fixed-sample study should be inflated for the group-sequential design, and then allow us to monitor the study properly.

The first step is to provide East with the appropriate design parameters. First, click Other on the Design tab and then click General Design: Sample-Size Based as shown below. The upper pane displays several fields with default values. First, change the Number of Looks to 3, to generate a study with two interim looks and a final analysis. In the Design Parameters tab, select 1-Sided for Test Type. Specify Type I Error (α) as 0.05 and Power (1-β) as 0.9. Enter 135 for Total SS for Fixed-Sample Study. The Design Parameters tab in the upper pane should appear as below.

Click Boundary Info. In this tab, you will see Efficacy and Futility boxes. Select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in the Efficacy box. Select None for Boundary Family in the Futility box.
The Spacing of Looks is set to Equal, which means that the interim analyses will be equally spaced in terms of the number of patients accrued between looks. The Boundary Info tab should appear as below. Click Compute to generate output for this design. A new row will be added in the Output Preview with label Des 1.

In order to preserve the power of this study at 90% while monitoring the data three times, we must inflate the number of events required for a fixed-sample study from 135 to 138 events. That is, we must commit up front to keeping the study open until 138 ischaemic events are observed. On the other hand, since we will be monitoring the data sequentially, we expect to cross the stopping boundary and stop early after only 107 events, on average, if the alternative hypothesis is true. Thus, the increase in sample size is a small price to pay for the advantages of potential early stopping. Save Des 1 in the current workbook by selecting the row corresponding to Des 1 in Output Preview and clicking the save icon on the Output Preview toolbar.

For any chosen design, the study has a certain probability of stopping at any of the looks. In order to see the stopping probabilities, select Des 1 in the Library, and click the corresponding toolbar icon. The clear advantage of this sequential design resides in the high probability of stopping by the second look, if the alternative is true, with a sample size of 92 events, which is well below the requirement for a fixed-sample study (135 events). Close the Output window before continuing.

A less conservative approach would be to use stopping boundaries in the spirit of Pocock (1977). To generate such boundaries, create a new design by right-clicking Des 1 in the Library, and selecting Edit Design. Go to the Boundary Info tab.
As before, keep Spending Functions for Boundary Family and Lan-DeMets for Spending Function. Change the Parameter to PK in the Efficacy box. Click Compute. A new row will be added in the Output Preview with label Des 2.

Under this sequential scheme, we must commit up front to 157 events, but the expected number of events upon stopping the study is only 96 under the alternative hypothesis. Des 1 requires a smaller up-front commitment, but Des 2 will stop with a smaller number of events, on average, if the alternative hypothesis is true. Now select Des 2 in Output Preview and click the save icon on the Output Preview toolbar to save it in the Library.

The two designs considered can also be compared in terms of the actual stopping probabilities. In order to see the stopping probabilities with the boundaries in the spirit of Pocock, select Des 2 in the Library, and click the corresponding toolbar icon. The comparison of stopping probabilities across alternative design options can help in choosing the one with the most desirable properties. In particular, designs that require a larger maximum sample size are usually those that have rather high stopping probabilities at early analyses. Indeed, although Des 2 may require as many as 157 events, if the alternative hypothesis is true there is a higher chance of stopping at the first analysis with this design (stopping probability = 0.428 with 52 events) than with Des 1 (stopping probability = 0.067 with 46 events).

Although the trial report did not mention which monitoring strategy was used, we will assume that the decision was made to use Des 1, with stopping boundaries in the spirit of O'Brien and Fleming, and we shall now proceed with the interim monitoring of the study.
The inflation factor, IF(α, β, K, boundaries), for Des 1 is

$$ \frac{N_{\max}}{N_1} = \frac{138}{135} = 1.022 . $$

The quantities IF = IF(α, β, K, boundaries) and η = η(α, β, K, boundaries) are related as

$$ IF = \left( \frac{\eta}{z_\alpha + z_\beta} \right)^2 . $$

With IF = 1.022, α = 0.05 and β = 0.1, η = 2.958. We have obtained η through back calculation. In fact, East calculates IF and $N_{\max}$ from η. Although this parameter was not specified at the design stage, it is implied by the choice of power, type-1 error, number and spacing of looks and spending function. Specifically, a process of independent increments of the form $W(t) \sim N(\eta t, t)$ (as defined by equations (B.8), (B.9), and (B.10) in Section B.1 of Appendix B) in which η = 2.958 will cross the stopping boundary of the above study design at one of the three equally spaced monitoring times ($t_1 = 1/3$, $t_2 = 2/3$, or $t_3 = 1$) with probability 1 − β = 0.9. The parameter η generated at the design stage is an abstract quantity of no inherent interest to the end user. However, as we shall see in the next two sections, point and interval estimates of η obtained from the data at the interim monitoring stage can be of great interest to the end user, for they can be transformed into corresponding estimates of the relevant treatment difference δ.

60.1.2 Interim Monitoring of Stroke Prevention Study

Select Des 1 in the Library, and click the Interim Monitoring icon on the Library toolbar. Alternatively, right-click on Des 1 and select Interim Monitoring.

60.1.3 First Interim Analysis

The report does not mention how many events were observed at the first interim analysis and what the value of the test statistic was at that time. We shall suppose that the study was first monitored after 25 events. Suppose in addition that the treatment group was followed for 210 person years producing 20 events, and the control group was followed for 218 person years producing 5 events. With these data, we can test the null hypothesis that the event rate for ischaemic stroke is the same in the treatment and control groups.
Before proceeding with this test, however, it is useful to review some basic theory about the Poisson distribution. Let $(\lambda_c, \lambda_t)$ be the Poisson event rates for the control and treatment groups, respectively. It is convenient to characterize the treatment difference in terms of the logarithm of the risk ratio

$$ \delta = \ln\left( \frac{\lambda_t}{\lambda_c} \right) . $$

Then the test statistic of interest for testing $H_0: \delta = 0$ is the Wald statistic

$$ Z = \frac{\hat\delta}{se(\hat\delta)} . \qquad (60.1) $$

This statistic is N(0, 1) under the null hypothesis and has the appropriate covariance structure for group sequential inference provided $\hat\delta$ is an efficient estimate of δ.

At the time of the interim analysis, let $n_c$ denote the number of person years of follow-up in the control group, and let $x_c$ be the corresponding number of events that are observed in the control group. Similarly, let $n_t$ denote the number of person years of follow-up in the treatment group and let $x_t$ be the corresponding number of events that are observed in the treatment group. An efficient estimator for δ is now given by

$$ \hat\delta = \ln(x_t / n_t) - \ln(x_c / n_c) . \qquad (60.2) $$

In order to compute the standard error, $se(\hat\delta)$, we need to derive the variance of the random variable

$$ \ln(T_n) = \ln\left( \frac{X}{n} \right) $$

where X is a Poisson random variable with density

$$ f(x) = \frac{(\lambda n)^x e^{-\lambda n}}{x!} . $$

By Poisson theory, $E(T_n) = \lambda$ and $\mathrm{var}(T_n) = \lambda / n$. Thus, under the null hypothesis $H_0: \delta = 0$,

$$ \sqrt{n}\,(T_n - \lambda) \xrightarrow{d} N(0, \lambda) . $$

Therefore, by the delta method (see for example, Agresti, 1990, page 420),

$$ \sqrt{n}\,[g(T_n) - g(\lambda)] \xrightarrow{d} N(0, \lambda [g'(\lambda)]^2) . $$

Here $g(\lambda) = \ln(\lambda)$. Therefore $\lambda [g'(\lambda)]^2 = 1/\lambda$ and hence

$$ \sqrt{n}\,[\ln(T_n) - \ln(\lambda)] \xrightarrow{d} N\left(0, \frac{1}{\lambda}\right) . $$

It follows that

$$ \mathrm{var}[\ln(T_n)] = \frac{1}{n\lambda} . $$

Substituting this result into equation (60.2) we have

$$ \mathrm{var}(\hat\delta) = \frac{1}{n_c \lambda_c} + \frac{1}{n_t \lambda_t} . \qquad (60.3) $$

Replacing the Poisson event rates $\lambda_c$ and $\lambda_t$ by their corresponding maximum likelihood estimates $x_c / n_c$ and $x_t / n_t$, we finally obtain

$$ se(\hat\delta) = \sqrt{\frac{1}{x_c} + \frac{1}{x_t}} \qquad (60.4) $$

so that the test statistic (60.1) becomes

$$ Z = \frac{\ln(x_t / n_t) - \ln(x_c / n_c)}{\sqrt{\dfrac{1}{x_c} + \dfrac{1}{x_t}}} . \qquad (60.5) $$

Substituting the observed values of $x_c$, $x_t$, $n_c$, $n_t$ into equations (60.2) and (60.4), we obtain $\hat\delta = 1.423682$ and $se(\hat\delta) = 0.5$. Thus the first interim analysis is performed after observing 25 events, with the value of the test statistic being 1.42/0.5 = 2.84.

At the top of the Interim Monitoring sheet, click the Test Statistic Calculator icon on the toolbar to invoke the Test Statistic Calculator. In this dialog box, enter 25 for Cumulative Sample Size, 1.42 for Estimate of δ and 0.5 for Standard Error of Estimate of δ. Then click Recalc. Click OK.

East displays the information fraction, $t(\tau_1) = 25/138 = 0.181$, the test statistic, $T(\tau_1) = 1.42/0.5 = 2.84$, and the efficacy boundary, 4.458. Thus, we can stop the study if the value of the test statistic exceeds 4.458. Since this is not the case, we continue to the next interim monitoring time point. The lower 95% confidence bound on η is −3.803. We can convert this estimate into a lower 95% confidence bound for δ by using the relationship

$$ \eta = \delta \sqrt{I_{\max}} $$

derived in Section B.1 of Appendix B. Now observe that

$$ I_{\max} = \frac{I_{\max}}{I_1} I_1 = t_1^{-1} [se(\hat\delta_1)]^{-2} . $$

Therefore

$$ \delta = \eta \sqrt{t_1}\, [se(\hat\delta_1)] . \qquad (60.6) $$

Thus, the lower confidence bound for δ is $-3.803 \times \sqrt{0.181} \times 0.5 = -0.809$. We can conclude that, based on the current data, the ratio of treatment event rate to control event rate is at least exp(−0.809) = 0.445.
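The first-look calculations above can be sketched in Python as a check on equations (60.2), (60.4), (60.5) and (60.6), together with the back-calculation of η from the inflation factor. The function names are hypothetical, and the confidence bound on η (−3.803) and the information fraction (0.181) are simply taken from the East output quoted in the text rather than computed here.

```python
import math
from statistics import NormalDist


def wald_log_rate_ratio(x_c: int, n_c: float, x_t: int, n_t: float):
    """Return (delta_hat, se, z) for the log ratio of two Poisson rates."""
    delta_hat = math.log(x_t / n_t) - math.log(x_c / n_c)  # equation (60.2)
    se = math.sqrt(1.0 / x_c + 1.0 / x_t)                  # equation (60.4)
    return delta_hat, se, delta_hat / se                   # equation (60.5)


def eta_from_inflation(inflation: float, alpha: float, beta: float) -> float:
    """Invert IF = (eta / (z_alpha + z_beta))^2 for a one-sided test."""
    z = NormalDist()
    return math.sqrt(inflation) * (z.inv_cdf(1.0 - alpha) + z.inv_cdf(1.0 - beta))


def delta_from_eta(eta: float, info_fraction: float, se: float) -> float:
    """Equation (60.6): delta = eta * sqrt(t_1) * se(delta_hat_1)."""
    return eta * math.sqrt(info_fraction) * se


# First interim look: 5 events / 218 person years (control),
# 20 events / 210 person years (treatment).
delta_hat, se, z = wald_log_rate_ratio(x_c=5, n_c=218, x_t=20, n_t=210)
print(f"delta_hat={delta_hat:.6f}, se={se:.3f}, z={z:.3f}")

print(f"eta={eta_from_inflation(1.022, alpha=0.05, beta=0.1):.3f}")

lower_delta = delta_from_eta(eta=-3.803, info_fraction=0.181, se=se)
print(f"lower bound on delta={lower_delta:.3f}, "
      f"on rate ratio={math.exp(lower_delta):.3f}")
```

The unrounded statistic differs slightly from the 2.84 quoted in the text, which uses the estimate rounded to 1.42.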
There is not yet sufficient evidence to exclude a ratio of 1.0.

60.1.4 Second Interim Analysis

A published report (JAMA, vol 279, No 16, Table 2) shows that this study was indeed monitored after 55 events were observed. There were only 11 events on the adjusted-dose arm (control) with 581 patient years of observation. On the other hand, there were 44 events on the fixed-dose plus aspirin arm (treatment) with 558 patient years of observation. Entering these data into equations (60.2), (60.4) and (60.5), we obtain $\hat\delta = 1.427$, $se(\hat\delta) = 0.337$ and Z = 4.234.

In the top part of the IM dashboard, enter 55 for Cumulative Sample Size, 1.427 for Estimate of δ, and 0.337 for Standard Error of Estimate of δ. Click OK to update the charts and tables in the dashboard. Now the stopping boundary is crossed, and the following window appears. Click Stop.

The left side of the dashboard will show the stopping boundaries and the error spending function. The right side of the dashboard will show a table for final inference, and the confidence intervals.

With 39.9% of the information, East was able to reach an early decision in favor of the alternative hypothesis, that fixed-dose warfarin plus aspirin is insufficient for stroke prevention.

61 Early Stopping for Futility

Group sequential methods were developed originally for early stopping if the experimental treatment showed a statistically significant therapeutic advantage at an interim look. In many clinical trials, however, there is limited interest in stopping early for a positive efficacy outcome. This is usually because the investigators wish to continue the trial all the way to the end and gather additional safety data for the experimental arm.
Nevertheless, there is a great deal of interest in stopping early for futility if the interim analysis reveals that, with high probability, the trial will end up negative. In that case, the investigators might wish to cut their losses and possibly divert their resources to a more promising study. East provides two ways to stop early for futility: (a) informal – based on conditional power and (b) formal – based on futility stopping boundaries. Industry trials have typically adopted the informal approach, stopping early if the conditional power at an interim analysis is extremely low. We consider this approach to be informal because it is not necessary to specify ahead of time how low the conditional power should be in order to declare futility and terminate the study. The futility threshold can be determined at the time of the interim analysis itself, possibly using both internal data from the trial and external information about other similar trials. It is easy to see that the informal approach will not inflate the type-1 error, provided the only decisions possible at each interim monitoring time point are to either continue the study or stop and declare futility. On the other hand, the informal approach may not preserve the type-2 error (and thus, the study may lose power) as the decision to stop for futility is based on an ad hoc determination that the conditional power is too low. In contrast, the use of a futility boundary guarantees the preservation of power. This is because the boundary is constructed by using the spending function methodology of Lan and DeMets (1983). However, in this case one spends β, the type-2 error, rather than spending α, the type-1 error. The technical details are available in Appendix B. 
61.1 Example: Survival in patients with advanced melanoma

A phase III trial was conducted to compare overall survival (OS) on Tremelimumab, a fully human anti-cytotoxic T lymphocyte-associated antigen 4 (CTLA4) monoclonal antibody, with that on standard, single-agent chemotherapy (Ribas et al., 2008). The primary endpoint was OS. Let $\lambda_t$ and $\lambda_c$ be the hazard rates for Tremelimumab and standard chemotherapy, respectively. Here, the treatment effect δ is represented in terms of $\ln(\lambda_t / \lambda_c)$, the log hazard ratio. Therefore, δ < 0 indicates a beneficial effect of the new treatment, Tremelimumab. The study was designed to provide 90% power to detect a 33% improvement in true median OS with an unstratified log-rank test at an overall 2-sided significance level of 0.05. Two equally spaced interim analyses were planned based on the group sequential design using the Lan-DeMets alpha and beta spending approach to an O'Brien-Fleming boundary. An improvement of 33% in true median OS can be translated to a ratio of medians of 1.33. In other words, we are considering a hazard ratio of 1/(1 + 0.33), or 0.752. In the study, a median survival time of 10.7 months was observed in the standard chemotherapy group.

61.2 Single-Look Design with No Early Stopping

Suppose initially that no interim monitoring is contemplated. First, click Survival: Two Samples on the Design tab, and then click Parallel Design: Logrank Test Given Accrual Duration and Accrual Rates. In the input window, leave the Number of Looks as 1. In the Design Parameters tab, select Design Type as Superiority, Test Type as 2-Sided, and the values for Type I Error (α) and Power (1-β) as 0.05 and 0.9, respectively. Select # of Hazard Pieces as 1, which implies that hazard rates remain constant over time in both the Tremelimumab and standard chemotherapy arms. Select the Input Method as Median Survival Times.
Tick the check box for Hazard Ratio (Optional) and select the radio-button Ratio of Medians (mt/mc). Enter 1.33 for Ratio of Medians (mt/mc). In the table below, enter 10.7 for Med. Surv. Time (Control). The Design Parameters tab should now appear as below:

Move to the Accrual/Dropout Info tab. The original study does not report accrual information. However, we will assume that patients enter the study at the rate of 48 per month. For this example, select 1 for # of Accrual Periods and enter 48 in the Accrual Rate column of the ensuing table.

Click Compute to obtain the number of events required to have the desired operating characteristics. This will add a row in the Output Preview. The computed maximum number of events (517) is highlighted in yellow.

Select Des 1 in Output Preview and click the icon. This will display the design details in the Output Summary.

Click the icon to go back to the Output Preview window. Select Des 1 by clicking anywhere along the row in the Output Preview and click the icon to save this design in the Library.

Des 1 shows that, in order to achieve the desired 90% power, we must keep the study open until 517 events are observed. Half of these events need to be observed in the Tremelimumab arm, and the other half in the standard chemotherapy arm. You can see the exact number of events required in each arm by double-clicking on Des 1 in the Library. In this design, there is no provision for interim monitoring to stop the trial early.
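The arithmetic behind Des 1 can be approximated by hand. The sketch below assumes exponential survival (so the hazard ratio is the reciprocal of the ratio of medians) and uses the well-known Schoenfeld approximation for the number of events under 1:1 randomization. East's own algorithm is not described in this section and may differ, and the function names are illustrative. Under these assumptions the result comes to about 517 events, in line with Des 1.

```python
import math
from statistics import NormalDist


def hazard_ratio_from_median_ratio(median_ratio):
    """Under exponential survival, median = ln(2) / hazard, so the
    hazard ratio (treatment / control) is 1 / (ratio of medians mt/mc)."""
    return 1.0 / median_ratio


def schoenfeld_events(hazard_ratio, alpha_two_sided, power):
    """Approximate number of events for a log-rank test with equal
    allocation (Schoenfeld approximation)."""
    n = NormalDist()
    z_alpha = n.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = n.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


hr = hazard_ratio_from_median_ratio(1.33)   # about 0.752
delta = math.log(hr)                        # log hazard ratio, about -0.285
events = schoenfeld_events(hr, alpha_two_sided=0.05, power=0.90)

print(f"hazard ratio = {hr:.3f}, delta = {delta:.3f}, "
      f"events = {math.ceil(events)}")
```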
61.3 Group Sequential Design with Early Stopping for Efficacy

Recall from Section 61.1 that the study was originally planned with two interim looks with the Lan-DeMets spending approach to an O’Brien-Fleming boundary. In this section, we will consider early stopping boundaries for efficacy only. Create a new design by selecting Des 1 in the Library, and clicking the icon on the Library toolbar. First, change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. A new tab with label Boundary Info will appear. The left side contains details for the Efficacy boundary, and the right side contains details for the Futility boundary. Select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in the Efficacy box. Select None for Boundary Family in the Futility box.

Click Compute to generate output for this design. A new row will be added in the Output Preview. Save this design in the current workbook by selecting the row corresponding to Des 2 in Output Preview and clicking the icon on the Output Preview toolbar.

Des 2 requires a larger up-front commitment than Des 1. To compare Des 1 and Des 2, select both rows in Output Preview using the Ctrl key and click the icon. Both designs will be displayed in the Output Summary.

In order to achieve the desired 90% power, the study in Des 2 should be kept open until 523 events are obtained. However, under H1, the expected number of events is 420, with an expected study duration of only 22 months, compared to 517 events and 26.6 months for Des 1. To see the probability of crossing the stopping boundaries at one of the interim looks, and thus terminating the study earlier, double-click on Des 2 in the Library.
You can increase the decimal precision by clicking the icon and displaying Probability Statistics up to four decimal places.

Under H1 there is a 3.34% chance of crossing a boundary at the first look, and a 56% chance of crossing by the second look (this column is cumulative). This is why the expected study duration is about 4.5 months less than the study duration with Des 1. However, Des 2 has no formal mechanism for stopping the trial early if the two treatments are similar. Under H0, the expected study duration is nearly the same as for a single-look design.

61.4 Informal Use of Conditional Power for Futility Stopping

One can use conditional power as an informal guide for terminating a study at an interim monitoring time point. To see how this works, recall that the study has been designed for two interim looks: the first when one-third of the deaths are observed, and the second when two-thirds of the deaths are observed. Right-click Des 2 in the Library, and select Interim Monitoring.

First interim monitoring

Click the icon from the toolbar to invoke the Test Statistic Calculator. In this dialog box, enter 175 for Cumulative Events, 0.143 as Estimate of δ and 0.477 as Standard Error of Estimate of δ. Click Recalc. The test statistic value is computed and is displayed as 0.3. This appears to be a rather disappointing value for the test statistic one-third of the way through the study, and suggests that the study might not end up positive after all. Click OK to continue. This will paste the information into the monitoring dashboard.

Examine the Conditional Power section of the monitoring sheet. Conditional powers are calculated at different effect sizes. The conditional power corresponding to an HR of 1.2 (which is very close to the observed HR of 1.1545) is only 0.047.
This means that if we were to perform an analysis of the data at 523 events, there is only a 4.7% chance of crossing the upper stopping boundary and declaring statistical significance. Is this chance sufficiently small to warrant terminating the study? There are no objective criteria for making this determination. Recall that the conditional power approach to stopping early for futility is informal. Thus, the low conditional power would have to be considered by the DMC, along with other factors such as toxicity, rate of accrual and parallel developments in other trials.

Second interim monitoring

Suppose the trial continues and a second interim analysis is performed when almost two-thirds of the events are observed. Assume that the total number of events is 350, and that the estimates are δ̂ = 0.237 and SE(δ̂) = 0.206. Enter these values in the test statistic calculator to post the results into the interim monitoring dashboard.

Although the value of the test statistic has increased considerably from the value at the previous look, the conditional power has only marginally increased, from 0.047 to 0.175. Because we are very close to the end of the study, there is only a 17.5% chance of crossing the upper stopping boundary at the final look. Should the study continue or be terminated? Again, the decision is a subjective one.

61.5 Combined Efficacy and Futility Stopping Boundaries

One way to remove the subjectivity from the decision to stop early based on low conditional power is to use formal futility stopping boundaries. East has the provision to simultaneously create efficacy boundaries for rejecting H0 and futility boundaries for rejecting H1. The efficacy boundaries are generated by an α-spending function that spends the type-1 error.
The futility boundaries are generated by a β-spending function that spends the type-2 error. Moreover, the two sets of boundaries are forced to meet at the last look so as to ensure that either H0 or H1 is rejected.

61.5.1 Two-Sided Tests

Recall that the advanced melanoma study we are considering in this section was implemented using the Lan-DeMets alpha and beta spending approach to an O’Brien-Fleming boundary. We will first consider a two-sided design with both efficacy and futility boundaries. In order to do this, create a new design by selecting Des 2 in the Library, and clicking the icon on the Library toolbar. Click the Boundary Info tab. Select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in both the Efficacy and Futility boxes. To the right of the Futility box there is a field where you have to choose either Non-Binding or Binding. A Binding futility boundary refers to a situation where the trial must be terminated once the test statistic falls within the futility boundaries; otherwise the overall type I error might be inflated. Non-Binding futility boundaries do not have this constraint. For now, select the radio-button corresponding to Binding.

The cumulative α and β spent, along with the boundary values, are shown in the table in the Boundary Info tab. The columns Stop for Efficacy and Stop for Futility in the table provide the flexibility of excluding either efficacy or futility boundaries at certain interim looks, by unchecking the corresponding cells. For this example, leave all the boxes in the columns Stop for Efficacy and Stop for Futility checked.

Click Compute. A new row will be added in the Output Preview labeled as Des 3. Save this design in the current workbook by selecting the row corresponding to Des 3 in Output Preview and clicking the icon on the Output Preview toolbar.
To compare Des 1, Des 2, and Des 3, select all three rows in Output Preview using the Ctrl key and click the icon. All three designs will be displayed in the Output Summary.

Select Des 3 in the Library, and click the icon, then select Stopping Boundaries.

Des 3 requires a commitment to keep the study open until either 531 events are observed or a boundary is crossed. However, by providing upper and lower stopping boundaries and an inner wedge, Des 3 has lower expected study durations under both the null and alternative hypotheses. If the test statistic enters:

- the pink zone (the inner wedge), the trial stops, the alternative hypothesis is rejected, and futility is declared.
- the lower blue zone, the trial stops, the null hypothesis is rejected, and the new treatment Tremelimumab is declared to be beneficial relative to the standard chemotherapy.
- the upper blue zone, the trial stops, the null hypothesis is rejected, and Tremelimumab is declared to be harmful relative to the standard chemotherapy.

These boundaries are constructed in such a way that:

- if the null hypothesis is true (i.e., δ = ln(λt/λc) = 0), the test statistic will enter the pink inner wedge region with probability 1 − α = 0.95, the upper blue zone with probability 0.025 and the lower blue zone with probability 0.025.
- if the alternative hypothesis is true with δ = ln(λt/λc) ≤ ln 0.752 = −0.285, the test statistic will enter the pink zone with probability β = 0.1 and the lower blue zone with probability almost equal to 0.9.
- if the alternative hypothesis is true with δ ≥ 0.285, the test statistic will enter the pink zone with probability β = 0.1 and the upper blue zone with probability almost equal to 0.9.
The inner wedge boundaries give us the chance to stop early if H0 is true. Notice that with Des 3, the expected study duration under H0 is only 20.639 months, as compared to 24.668 months with Des 2. Close this chart before continuing.

61.5.2 One-Sided Test

In Des 3 we utilized a total of four boundaries: two-sided upper and lower boundaries for rejecting H0, and two-sided upper and lower boundaries for rejecting H1. Such boundaries are only necessary if we wish to actually continue the trial until we have demonstrated that the new treatment is significantly worse than the standard treatment; i.e., until the test statistic enters the upper blue zone and rejects H0 in favor of δ > 0. If, however, we are willing to stop the study early if equivalence rather than actual harm is demonstrated, a more efficient design consisting of only two boundaries can be devised.

Create a new design by selecting Des 3 in the Library, and clicking the icon on the Library toolbar. Click the Design Parameters tab. Replace 2-Sided by 1-Sided, and replace the significance level α = 0.05 by α = 0.025. Go to the Boundary Info tab. Select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in both the Efficacy and Futility boxes. Select the radio-button corresponding to Binding.

Click Compute. This will add a new row to the Output Preview. Save this design in the current workbook by selecting the row corresponding to Des 4 in Output Preview and clicking the icon on the Output Preview toolbar. Select Des 4 in the Library, click the icon, and select Stopping Boundaries.

Des 4 requires a commitment to keep the study open until either 537 events are observed or one of the two boundaries is crossed.
If the test statistic crosses:

- the upper boundary and enters the pink zone, the trial stops, the alternative hypothesis is rejected, and futility is declared.
- the lower boundary and enters the blue zone, the trial stops, the null hypothesis is rejected, and the new treatment is declared to be beneficial over the standard chemotherapy.

These boundaries are forced to meet at the end of 537 events, thus ensuring that either H0 or H1 will be rejected. They are constructed so that:

- if the null hypothesis is true (i.e., δ = ln(λt/λc) = 0), the test statistic will enter the pink zone with probability 1 − α = 0.975 and the blue zone with probability 0.025.
- if the alternative hypothesis is true (i.e., δ = ln(λt/λc) = −0.285), the test statistic will enter the pink zone with probability β = 0.1 and the blue zone with probability 0.9.

Des 4 therefore meets the regulatory requirement that the false positive rate for a one-sided test should not exceed 0.025. It also meets the sponsor’s requirement that the study be designed for 90% power. In terms of shortening the expected study duration, however, Des 4 completely dominates the other three designs. Under H0 the expected study duration is less than 18 months, a saving of over 6.5 months compared to Des 1. There is also over 4.5 months of expected saving relative to Des 1 if H1 is true.

Unlike the informal approach based on conditional power, Des 4 utilizes a formal futility boundary. Since the futility boundary is derived from a β-spending function, the type-2 error (and hence the power of the study) is fully controlled. A drawback of this approach is the loss of flexibility to keep the study open if the futility boundary is crossed. In this case, we must terminate the study. If we keep on accruing patients even after crossing a futility boundary, we are no longer assured of preserving the type-1 error.
For this reason, it is important to examine the futility boundary from every angle before making the commitment. Accordingly, let us examine the stopping boundaries again, this time on the p-value scale. To display the boundaries on the p-value scale you must select this scale from the drop-down list in the Stopping Boundaries chart.

If the p-value (one-sided) at the first look exceeds 0.7622, the study should be terminated for futility. At the second look the futility criterion is p = 0.1646 and at the final look it is p = 0.0251. These values reveal several psychological drawbacks of the selected futility boundary. For instance, even though the overall power of the study is preserved, most investigators would be unwilling to terminate a study and declare futility at an interim analysis where the p-value was 0.1646; they would prefer to complete the study in hopes of a further decline in the p-value. Also, since the boundaries meet at the final look, one could technically reject the null hypothesis and claim that the trial is a success if the final p-value is less than 0.0251. This could appear counter-intuitive because one expects to pay a penalty for having taken multiple looks at the data. Usually the penalty amounts to requiring the cut-off for the final p-value to be less than α = 0.025 in order to declare significance and reject H0. Here, however, the cut-off for the final p-value exceeds 0.025. It appears that we have been rewarded rather than penalized for having designed a multiple-look study. The reason is that the presence of a futility boundary reduces the risk of crossing the efficacy stopping boundary. If the study were designed with an efficacy boundary only, it would be at risk of crossing the efficacy boundary at each interim look.
This would elevate the overall type-1 error unless we imposed a suitable penalty on the final p-value to compensate. On the other hand, if the study were designed with a futility boundary only, it would be at risk of crossing the futility boundary at each interim look. This would reduce the overall type-1 error unless we rewarded the final look p-value by a suitable amount to compensate. When both efficacy and futility boundaries are present, the efficacy boundaries tend to lower the cut-off for the final p-value to below α, whereas the futility boundaries tend to increase the cut-off for the final p-value to above α. Depending on the choice of stopping boundaries, the number and timing of the looks, and the values of α and β, one or other of these opposing forces dominates, resulting in a cut-off for the final p-value that is sometimes greater than α and sometimes less.

Select Des 4 in the Library, and click the icon on the Library toolbar. Change the power from 90% to 95% in the Design Parameters tab. Now go to the Boundary Info tab, and click the icon. Change the Boundary Scale to p-value. Look at the display of the stopping boundary on the p-value scale.

In this case the penalty imposed by the efficacy boundary has overcome the reward conferred by the futility boundary, and the cut-off for the final p-value required to reject H0 and declare statistical significance is less than α = 0.025.

61.5.3 More Conservative Futility Boundaries

It is useful to view futility stopping boundaries on a conditional power scale since that permits us to directly compare a formal futility boundary with an alternative informal early stopping criterion where both criteria are based on low conditional power.
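To make the conditional power scale concrete, here is a simplified sketch. It assumes a single remaining analysis, a fixed final critical value of 1.96, and the convention that larger Z favors the new treatment; the drift is the expected value of Z at full information. East's conditional power accounts for the remaining looks and the actual boundaries, so this sketch is not expected to reproduce the values quoted in this chapter. The function name and the choice of inputs (the first-look estimate and standard error from Section 61.4) are illustrative.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def conditional_power(z_interim, info_frac, drift, final_crit=1.96):
    """P(final Z >= final_crit | interim Z), assuming one remaining
    analysis. drift is E[Z] at full information (information fraction 1)."""
    b_value = z_interim * info_frac ** 0.5     # B-value at the interim look
    mean_rest = drift * (1 - info_frac)        # expected remaining increment
    sd_rest = (1 - info_frac) ** 0.5           # sd of the remaining increment
    return 1 - _N.cdf((final_crit - b_value - mean_rest) / sd_rest)


# Drift that gives 90% power at one-sided alpha = 0.025 in a fixed design.
design_drift = _N.inv_cdf(0.975) + _N.inv_cdf(0.90)

# First-look inputs from Section 61.4 (estimate / standard error, events).
z1 = 0.143 / 0.477
t1 = 175 / 523

cp_design = conditional_power(z1, t1, design_drift)      # under the design effect
cp_trend = conditional_power(z1, t1, z1 / t1 ** 0.5)     # under the current trend

print(f"Z = {z1:.3f}, t = {t1:.3f}")
print(f"CP (design effect) = {cp_design:.3f}, CP (current trend) = {cp_trend:.3f}")
```

Conditional power depends strongly on the effect size assumed for the remainder of the trial, which is why the monitoring sheet reports it at several effect sizes.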
Select Des 4 in the Library, and then select Stopping Boundaries after clicking the icon. Select cp delta1 Scale.

We are required to terminate the study at the first interim look if the conditional power is less than 0.2999, and at the second interim look if the conditional power is less than 0.4581. These are fairly large conditional power values. The trial investigators might not be willing to commit in advance to stop the study and declare futility if the conditional power is as high as 45%. Consequently, they might prefer to adopt an informal approach to early stopping for futility. However, as we have already discussed, the informal approach cannot ensure that the type-2 error will be preserved, and the study might lose power.

The availability of a rich family of flexible spending functions in East enables us to pick formal futility boundaries with substantially lower conditional power for futility stopping, within the range of conditional power values that we might use with the informal approach. For example, suppose that the trial investigators do not wish to terminate this trial for futility unless the conditional power is less than 20% at the first interim look, and less than 10% at the second interim look. These rather conservative criteria for early stopping are more realistic than the 30% and 45% conditional power criteria implied by the stopping boundaries of Des 4. Close this chart before continuing.

Gamma family β spending function

Create a new design by selecting Des 4 in the Library, and then by clicking the icon on the Library toolbar. Click the Boundary Info tab. In the Futility box, change the Spending Function to Gamma Family from the drop-down list.
We must choose a parameter value, γ, to identify a specific member of this family. The value γ = −4 will yield a spending function roughly similar to the LD(OF) spending function. Smaller (more negative) values of γ will yield more conservative spending functions. Since the LD(OF) function (which was used to spend type-2 error in Des 4) yielded unsatisfactory futility boundaries on the conditional power scale, let us be more conservative. Type in −6 as the value of Parameter γ. Select the radio-button next to Binding. Click the icon to show the boundary chart.

The stopping boundary for rejecting H0 at the final look is now -1.9889. As this value is smaller than -1.96, there is indeed a penalty being paid for the multiple looks. Thus, the psychological difficulty encountered in Des 4, where the final stopping boundary for rejecting H0 was less extreme than -1.96, has been resolved. Click Compute to generate Des 5.

We can try to be even more conservative in the β-spending function. Change the parameter for the Gamma spending function from γ = −6 to γ = −8. Click the icon and change the Boundary Scale to the cp delta1 Scale.

Viewing the futility boundary on the cp delta1 scale, the first and second-look values of conditional power required to stop early are, respectively, 0.1991 and 0.0973. These values are within a range where the trial investigators would be willing to guarantee in advance that they would stop the trial and declare futility. The advantage of using the formal futility boundary is, of course, that the type-2 error (and hence the power) is guaranteed to be preserved.

Click Compute. This will add a new row to the Output Preview labeled as Des 6. Save this design in the current workbook by selecting the row corresponding to Des 6 in Output Preview and clicking the icon on the Output Preview toolbar.

61.6 Early Stopping for Futility Only

Under Des 6 there is the possibility of rejecting H0 and stopping early for efficacy if the upper stopping boundary is crossed. The α-spending function used to generate the upper efficacy stopping boundary is the LD(OF) spending function proposed by Lan and DeMets (1983). This function is popular because it spends the type-1 error conservatively in the beginning, but still provides a reasonable opportunity for premature termination once the trial gets underway. In contrast, the Gm(-8) β-spending function used by Des 6 to generate the futility boundary is much more conservative and provides considerably less opportunity for premature termination until the study is close to completion. To examine these two spending functions together, first select Des 6 in the Library. Click the icon in the Library toolbar and then select Error Spending.

For the first 40% of the trial, both spending functions are extremely conservative, spending a negligible amount of error. Thereafter, however, the α-spending function starts to spend the type-1 error at a much faster rate, making it easier to stop early for efficacy. Let us examine the stopping probabilities for Des 6 under H0 and H1. Select Des 6 in the Library and double-click on it.

Under H1, the efficacy boundary would be crossed with probability 0.034 at the first interim analysis, one-third of the way through the trial. By the time two-thirds of the trial has been completed, the probability of early stopping for efficacy under H1 at the second look is 0.56 (cumulative). In some studies, however, the investigators have no desire to stop early for efficacy, but only wish to stop early for futility.
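The contrast between the LD(OF) and Gamma family spending functions can be tabulated directly. The sketch below assumes that the Gamma family has the Hwang-Shih-DeCani form, error × (1 − e^(−γt)) / (1 − e^(−γ)), and that LD(OF) has the form 2 − 2Φ(z/√t); East's exact parameterization should be checked against Appendix B. The function names and the grid of information fractions are illustrative.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def gamma_family_spent(t, total_error, gamma):
    """Cumulative error spent at information fraction t by the
    Gamma (Hwang-Shih-DeCani) family; gamma = 0 is linear spending."""
    if gamma == 0:
        return total_error * t
    return total_error * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))


def ld_of_spent(t, total_error):
    """Cumulative error spent by the Lan-DeMets O'Brien-Fleming-type function."""
    z = _N.inv_cdf(1 - total_error / 2)
    return 2 * (1 - _N.cdf(z / t ** 0.5))


# Fraction of the total error budget spent, so alpha and beta are comparable.
print("   t   LD(OF)   Gm(-4)   Gm(-8)  Gm(-18)")
for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    row = [ld_of_spent(t, 0.025) / 0.025]
    row += [gamma_family_spent(t, 1.0, g) for g in (-4, -8, -18)]
    print(f"{t:4.1f} " + " ".join(f"{v:8.5f}" for v in row))
```

More negative values of γ push the spending toward the end of the trial, which is the sense in which Gm(-8) and Gm(-18) are more conservative than Gm(-4).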
Early efficacy stopping for a promising new therapy might not be desirable, for instance, if the investigators wish to continue the trial and monitor safety. Early futility stopping under H0, on the other hand, is desirable since it is better to kill a study that is going nowhere and spend the resources elsewhere. We can discourage early efficacy stopping by using stopping boundaries that are considerably more conservative than the LD(OF) boundary used in Des 6. Let us consider using the Gamma spending function with parameter γ = −18.

Create a new design by selecting Des 6 in the Library, and clicking the icon on the Library toolbar. Click the Boundary Info tab. In the Efficacy box, change the Spending Function to Gamma Family from the drop-down list. Type in −18 as the value of Parameter (γ), and click Compute. This will add a row in the Output Preview with label Des 7. Select Des 7 by clicking anywhere along the row in the Output Preview and click the icon to save this design in the Library. Select Des 7 in the Library, and click the icon, then select Stopping Boundaries.

Notice how hard it is to stop early for efficacy. Even as late as the second interim look the efficacy boundary value is -3.841 on the standardized difference scale. We would need to see a one-sided p-value smaller than 0.0001 in order to stop early for efficacy. Thus, except in very extreme situations, Des 7 will not permit early stopping for efficacy.

An interesting feature of Des 7 is that the p-value required at the final look in order to reject the null hypothesis and declare statistical significance is 0.0251. Although we have designed the study for multiple looks at the data, the cut-off p-value for rejecting H0 at the final look is greater than α = 0.025; i.e., we have been rewarded rather than penalized for the multiple looks. We explained the reason for this seeming anomaly in Section 61.5.2.
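The correspondence between the standardized difference scale and the one-sided p-value scale used above is easy to check. This small sketch assumes the chapter's sign convention, in which benefit corresponds to negative values of the test statistic, so the one-sided p-value is the lower tail of the standard normal distribution. The function names are illustrative.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def one_sided_p(z):
    """Lower-tail p-value for a standardized statistic z
    (benefit in the negative direction)."""
    return _N.cdf(z)


def z_from_p(p):
    """Boundary on the standardized scale for a one-sided p-value cut-off."""
    return _N.inv_cdf(p)


p_second_look = one_sided_p(-3.841)   # efficacy boundary at the second look
z_final = z_from_p(0.0251)            # final cut-off quoted for Des 7
z_nominal = z_from_p(0.025)           # unadjusted one-sided 0.025 cut-off

print(f"p at z = -3.841: {p_second_look:.6f}")
print(f"z at p = 0.0251: {z_final:.4f}  (nominal: {z_nominal:.4f})")
```

A final cut-off of p = 0.0251 corresponds to a boundary slightly less extreme than the nominal value of about -1.96, which is the "reward" discussed in the text.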
The final cut-off p-value required to preserve the type-1 error is determined by balancing the penalty due to the presence of an efficacy boundary against the reward due to the presence of a futility boundary. Because of the specific choice of γ parameters, this balance ended up favoring a tiny reward. It might however be important, in an industry trial, to obtain the approval of the regulatory reviewers for using 0.0252 as the final cut-off for rejecting H0. The simulation tools of East may be used to demonstrate that this cut-off does indeed preserve the type-1 error. Close the chart before continuing.

It would be interesting to compare Des 7 with a design that has the same futility boundary but no efficacy boundary whatsoever. To achieve this aim, create a new design by selecting Des 7 in the Library, and clicking the icon on the Library toolbar. Click the Boundary Info tab. In the Efficacy box, change the Boundary Family to None from the drop-down list, and click Compute. This will add a row in the Output Preview with label Des 8. The design summary and stopping boundaries (futility only) of this design are displayed below.

In this design, the trial stops for futility if the value of the test statistic is greater than the corresponding boundary value. The value of the boundary at the final look is -1.9264. Therefore, if the value of the test statistic is less than -1.9264 at the final look, one could technically reject H0 and claim efficacy. The type-1 error of this procedure is 0.025 even though this final boundary value is greater than -1.96. This follows from the same reasoning as we provided in Section 61.5.2. The type-1 error is decreased because there is a chance of being absorbed into the futility boundary at an earlier look.
To compensate, the critical value of the test statistic for rejecting H0 at the final look is determined to be -1.9264 rather than -1.96.

62 Flexible Stopping Boundaries in East

East provides considerable flexibility for generating stopping boundaries with different shapes and varying levels of conservatism for early stopping for efficacy, safety or futility. Suppose, for instance, that a trial will be monitored at regular intervals for safety. For ethical reasons, one might wish to choose safety stopping boundaries that possess a very low threshold for early stopping. On the other hand, there might be some reluctance to stopping a trial early for efficacy. If the new treatment looks promising there is often a desire to go to completion and thereby gather overwhelmingly strong evidence of treatment benefit rather than stopping prematurely. In that case, one might wish to choose extremely conservative stopping boundaries with a high threshold for early stopping at the early interim looks.

The boundaries that are available in East run the gamut between extreme conservatism and extreme liberality for early stopping. They fall into three main categories: p-value boundaries, power boundaries and spending function boundaries. Furthermore, a boundary may serve either to stop a trial and reject the null hypothesis or to stop a trial and reject the alternative hypothesis. Boundaries that facilitate early stopping to reject the null hypothesis are by far the more common of the two types. They are further classified into efficacy boundaries and safety boundaries. Boundaries that facilitate early stopping to reject the alternative hypothesis are known as futility boundaries. They play a role in early termination of trials in which the treatment effect is too small to confer a therapeutic advantage to the experimental arm.
They may be used either in conjunction with, or as an alternative to, conditional power for futility stopping.

P-value boundaries are discussed in Section 62.1. Power boundaries are discussed in Section 62.2. As originally conceived, p-value boundaries offer less flexibility than power boundaries in terms of boundary shape. However, as described in Section 62.1, p-value boundaries have been generalized in this version of East to accommodate many more situations. Still, spending function boundaries offer the most flexibility for trial design. They are discussed in Section 62.3. Our recommendation is to use the spending function boundaries whenever possible.

The theory underlying the actual construction of stopping boundaries is developed in Appendix B. The purpose of the present chapter is to document how the various boundaries can be invoked in East and to demonstrate, through examples, the flexibility they confer for trial design.

62.1 P-Value (or Haybittle-Peto) Boundaries

P-value boundaries, also known as Haybittle-Peto boundaries, have a very simple structure. One specifies a fairly small p-value, say 0.0001, for early stopping at the first K − 1 looks. East then uses recursive integration to compute the last-look p-value needed to achieve an overall type-1 error of α. Historically these boundaries were conceived by Haybittle (1971) as a fairly straightforward way of being permitted to take interim looks without having any substantial impact on the final p-value one would need in order to attain a statistically significant outcome. In East, we have generalized the original Haybittle-Peto boundaries so that the p-values specified at the first K − 1 looks need not be equal. We call such boundaries Generalized Haybittle-Peto boundaries.
The following two examples illustrate how to use the original and the generalized Haybittle-Peto boundaries in East. In addition to designing a trial with these types of boundaries, the second example shows how such a trial can be simulated and monitored using East.

62.1.1 Use of Haybittle-Peto boundaries in an osteoarthritis trial

A randomized, placebo-controlled trial was conducted to evaluate the efficacy of arthroscopy for osteoarthritis of the knee (Moseley et al., 2002). The primary endpoint was patient-reported pain in the study knee 24 months after the intervention, on a scale from 0 to 100, with higher scores indicating more severe pain. Let $X_{ic} \sim N(\mu_c, \sigma^2)$ be the pain score for the ith subject in the placebo group, $X_{it} \sim N(\mu_t, \sigma^2)$ be the pain score for the ith subject in the treatment group, and $\delta = \mu_t - \mu_c$. The null hypothesis was that the patients in the two groups report the same amount of knee pain after two years; that is, $H_0: \mu_t = \mu_c$. The trial was designed to detect a moderate effect size δ1 = 0.55 (on the standardized scale, as entered below) with 90% power using a two-sided level-0.04 test. This was a group-sequential design with Haybittle-Peto stopping boundaries of p=0.001 for the interim analyses. For this study, the standard deviation for the placebo arm was reported as 18.5, and we will use this as the common standard deviation for both groups. We will illustrate the design of this study with a maximum of K=4 equally spaced looks.

First, click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. The upper pane of this window displays several fields with default values. First, change the Number of Looks to 4. This will add a tab with label Boundary Info. We will come back to this tab later. In the Design Parameters tab, select Superiority for Design Type and 2-Sided for Test Type. Since the study was planned to detect a moderate effect size of 0.55, select Standardized Diff. of Means for Input Method and specify Standardized Diff.
((µt − µc)/σ) as 0.55. Enter 0.04 for Type I Error (α), and 0.9 for Power (1-β). The Design Parameters tab should appear as below:

Click the Boundary Info tab. In this tab, you will see Efficacy and Futility boxes, where you can select efficacy and futility boundary families. Select Haybittle Peto (p-value) for Boundary Family in the Efficacy box and select None for Boundary Family in the Futility box. For the Haybittle Peto boundary family, East allows you to fix either the overall type I error or the p-value at the final look. In both cases, p-values for the interim looks need to be specified. To use the original Haybittle-Peto boundaries, all the interim looks should have equal p-values.

Fixed p-value at final look

First we will illustrate how to fix the p-value at the final look instead of the overall type I error (α). This is the case when one would like to specify a constant p-value boundary at the first 3 looks as well as any desired final p-value boundary for the 4th look. Suppose, for example, that we specify 0.001 at each of the first 3 looks and 0.04 at the 4th look. Select the radio-buttons corresponding to the Last Look p-value, and Unequal p-values at looks. The Boundary Info tab should appear as below:

Click Compute. This will add a row to the Output Preview with label Des 1. The overall type I error is now 0.041, which is slightly higher than the desired type I error of 0.04. This increase in the overall type I error is due to the 3 interim looks. The maximum sample size required for this design is 147.
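For orientation, the maximum sample size can be compared with a fixed-sample calculation for the same inputs. The sketch below uses the standard normal-approximation formula for a two-sample comparison of means with equal allocation; it is an assumption of this illustration, not a statement of how East computes sample size, and the function name `fixed_total_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def fixed_total_sample_size(std_effect, alpha, power):
    """Total sample size (both arms) for a two-sided fixed-sample z-test.

    std_effect is the standardized difference (mu_t - mu_c) / sigma.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    per_arm = 2.0 * ((z_alpha + z_beta) / std_effect) ** 2
    return 2.0 * per_arm


total = fixed_total_sample_size(std_effect=0.55, alpha=0.04, power=0.90)
# Unrounded total, and total after rounding each arm up to a whole subject.
print(round(total, 1), 2 * ceil(total / 2.0))
```

With these inputs the unrounded total is close to 147, which illustrates how little the three interim looks at p = 0.001 cost in terms of maximum sample size.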
Fixed overall type I error

Recall that the study we are considering in this section was designed to maintain an overall type I error of 0.04 with constant Haybittle-Peto boundaries of p=0.001 for the interim analyses. In the Boundary Info tab, select the radio-buttons corresponding to the Total Type I Error (α), and Unequal p-values at looks. Then go to the Design Parameters tab and set the Type I Error (α) at 0.04. The Boundary Info tab should appear as below:

The p-value corresponding to the final look has been updated to 0.0391. Upon clicking the Compute button, we will see that a maximum sample size of 148 would be needed. It is more common to use Haybittle-Peto boundaries as shown in Des 2: specify a common p-value for the first K − 1 looks, and adjust the final p-value to satisfy an overall α.

To see the sample size required for a single-look design, change the Number of Looks to 1. Click Compute to obtain the fixed sample size. The sample size for Des 2 and that of Des 3, the fixed sample plan, are nearly the same as shown above. This was the original motivation for Haybittle-Peto boundaries. They are easy to specify, permit interim looks with very little chance of stopping the trial, and resemble the fixed sample trial at the final look.

62.1.2 Use of the Generalized Haybittle-Peto boundaries in the SPARCL trial

The Stroke Prevention by Aggressive Reduction in Cholesterol Levels (SPARCL) group of investigators conducted a large multi-center placebo-controlled trial to evaluate the safety and efficacy of high-dose atorvastatin after stroke or transient ischemic attack (TIA) (SPARCL, 2006).
The primary hypothesis of the study was that treatment with 80 mg of atorvastatin per day would reduce the risk of fatal or non-fatal stroke among patients with a history of stroke or TIA. The study was designed to have a statistical power of 90% to detect an absolute one third increase in the primary endpoint (time to first fatal or non-fatal stroke) in the atorvastatin group as compared with the placebo group during a median follow-up of five years with a two-sided significance level of 5%. The assumed annual event rate in the placebo group was 3.5%, corresponding to a cumulative survival rate of 96.5% at 12 months. Seven interim analyses of efficacy were planned with a stopping boundary corresponding to a two-sided significance level of $p_1 = 0.0001$ for the first analysis and $p_j = 0.001$, $j = 2, \ldots, 7$, thereafter. Patients were enrolled between September 1998 and March 2001 for a total of 4200 (implying an accrual rate of 140 patients per month).

Trial Design

Using the generalized Haybittle-Peto boundaries available in East, we will now design this trial. Start East afresh. Click Survival: Two Samples on the Design tab and then click Parallel Design: Logrank Test Given Accrual Duration and Accrual Rates. Set the Number of Looks to 8, to generate a study with seven interim looks and a final analysis. In the Design Parameters tab, select Design Type as Superiority, Test Type as 2-Sided, and enter Type I Error (α) and Power (1-β) as 0.05 and 0.9, respectively. Leave the # of Hazard Pieces as 1, which implies that hazard rates remain constant over time in both the atorvastatin and placebo groups. Change the Input Method to Cum. % Survival. Tick the check box for Hazard Ratio (Optional), select the radio-button for Hazard Ratio (λt/λc) and enter 0.75. Finally, the Cum. % Survival (Control) should be 96.5 at 12 months.
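As a rough cross-check on these inputs, the constant hazards implied by the cumulative survival entry and the number of events required by a fixed-sample logrank test can be computed by hand. The sketch below uses the exponential survival model and Schoenfeld's approximation with equal allocation; both are assumptions of this illustration rather than a description of East's internal method, and the function names are hypothetical.

```python
from math import log
from statistics import NormalDist


def monthly_hazard(cum_survival, months):
    """Constant hazard implied by S(months) = cum_survival (exponential model)."""
    return -log(cum_survival) / months


def schoenfeld_events(hazard_ratio, alpha, power):
    """Approximate events for a two-sided fixed-sample logrank test, 1:1 allocation."""
    z = NormalDist()
    z_total = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    return 4.0 * z_total ** 2 / log(hazard_ratio) ** 2


control_hazard = monthly_hazard(cum_survival=0.965, months=12)
treatment_hazard = 0.75 * control_hazard  # hazard ratio entered in the dialog
events = schoenfeld_events(hazard_ratio=0.75, alpha=0.05, power=0.90)

print(round(control_hazard, 5), round(treatment_hazard, 5), round(events, 1))
```

The fixed-sample figure comes out slightly above 500 events. A group sequential design with very conservative interim boundaries should need only a few more, which is in line with the event count East reports for this design.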
The Design Parameters tab should appear as below:

Move to the Boundary Info tab. Select Haybittle Peto (p-value) for Boundary Family in Efficacy, and the radio-buttons corresponding to the Total Type I Error (α), and Unequal p-values at looks. Enter the p-value as 0.0001 for the first look, and 0.001 for the next six looks, and click Recalc. The Boundary Info tab should appear as below:

The p-value corresponding to the final look has been updated to 0.0488.

Finally, move to the Accrual/Dropout Info tab. Select 1 for # of Accrual Periods, enter 140 in the Accrual Rate column, and change the Comtd. number of subjects to 4200. Click Compute. Select Des 1 in Output Preview and click the icon. This will display the design details in the Output Summary. According to this design, 511 events are needed to appropriately power the study.

Select Des 1 in Output Summary, click the toolbar icon, and select Stopping Boundaries. The boundaries on the Z-scale are shown below:

62.2 Power Boundaries

East provides two types of power boundaries: Wang-Tsiatis boundaries (Wang and Tsiatis, 1987) for early rejection of H0, and Pampallona-Tsiatis boundaries (Pampallona and Tsiatis, 1994) for early rejection of H0 or H1.

62.2.1 Wang-Tsiatis Boundaries

The Wang-Tsiatis boundaries permit early stopping to reject H0. They are used to stop a trial early for efficacy only (1-sided boundaries), safety only (1-sided boundaries) or to stop early either for efficacy or safety (two-sided case).
Group sequential boundaries of this type were first proposed by Pocock (1977) and O’Brien and Fleming (1979). Subsequently, Wang and Tsiatis (1987) incorporated both the Pocock and O’Brien-Fleming boundaries into a family of “power boundaries” characterized by a shape parameter ∆. For a K-look group sequential trial the power boundary for the standardized test statistic Zj at look j is of the form

$$c_j = C(\Delta, \alpha, K)\, t_j^{\Delta - 0.5}, \quad j = 1, 2, \ldots, K,$$

where $t_j = n_j / n_{\max}$, $n_j$ is the sample size at look j, and $n_{\max}$ is the maximum sample size we must commit up-front to this study in order to achieve the desired power. For technical details on the computation of C(∆, α, K) refer to Appendix B. The study is terminated, and the null hypothesis rejected, the first time that $Z_j > c_j$ for one-sided tests and $|Z_j| > |c_j|$ for two-sided tests. The constant C(∆, α, K) is computed by recursive integration as described in Appendix F.

When ∆ = 0, the stopping boundaries are inversely proportional to the square root of the current information fraction; these are the O’Brien-Fleming boundaries. When ∆ = 0.5, the stopping boundaries are constant at each look; these are the Pocock boundaries. East permits shape parameters in the range −0.5 < ∆ < 0.5. The smaller the value of ∆, the more difficult it is to stop the trial at an interim look. The maximum sample size requirements increase progressively with increasing values of the shape parameter ∆. On the other hand, the expected sample sizes under the alternative hypothesis decrease with increasing values of ∆. Depending on availability of patients and the importance to the trial sponsor of trading off a larger maximum sample size commitment in exchange for a smaller expected sample size, one can select an appropriate value of ∆.
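The effect of the shape parameter can be seen without computing the constant C(∆, α, K): dividing the boundary at look j by the final-look boundary leaves only the factor t_j^(∆ − 0.5). The short sketch below tabulates this ratio for equally spaced looks; the function name `relative_power_boundary` is hypothetical and the snippet illustrates the shape only, not the actual critical values.

```python
def relative_power_boundary(delta, n_looks):
    """Ratio c_j / c_K for equally spaced looks, i.e. t_j ** (delta - 0.5)."""
    return [(j / n_looks) ** (delta - 0.5) for j in range(1, n_looks + 1)]


for delta in (0.0, 0.25, 0.5):
    shape = relative_power_boundary(delta, n_looks=4)
    print(delta, [round(ratio, 3) for ratio in shape])
```

With ∆ = 0 the first of four looks requires a critical value twice as large as the final one (the O’Brien-Fleming shape), whereas with ∆ = 0.5 every ratio equals 1 (the Pocock shape).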
62.2.2 Pampallona-Tsiatis Boundaries

The Wang-Tsiatis power boundaries were developed for early stopping to reject H0. Subsequently, Pampallona and Tsiatis (1994) extended these power boundaries to cover the case of early stopping to reject either H0 or H1. The Pampallona-Tsiatis boundaries are characterized by two shape parameters, ∆1 for the boundaries that facilitate early rejection of H0 and ∆2 for the boundaries that facilitate early rejection of H1. At the jth look the boundaries for early stopping to reject H0 are of the form

$$c_j = C_1(\Delta_1, \alpha, \beta, K),$$

and the boundaries for early stopping to reject H1 are of the form

$$c_j = C_2(\Delta_2, \alpha, \beta, K) - \delta_1 \sqrt{n_j},$$

where 1 − β is the power and δ1 is the treatment effect under H1. For technical details on the computation of C1(.) and C2(.) refer to Appendix B.

The one-sided version consists of a pair of boundaries that meet at the last look. In their most common application, one member of the pair facilitates stopping early for efficacy by rejecting H0 and the other member facilitates stopping early for futility by rejecting H1. The two-sided version consists of a pair of outer boundaries and an inner wedge. Usually one outer boundary is for early stopping to reject H0 in favor of efficacy, and the other outer boundary is used for early stopping to reject H0 and conclude that the new treatment is worse than the standard, and hence unsafe. If the test statistic enters the inner wedge, the alternative hypothesis H1 is rejected and the trial stops for futility. We shall discuss efficacy, safety and futility stopping boundaries in greater detail in the next section, where we introduce the spending function boundaries.

62.3 Spending Function Boundaries

The most general way to generate stopping boundaries is through α- and β-spending functions.
The idea of using an α-spending function to derive stopping boundaries for early rejection of H0 was first introduced in a landmark paper by Lan and DeMets (1983). Subsequently, Pampallona, Tsiatis and Kim (1995, 2001) developed the notion of a β-spending function to derive stopping boundaries for early rejection of H1. In East, one may use an α-spending function to generate efficacy, safety or non-inferiority boundaries and a β-spending function to generate futility boundaries. One may also combine both α- and β-spending in a single trial, with one-sided or two-sided boundaries. All these options are discussed in the sections below. The theory underlying these spending functions is given in Appendix C.

62.3.1 The Alpha Spending Function

Suppose the type-1 error of a trial is fixed at α. An α-spending function is any monotone increasing function of the information fraction t ∈ [0, 1], with α(0) = 0 and α(1) = α. The value α(t) may be interpreted as the probability, under H0, of crossing a stopping boundary by time t; i.e., of committing a type-1 error by time t. Thus one can think of the α-spending function as a way of budgeting how the overall type-1 error is to be spent over the course of the trial.

Lan-DeMets Spending Functions

A conservative spending function will spend the type-1 error very sparingly in the beginning but will rapidly increase the pace of spending as the trial nears completion. An example of such a spending function, proposed by Lan and DeMets (1983) for two-sided tests, has the functional form

$$\alpha(t) = 4 - 4\,\Phi\!\left(\frac{z_{\alpha/4}}{\sqrt{t}}\right). \qquad (62.1)$$

We shall see that this spending function generates stopping boundaries that are very similar to the O’Brien-Fleming boundaries. The function is displayed below. Notice how slowly the α is spent in the early phase of the trial. In East we use the mnemonic LD(OF) to denote this spending function, where LD stands for Lan-DeMets and OF stands for O’Brien-Fleming.
Lan and DeMets (1983) proposed the following function for spending the type-1 error more aggressively:

$$\alpha(t) = \alpha \ln\{1 + (e - 1)t\}. \qquad (62.2)$$

This function is displayed below. Notice that it is a concave function. We shall see that this function generates stopping boundaries that closely resemble the Pocock boundaries. In East we use the mnemonic LD(PK) to denote this spending function, where LD stands for Lan-DeMets and PK stands for Pocock.

At any time t that an interim look is taken, it is possible to invert the corresponding value of α(t) and thereby generate the stopping boundary. Suppose, for instance, that a study is designed for two interim looks and one final look, at information fractions $t_1$, $t_2$ and $t_3 = 1$, not necessarily equally spaced. The two-sided symmetric boundary $\pm c_1$ at look 1 is obtained as the solution to

$$P_0(|Z(t_1)| \ge |c_1|) = \alpha(t_1).$$

Having already utilized $\alpha(t_1)$ of the total available error to compute $c_1$, one can generate $c_2$ recursively as the solution to

$$\alpha(t_1) + P_0(|Z(t_1)| < |c_1|,\ |Z(t_2)| \ge |c_2|) = \alpha(t_2).$$

At the time of the last look, we will have utilized $\alpha(t_2)$ of the total available error and will know the values of the first two stopping boundaries, $c_1$ and $c_2$. Thus, the final stopping boundary, $c_3$, is obtained recursively as the solution to

$$\alpha(t_2) + P_0(|Z(t_1)| < |c_1|,\ |Z(t_2)| < |c_2|,\ |Z(t_3)| \ge |c_3|) = \alpha.$$

Notice from the above that the probability of crossing a boundary for the first time at either the first, second or third looks is

$$\alpha(t_1) + [\alpha(t_2) - \alpha(t_1)] + [\alpha - \alpha(t_2)] = \alpha. \qquad (62.3)$$

In other words, this strategy for generating the stopping boundaries is guaranteed to preserve the type-1 error. We will now see how to obtain stopping boundaries in East based on α spending.
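The recursion just described can be sketched in a few lines. The code below is an illustration under stated assumptions, not East's algorithm: it treats $Z(t)\sqrt{t}$ as a Brownian motion under H0, integrates the sub-density over each continuation region with a trapezoidal rule, and solves for each critical value by bisection. The function names `ld_of`, `ld_pk` and `spending_boundaries`, the grid size and the bisection limits are hypothetical choices for this sketch.

```python
from math import e, exp, log, pi, sqrt
from statistics import NormalDist

Z = NormalDist()


def ld_of(t, alpha):
    """LD(OF) spending function, equation (62.1)."""
    return 4.0 - 4.0 * Z.cdf(Z.inv_cdf(1.0 - alpha / 4.0) / sqrt(t))


def ld_pk(t, alpha):
    """LD(PK) spending function, equation (62.2)."""
    return alpha * log(1.0 + (e - 1.0) * t)


def normal_pdf(x, var):
    return exp(-0.5 * x * x / var) / sqrt(2.0 * pi * var)


def spending_boundaries(times, spend, alpha, n_grid=201):
    """Two-sided symmetric critical values c_1..c_K on the Z scale."""
    bounds = []
    grid, mass = [0.0], [1.0]  # B(0) = 0 with probability 1
    t_prev, spent_prev = 0.0, 0.0
    for t in times:
        target = spend(t, alpha) - spent_prev  # error to spend at this look
        sd = sqrt(t - t_prev)                  # s.d. of the increment of B

        def first_crossing(c):
            # P(no earlier stop and |Z(t)| >= c | H0), with B(t) = Z(t) sqrt(t)
            b = c * sqrt(t)
            return sum(
                m * (1.0 - Z.cdf((b - u) / sd) + Z.cdf((-b - u) / sd))
                for m, u in zip(mass, grid)
            )

        lo, hi = 0.0, 10.0
        for _ in range(60):  # bisection: first_crossing decreases in c
            mid = 0.5 * (lo + hi)
            if first_crossing(mid) > target:
                lo = mid
            else:
                hi = mid
        bounds.append(hi)

        # Propagate the sub-density of B(t) over the continuation region.
        b = hi * sqrt(t)
        step = 2.0 * b / (n_grid - 1)
        new_grid = [-b + step * i for i in range(n_grid)]
        density = [
            sum(m * normal_pdf(s - u, t - t_prev) for m, u in zip(mass, grid))
            for s in new_grid
        ]
        weights = [step] * n_grid
        weights[0] = weights[-1] = step / 2.0  # trapezoidal rule
        grid, mass = new_grid, [w * d for w, d in zip(weights, density)]
        t_prev, spent_prev = t, spend(t, alpha)
    return bounds


looks = [1.0 / 3.0, 2.0 / 3.0, 1.0]
bounds_of = spending_boundaries(looks, ld_of, alpha=0.05)
bounds_pk = spending_boundaries(looks, ld_pk, alpha=0.05)
print([round(c, 3) for c in bounds_of])
print([round(c, 3) for c in bounds_pk])
```

For three equally spaced looks, the LD(OF) values start high and fall towards a final value a little above 1.96, while the LD(PK) values stay nearly constant across looks, which matches the qualitative behaviour described in the text.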
Suppose we want to generate two-sided stopping boundaries based on three equally spaced looks, derived from the LD(OF) spending function specified by equation (62.1). Start East afresh. Click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. Change the Number of Looks from 1 to 3, to generate a study with two interim looks and a final analysis. Accept the default values in the Design Parameters tab.

Move to the Boundary Info tab. Select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in the Efficacy box. In the Futility box, select None for Boundary Family. Stopping boundaries will be displayed in the table below in this tab. Click the boundary chart icon. This will show the boundary chart on the Z scale. The stopping boundaries closely resemble the O’Brien-Fleming boundaries discussed in Section 62.2.1.

East allows us to see stopping boundaries on different scales: select from the different options in the drop-down list under Boundary Scale.

Now compare these charts with those from Pocock-like boundaries. Change the Parameter to PK in the Boundary Info tab, and click the boundary chart icon. The stopping boundaries derived from the LD(PK) spending function specified by equation (62.2) closely resemble the Pocock stopping boundaries.

Although one usually specifies the number and timing of the interim looks at the design stage, it might not be administratively convenient to adhere to these two design parameters at the interim monitoring stage.
The great appeal of the spending function approach for regulatory purposes is that it gives us the freedom to alter both the number and timing of the interim looks while still preserving the overall type-1 error, α. Suppose, for instance, that we were to introduce an unplanned interim analysis in between the second and third looks. Thus, suppose that a total of four looks were taken, even though the study was designed for only three looks. Let these looks be taken at times $t'_1$, $t'_2$, $t'_3$ and $t'_4$, where these times need not be the same as any of the three time points $t_1$, $t_2$, $t_3$ specified at the design stage. If we use the above recursive method to compute the stopping boundaries $c'_1$, $c'_2$, $c'_3$ and $c'_4$ at the four looks, the probability of crossing a stopping boundary must be

$$\alpha(t'_1) + [\alpha(t'_2) - \alpha(t'_1)] + [\alpha(t'_3) - \alpha(t'_2)] + [\alpha(t'_4) - \alpha(t'_3)] = \alpha(t'_4) \le \alpha.$$

For further details and for a discussion of how to compute sample size for a given power using spending function boundaries, refer to Appendix B.

Published Spending Function Families

Two single-parameter spending function families are available in East. One such family is the ρ-family (Kim and DeMets, 1987; Jennison and Turnbull, 2000), whose spending functions are given by

$$\alpha(t) = \alpha t^{\rho}, \quad \rho > 0. \qquad (62.4)$$

When ρ = 1, the corresponding stopping boundaries resemble the Pocock stopping boundaries. When ρ = 3, the boundaries resemble the O’Brien-Fleming boundaries. Larger values of ρ yield increasingly conservative boundaries. Even greater flexibility is available through the γ-family of spending functions (Hwang, Shih and DeCani, 1990), whose spending functions are given by

$$\alpha(t) = \begin{cases} \alpha\,\dfrac{1 - e^{-\gamma t}}{1 - e^{-\gamma}}, & \text{if } \gamma \neq 0, \\[1ex] \alpha t, & \text{if } \gamma = 0. \end{cases} \qquad (62.5)$$

Here negative values of γ yield convex spending functions that increase in conservatism as γ decreases, while positive values of γ yield concave spending
functions that increase in aggressiveness as γ increases. The choice γ = 0 spends the type-1 error linearly. The choice γ = −4 produces stopping boundaries that resemble the O’Brien-Fleming boundaries. The choice γ = 1 produces stopping boundaries that resemble the Pocock boundaries. The spending function below was produced with γ = −12. Notice that hardly any error is spent until the study has progressed 80% of the way through.

Below we display the 3-look stopping boundary on the standardized Z-statistic scale for a 2-sided design. Go to the Test Parameters tab and change the Test Type to 2-Sided and Alpha to 0.05. Also change the Spending Function to Gamma (-12) on the Boundary tab. Notice that the test statistic must reach ±4.3 standard deviations to stop at the first look and ±3.32 standard deviations at the second look. This might be an appropriate stopping boundary for situations in which it is desirable to take interim looks primarily for safety, but it is not desirable to stop the trial early for efficacy.

We stated that large values of γ result in spending functions that spend the error very aggressively. For example, if we were to select γ = 4, we would obtain a spending function that is even more aggressive at the first look than the LD(PK) function proposed by Lan and DeMets (1983). The stopping boundaries generated by this spending function are displayed below. These boundaries actually widen over succeeding looks, unlike the Pocock boundaries that stay constant, or the O’Brien-Fleming boundaries that decrease.
These might be appropriate boundaries for stopping early for serious adverse events.

Interpolated Spending Functions

East permits users to specify arbitrary spending functions of their own choosing by defining the amount of α to be spent at various time points and interpolating linearly in between the time points. Interpolated spending functions can be used when it is of interest to use a published spending function and modify it. For instance, some trials use a truncated Lan and DeMets O’Brien-Fleming alpha spending function where the early boundary values are more aggressive than those generated by a regular Lan and DeMets (O’Brien-Fleming) alpha spending function.

Suppose we want to take 4 equally spaced looks at the data and use a truncated Lan and DeMets O’Brien-Fleming boundary, which sets the first 2 boundary points close to each other. Go back to the Test Parameters tab. Change the Number of Looks to 4. In the Boundary tab, select Spending Functions for Boundary Family, Lan-DeMets for Spending Function and OF for Parameter in the Efficacy box. Choose Spacing of Looks as Equal. The cumulative α spent by the second look is 0.0031. As we want to spend roughly equal amounts of α at the first two looks, the α to be spent at the first look is about half of this; this example uses 0.00165. That is, we are looking for an interpolated spending function with 4 equally spaced looks like below:

t      0.25     0.50    0.75    1.0
α(t)   0.00165  0.0031  0.0193  0.05

Change the Spending Function to Interpolated and enter the values 0.00165, 0.0031 and 0.0193 in the first 3 cells of Cum. α Spent. Click Recalc.
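The interpolated spending function defined by this table is easy to evaluate by hand. The sketch below is a minimal illustration of linear interpolation through the tabulated points, with the origin (0, 0) added as an assumed starting knot; the function name `interpolated_spending` is hypothetical and not part of East.

```python
def interpolated_spending(t, knots):
    """Cumulative alpha at information fraction t by linear interpolation.

    knots is a list of (t, cumulative alpha) pairs; (0, 0) is prepended.
    """
    points = [(0.0, 0.0)] + sorted(knots)
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    raise ValueError("t must lie between 0 and the last knot")


# Values from the table above (4 equally spaced looks, total alpha 0.05).
knots = [(0.25, 0.00165), (0.50, 0.0031), (0.75, 0.0193), (1.0, 0.05)]

cumulative = [interpolated_spending(t, knots) for t, _ in knots]
# Alpha spent at each look is the increment in the cumulative function.
increments = [b - a for a, b in zip([0.0] + cumulative, cumulative)]
print([round(x, 5) for x in increments])

# An unplanned look at t = 0.375 falls halfway between the first two knots.
print(round(interpolated_spending(0.375, knots), 6))
```

The increments show that the first two looks spend 0.00165 and 0.00145 respectively, so the two early boundary values end up close to each other, as intended.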
To see the stopping boundaries for this modified α-spending function, click the boundary chart icon. As you can observe, these boundaries are more aggressive at the first look than a regular Lan and DeMets O’Brien-Fleming boundary.

Spending the α Error Asymmetrically

It is sometimes desirable to spend the total type-1 error asymmetrically. Thus, suppose that we wish to split the total type-1 error, α, of a two-sided test into two components $\alpha_l$ and $\alpha_u$, with $\alpha_l + \alpha_u = \alpha$, in such a way that the probability, under H0, of crossing the upper boundary is $\alpha_u$ and the probability, under H0, of crossing the lower boundary is $\alpha_l$. The algorithm for constructing these asymmetric boundaries is given in Section B.2.4 of Appendix B. We will now illustrate the use of these asymmetric two-sided α-spending function boundaries through an example.

The CRASH trial (Lancet, 2004) was a very large multicenter clinical trial to determine the efficacy and safety of administering intravenous corticosteroids to subjects with significant head injury. Subjects with a Glasgow Coma Score of 14 or less were randomized to placebo or corticosteroids. The primary endpoint was death within 14 days. The public health implications of the conclusions from this study were expected to be significant. On the one hand, there was evidence from previous randomized studies that the use of corticosteroids is beneficial. On the other hand, evidence from meta-analysis suggested the possibility of harm. The CRASH trial was intended to settle this issue. A large sample size was needed because any benefit was likely to be small. The risk of death in patients allocated to placebo was expected to be around 15%.
Because even a 2% survival difference would be clinically important, the trial had to be large enough to detect a difference of this size. Accordingly, the trial planned to enroll a maximum of 20,000 patients. A sample size this large would be able to detect a 2% benefit with over 90% power while limiting the (two-sided) type-1 error to 0.01. A five-look group sequential design with a Lan-DeMets (O’Brien-Fleming) spending function was adopted, since it would be desirable to terminate the trial early if a statistically significant result emerged.

First, click Discrete: Two Samples on the Design tab, and then click Parallel Design: Difference of Proportions. Change the Number of Looks to 5. In the Design Parameters tab, select Superiority as Design Type and 2-Sided (Asymmetric) for Test Type. East will ask you to specify the upper and lower α. This is where we can specify that we wish to spend the total type-1 error asymmetrically. Suppose that we split the 0.01 type-1 error into two components each equal to 0.005. This implies that we are equally interested in detecting harm or detecting benefit. Therefore, enter 0.005 for both upper and lower α. Select the radio-button corresponding to Power (1-β) and enter 20000 for Sample Size (n). Specify Prop. under Control (πc) as 0.15 and Prop. under Treatment (πt) as 0.13. The Design Parameters tab should appear as below:

Click the Boundary Info tab. It is reasonable to suppose that if the corticosteroids are harmful, one would wish to detect this fact early in the trial, and terminate it before half of the 20,000 subjects are randomized to a harmful product. Therefore, one might prefer to spend the available type-1 error aggressively, using, say, a Pocock-type spending function, for the upper stopping boundary.
On the other hand, if the corticosteroids are beneficial, it might be desirable to apply the more conservative O’Brien-Fleming type spending function for the lower stopping boundary so that stronger evidence of benefit is obtained before the trial is terminated. Select Spending Functions for Efficacy Boundary Family. Choose Lan-DeMets as Spending Function in both the Upper Efficacy Boundary and Lower Efficacy Boundary boxes. For Parameter, select PK and OF in the Upper Efficacy Boundary and Lower Efficacy Boundary boxes, respectively.

Click Compute. Select Des 1 in the Output Preview and click the icon. Although the design requires an up-front commitment of 18,109 patients, if in fact the corticosteroids do reduce the mortality rate by 2%, then the trial is likely to terminate early with an expected sample size of 14,246. To see the stopping boundaries, select this design, click the icon in the Output Summary toolbar and then select Stopping Boundaries. The asymmetry in the lower and upper stopping boundaries ensures that if corticosteroids are harmful, this fact will be detected more quickly than would be the case with symmetric two-sided boundaries.

62.3.2 The Beta Spending Function

Suppose we wish to design a group sequential trial with α as the type-1 error and β as the type-2 error; i.e., with 1 − β as the power. Just as we can use an α-spending function to generate efficacy boundaries, we can use a β-spending function to generate futility boundaries, or boundaries for early stopping in favor of the null hypothesis.
The idea of designing trials with futility boundaries was developed by Pampallona and Tsiatis (1994). The further idea of using β-spending functions to create such boundaries both at the design and interim monitoring stages was developed by Pampallona, Tsiatis and Kim (2001). These boundaries are crossed with probability β under the alternative hypothesis. Moreover, the probability of crossing these boundaries increases as the treatment effect decreases towards the null hypothesis until, at the null hypothesis itself, the probability of crossing is 1 − α. Futility boundaries may be used either by themselves or in conjunction with efficacy boundaries. When an efficacy boundary and a futility boundary are both present in the same study, they are forced to meet at the last look, so that either H0 is rejected or H1 is rejected by the end of the study. Refer to Appendix B, Section B.2.4 for the technical details concerning the use of β-spending functions and the construction of futility boundaries.

Trials with Early Stopping for Efficacy or Futility

Consider a hypothetical two-arm hypertension clinical trial in which $X_{ic} \sim N(\mu_c, 1)$ is the blood pressure reduction for the ith subject in the control group, $X_{it} \sim N(\mu_t, 1)$ is the blood pressure reduction for the ith subject in the treatment group, and $\delta = \mu_t - \mu_c$. The trial should have 90% power to detect $\delta_1 = 0.3$ using a maximum of K=5 equally spaced looks, and we will assume that all measurements are made on a standardized scale so that $\sigma^2 = 1$. We wish to construct a one-sided level-0.025 test with both an efficacy and a futility boundary. These boundaries should be such that if H1 is true ($\delta = \delta_1 = 0.3$) the upper efficacy boundary will be crossed with probability 0.9, whereas if H0 is true ($\delta = 0$), the lower futility boundary will be crossed with probability 1 − 0.025 = 0.975.
The efficacy boundary is generated by specifying an α-spending function. The futility boundary is generated by specifying a β-spending function. Start East afresh. First, click Continuous: Two Samples on the Design tab, and then click Parallel Design: Difference of Means. Change the Number of Looks to 5. In the Design Parameters tab, select Superiority as Design Type and 1-Sided as Test Type. Select Difference of Means for Input Method and specify Difference in Means (µt − µc) as 0.3. Enter 1 for Std. Deviation (σ). Enter values for Type I Error (α) and Power (1-β) as 0.025 and 0.9, respectively. The Design Parameters tab should appear as below:

Click the Boundary Info tab. In this tab, we must specify both the α- and β-spending functions. Select Spending Functions for Boundary Family in both Efficacy and Futility boxes. The next field asks you to choose the type of spending function. There is complete flexibility to select any member of any of the four available spending function families (Rho Family, Gamma Family, Lan-DeMets Family, Power Family) for spending α and independently for spending β. Suppose we decide that we will use the Gm(-8) spending function for spending α and the Gm(-4) spending function for spending β. This might be a good choice, for instance, if the sponsor wants to set a very high hurdle for early stopping for efficacy, but wants to have a reasonable chance of pulling out early if the trial is going nowhere. Select Gamma Family as the type of Spending Function in both Efficacy and Futility boxes. Specify Parameter (γ) as −8 and −4 for efficacy and futility, respectively. Notice that in the Futility box you are given a further choice between Binding and Non Binding radio buttons.
The default selection is Non Binding and implies that the futility boundary will be constructed in such a way that it can be overruled if desired without inflating the type-1 error. This flexibility is important, since the sponsor or the data monitoring committee might well prefer to keep the trial going to gather additional information, despite crossing the futility boundary. A Binding futility boundary is generally not recommended. It interacts with the corresponding efficacy boundary in such a way that unless it is strictly enforced (i.e., unless the trial is terminated if the futility boundary is crossed) the type-1 error might be inflated. Thus, for the present, select the default Non Binding radio button. We will compare the operating characteristics of binding and non binding futility boundaries at the end of the present section. A more detailed technical discussion is available in Appendix B, Section B.2.4. The Boundary Info tab will look as shown below:

Note - In the Spacing of Looks table of the Boundary Info tab, notice that there are ticked checkboxes under the columns Stop for Efficacy and Stop for Futility. East gives you the flexibility to remove one of the stopping boundaries at certain looks, subject to the following constraints: (1) both boundaries must be included at the final two looks, (2) at least one boundary, either efficacy or futility, must be present at each look, (3) once a boundary has been selected all subsequent looks must include this boundary as well and (4) the efficacy boundary for the penultimate look cannot be absent.

Click Compute. Select Des 1 by clicking anywhere along the row in the Output Preview and click the icon to save this design in the Library. Select Des 1 in Output Preview or in Library and click the icon. This will display the design details in the Output Summary. To see the spending functions, click on the icon from the Output Summary toolbar and then select Error Spending.

Notice how much slower the Gm(-8) function spends the α error than the Gm(-4) function spends the β error. Close the spending function chart and select Stopping Boundaries after clicking on the icon. An important feature of the stopping boundaries in Des 1 is that they meet at the final look. East forces this property on all H0 or H1 boundaries. The computational details are given in Appendix B. By forcing the boundaries to meet, one is guaranteed to decide to either reject H0 or reject H1. There is no area of indecision. This leads to a slight increase in the maximum sample size relative to a design that has only a boundary for rejecting H0.

For comparison purposes, create a new design by right-clicking Des 1 in the Library, and clicking the icon. Go to the Boundary Info tab and change the Boundary Family to None in the Futility box. Click Compute. Select both Des 1 and Des 2 in the Output Preview and click the icon. For a very small increase in the up-front sample size commitment, Des 1 produces about the same saving in expected sample size as Des 2 if δ = 0.3 and a considerably larger saving if δ = 0. Moreover, as stated earlier, the futility boundary of Des 1 is non-binding; it can be overruled whenever desired without causing the type-1 error to exceed α, and without decreasing the power. Thus, all in all, Des 1 would appear to be superior to Des 2.

Futility boundaries derived from β-spending functions were introduced initially by Pampallona, Tsiatis and Kim (1995, 2001).
The boundaries proposed in those papers had the serious drawback of being mandatory or binding. They interacted with the corresponding efficacy boundaries in such a way that one could not overrule them without the risk of inflating the type-1 error. For this reason, they were not very practical. Data monitoring committees (DMCs) prefer to use group sequential boundaries as guidance rather than as mandatory stopping rules. Efficacy boundaries pose no difficulty in this regard. If an efficacy boundary is crossed but the DMC votes nevertheless to keep the trial going to gain some additional information (on a secondary endpoint, say), there might be some loss of power, but there is no risk of inflating the type-1 error. Futility boundaries, as derived by Pampallona, Tsiatis and Kim (2001), are a different matter. They cannot be overruled without the risk of inflating the type-1 error. The modification to these boundaries that we have proposed in Appendix B, Section B.2.4 overcomes this difficulty.

To compare non-binding and binding futility boundaries, create a new design by right-clicking Des 1 in the Output Preview, and clicking the icon. Go to the Boundary Info tab, and select the radio button corresponding to Binding in the Futility box, and click Compute. Select both Des 1 and Des 3 and click the icon. Des 3 is very similar to Des 1 in terms of maximum and expected sample sizes. The two designs differ in one important respect, however. The upper efficacy boundary of Des 3 is different from the upper efficacy boundary of Des 1, whereas the upper efficacy boundary of Des 1 is identical to the upper efficacy boundary of Des 2. Thus, the attained α for Des 1 is slightly lower than the specified α: the futility boundary will capture a small proportion of trials that would otherwise have crossed the efficacy boundary as type-1 errors.
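To make the comparison between the Gm(-8) and Gm(-4) spending functions used for Des 1 concrete, the sketch below tabulates the cumulative error spent at five equally spaced looks. It assumes that the Gamma Family has the Hwang-Shih-DeCani form, error(t) = total × (1 − e^(−γt)) / (1 − e^(−γ)), where t is the information fraction; the function name `gamma_spend` is hypothetical, and the sketch does not compute the stopping boundaries themselves.

```python
import math


def gamma_spend(t, total_error, gamma):
    """Cumulative error spent at information fraction t (0 < t <= 1),
    assuming the Hwang-Shih-DeCani form of the gamma family."""
    if gamma == 0:
        return total_error * t   # limiting case: error is spent linearly
    return total_error * (1 - math.exp(-gamma * t)) / (1 - math.exp(-gamma))


looks = [k / 5 for k in range(1, 6)]                        # five equally spaced looks
alpha_spent = [gamma_spend(t, 0.025, -8) for t in looks]    # efficacy, Gm(-8)
beta_spent = [gamma_spend(t, 0.10, -4) for t in looks]      # futility, Gm(-4)

print("t     alpha spent   beta spent")
for t, a, b in zip(looks, alpha_spent, beta_spent):
    print(f"{t:.1f}   {a:.5f}       {b:.5f}")
```

Under this assumed form, the fraction of the total α spent at each interim look is smaller than the fraction of β spent at the same look, which is the pattern described in the text.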
Trials with Early Stopping for Futility Only

Let us consider, once again, the hypertension clinical trial introduced at the beginning of the ongoing subsection. Suppose the trial is designed for a test of H0 : µt − µc = 0 at one-sided significance level α = 0.025 and 90% power at the alternative hypothesis H1 : µt − µc = 0.3 with an assumed variance σ² = 1. There will be five equally spaced looks at the data with a futility boundary for terminating the trial early with the declaration that H0 cannot be rejected. The futility boundary is required to have the property that the overall boundary crossing probability under H1 is 0.1. There is no intention to stop the trial early for efficacy.

Create a new design by right-clicking Des 1 in the Library, and clicking the icon. Go to the Boundary Info tab. In the Efficacy box, change the Boundary Family to None from the drop-down list. In the Futility box, set the Boundary Family to Spending Function and select Gamma Family in the ensuing field. Type in −4 as the value of Parameter (γ). Select the radio button corresponding to Binding. Click Compute to obtain the sample size for this 'Futility only' design. East will create this design with label Des 4. A summary of Des 4 and the associated β spending function are displayed below.

Edit Des 4 to create a corresponding single-look study (Des 5), designed for the same effect size, type-1 error and power. In Des 5, we are forced to continue until the maximum sample size is reached, unless it is terminated due to low conditional power.
We have pointed out in Chapter 61 that use of low conditional power to terminate a trial early is rather ad hoc, and gives us no assurance that the overall unconditional power of the study will be preserved. Des 5 requires a commitment of 467 patients. However, there is no option under Des 5 to stop the trial early if the effect size is smaller than was anticipated at the design stage. In contrast, Des 4 requires an up-front commitment of 475 patients, only slightly more than Des 5. But this is a small price to pay for the flexibility to take interim looks and stop early if the futility boundary is crossed. The expected sample size of Des 1 is 289 patients if H0 is true.

63 Confidence Interval Based Design

During the design of an experiment such as a clinical trial, when researchers consider a hypothesis test for a parameter of interest, say δ, either the unknown sample size for the desired power or the unknown power for a fixed sample size must be determined. A confidence interval based design calculates the sample size based on the desired width of a confidence interval for the parameter of interest rather than the power of the hypothesis test. In previous versions of East, a user could employ a confidence interval based approach only via a labor intensive process of trial and error by generating repeated confidence interval charts. East now allows the computation of such a sample size for many single look designs based on analytical methods without the need to use such charts. The result is a quick and efficient way to compute the sample size required to achieve a desired width for a confidence interval for δ, given the confidence level 1 − α.
Definitions

1 − α denotes the confidence level
ω is the measure of precision for δ (half width of the confidence interval)
δ̂ is the empirical estimate of δ

The estimated sample size n must satisfy the following:

For a two-sided confidence interval: P(δ̂ − ω ≤ δ ≤ δ̂ + ω) = 1 − α
For a one-sided confidence interval: P(δ ≥ δ̂ − ω) = 1 − α or P(δ ≤ δ̂ + ω) = 1 − α

63.1 One Sample Test for a Single Mean for Continuous Data

Consider the problem of comparing the mean of the distribution of observations from a single random sample of continuous data to a specified constant. Suppose it is required to estimate the sample size for obtaining a 95% two-sided confidence interval for the population mean with a precision of 5 units, when the population standard deviation is known to be 20 units.

To illustrate this example, in East under the Design ribbon for Continuous data, click One Sample and then click Single Arm Design: Single Mean as shown:

This will launch the following input window:

Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): Computed (select radio button)
Half Width (ω): 5.0
Standard Deviation (σ): 20

The Confidence Interval based design for this particular test also allows the user to specify whether or not a Finite Population Correction for a fixed Population Size is used. In addition, the user can also determine if a Coverage Correction is to be used for a given Coverage Probability. This coverage correction may become necessary when the population standard deviation is unknown and is to be estimated from the sample. For now leave these boxes unchecked and click Compute.
The sample size for this design is calculated and the output is shown as a row in the Output Preview window: As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design summary will be displayed labeled Output 63.1 One Sample Test for a Single Mean for Continuous Data 1495 <<< Contents 63 * Index >>> Confidence Interval Based Design Summary. This design can be saved to the Library by selecting the Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a One-sided confidence interval and with various values for ω and σ, as well as any desired differences in Population Size and Coverage Probability. Alternatively East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of population parameter where the sample size is fixed, the user has to only enter the desired value, for example n = 100. Enter the following in the Design Input screen and click Compute: Test Type: 2 sided Confidence Level (1 − α): 0.95 Sample Size (n): 100 Half Width (ω): Computed (select radio button) Standard Deviation (σ): 20 1496 63.1 One Sample Test for a Single Mean for Continuous Data <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 The precision parameter ω is calculated to be 4.1. As the sample size is increased the resulting estimate of precision increases, which is to say the precision limit decreases, providing a tighter confidence interval for the parameter of interest. The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. 
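The two calculations in this section (sample size for a given half width, and half width for a given sample size) can be sketched with the textbook normal-approximation formula ω = z·σ/√n for a known standard deviation. The function names are hypothetical, no finite population or coverage correction is applied, and East may use a different quantile or correction, so its reported values need not coincide with this approximation.

```python
import math
from statistics import NormalDist


def z_quantile(conf_level, two_sided=True):
    """Standard normal quantile for the requested confidence level."""
    alpha = 1 - conf_level
    p = 1 - alpha / 2 if two_sided else 1 - alpha
    return NormalDist().inv_cdf(p)


def sample_size_for_half_width(omega, sigma, conf_level, two_sided=True):
    """Smallest n for which z * sigma / sqrt(n) does not exceed omega."""
    z = z_quantile(conf_level, two_sided)
    return math.ceil((z * sigma / omega) ** 2)


def half_width_for_sample_size(n, sigma, conf_level, two_sided=True):
    """Half width omega = z * sigma / sqrt(n) for a fixed sample size n."""
    return z_quantile(conf_level, two_sided) * sigma / math.sqrt(n)


# Inputs from the example: 95% two-sided interval, sigma = 20.
n_required = sample_size_for_half_width(omega=5.0, sigma=20.0, conf_level=0.95)
omega_100 = half_width_for_sample_size(n=100, sigma=20.0, conf_level=0.95)
print(n_required, round(omega_100, 2))
```

Because ω is proportional to 1/√n, halving the half width requires roughly four times as many observations.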
63.2 One Sample Test for the Mean of Paired Differences for Continuous Data Consider the problem of comparing the means of two normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution. Suppose it is required to estimate the sample size for obtaining a 99% two-sided confidence interval for the difference of means with a precision of 1.0 units, when the population standard deviation is known to be 3.4 units. To illustrate this example, in East under the Design ribbon for Continuous data, click 63.2 One Sample Test for the Mean of Paired Differences for Continuous Data 1497 <<< Contents 63 * Index >>> Confidence Interval Based Design One Sample and then click Paired Design: Mean of Paired Differences as shown: This will launch the following input window: Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters: Test Type: 2 sided Confidence Level (1 − α): 0.99 Sample Size (n): Computed (select radio button) Half Width (ω): 1.0 Standard deviation of Paired Difference(σD ): 3.4 1498 63.2 One Sample Test for the Mean of Paired Differences for Continuous Data <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 The Confidence Interval based design for this particular test also allows the user to specify whether or not a Finite Population Correction for a fixed Population Size is used. In addition, the user can also determine if a Coverage Correction is to be used for a given Coverage Probability. This coverage correction may become necessary when the population standard deviation is unknown and is to be estimated from the sample. For now leave these boxes unchecked and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window: As is standard in East, this design has the default name Des 1. 
To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output 63.2 One Sample Test for the Mean of Paired Differences for Continuous Data 1499 <<< Contents 63 * Index >>> Confidence Interval Based Design Summary. This design can be saved to the Library by selecting the Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a One-sided confidence interval and with various values for ω and σD , as well as any desired differences in Population Size and Coverage Probability. Alternatively East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of population parameter where the sample size is fixed, the user has to only enter the desired value, for example n = 100. Enter the following in the Design Input screen and click Compute: Test Type: 2 sided Confidence Level (1 − α): 0.99 Sample Size (n): 100 Half Width (ω): Computed (select radio button) Standard deviation of Paired Difference(σD ): 3.4 1500 63.2 One Sample Test for the Mean of Paired Differences for Continuous Data <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 The precision parameter ω is calculated to be 0.876. As the sample size is increased the resulting estimate of precision increases, which is to say the precision limit decreases, providing a tighter confidence interval for the parameter of interest. The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot. 
This dynamic visual lets the user immediately assess how changing the sample size affects the resulting width of the confidence interval. From the Library choose Sample Size vs. Width from the Plots menu. Here, the user can move the cursor horizontally back and forth to change the interval width and immediately view the resulting sample size. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.3 Two Sample Test for the Difference of Means for Continuous Data

Consider the problem of comparing a new treatment to a standard protocol. It is often necessary to randomize subjects to the control and treatment arms, and then determine if the group-dependent means of the outcome variables are significantly different. The following example illustrates a confidence interval based design for such a trial when the outcomes from both groups follow a normal distribution. Suppose it is required to estimate the sample size for obtaining a 95% two-sided confidence interval for the difference of two means with a precision of 3.0 units. Assume that the common standard deviation of the observations is 8.

In East under the Design ribbon for Continuous data, click Two Sample and then click Parallel Design: Difference of Means as shown:

This will launch the following input window:

Choose Confidence Interval in the Design Type dropdown box. Consider a one sided test with 5% significance level, and an Allocation Ratio (nt : nc) of 3:1, that is, 75% of the patients are randomized to the treatment arm.
Enter the following design 63.3 Two Sample Test for the Difference of Means for Continuous Data 1503 <<< Contents 63 * Index >>> Confidence Interval Based Design parameters: Test Type: 1 sided Confidence Level (1 − α): 0.95 Sample Size (n): Computed (select radio button) Allocation Ratio: 3 One-sided Width (ω): 3.0 Standard Deviation (σ): 8 The Confidence Interval based design for this particular test also allows the user to specify whether or not a Coverage Correction is to be used for a given Coverage Probability. This coverage correction may become necessary when the population standard deviation is unknown and is to be estimated from the sample. For now leave this box unchecked and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window: As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output 1504 63.3 Two Sample Test for the Difference of Means for Continuous Data <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 Summary. This design can be saved to the Library by selecting the Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a Two-sided confidence interval and with various values for Allocation Ratio, ω and σ, as well as any desired differences in Coverage Probability. Alternatively East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of population parameter where the sample size is fixed, the user has to only enter the desired value, for example n = 80. 
Enter the following in the Design Input screen and click Compute:

Test Type: 1 sided
Confidence Level (1 − α): 0.95
Sample Size (n): 80
Allocation Ratio: 3
One-sided Width (ω): Computed (select radio button)
Standard Deviation (σ): 8

The precision parameter ω is calculated to be 3.398. As the sample size is decreased, the resulting value of ω increases. In other words, the precision limit increases, resulting in a wider confidence interval for the parameter of interest. The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon.

East also provides a very useful Sample Size vs. Width plot. This dynamic visual lets the user immediately assess how changing the sample size affects the resulting width of the confidence interval. From the Library choose Sample Size vs. Width from the Plots menu. Here, the user can move the cursor horizontally back and forth to change the interval width and immediately view the resulting sample size. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.
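The one-sided width reported above can be reproduced with a normal-approximation sketch in which the standard error of the difference of means is σ·√(1/nt + 1/nc). The function names are hypothetical, no coverage correction is applied, and arm sizes are treated as continuous when solving for the total sample size, so East's rounding may differ.

```python
import math
from statistics import NormalDist


def half_width_two_sample(n_total, ratio, sigma, conf_level, two_sided=False):
    """Width omega for a difference of two normal means, where ratio = nt / nc."""
    n_c = n_total / (1 + ratio)
    n_t = n_total - n_c
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    return z * sigma * math.sqrt(1 / n_t + 1 / n_c)


def total_sample_size_two_sample(omega, ratio, sigma, conf_level, two_sided=False):
    """Smallest total n with width <= omega (arm sizes treated as continuous)."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    return math.ceil((z * sigma) ** 2 * (1 + ratio) ** 2 / (ratio * omega ** 2))


# Inputs from the example: one-sided 95% interval, 3:1 allocation, sigma = 8.
omega_80 = half_width_two_sample(n_total=80, ratio=3, sigma=8.0, conf_level=0.95)
n_needed = total_sample_size_two_sample(omega=3.0, ratio=3, sigma=8.0, conf_level=0.95)
print(round(omega_80, 3), n_needed)

# A small Sample Size vs. Width table in the spirit of the one described above.
for n in (40, 80, 160, 320):
    print(n, round(half_width_two_sample(n, 3, 8.0, 0.95), 3))
```

With n = 80 the sketch agrees with the value of ω quoted in the text to three decimal places.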
63.4 One Sample Test for a Single Binomial Proportion

Consider the experimental situation in which an observed treatment response rate is compared to a fixed response rate derived from historical data, where the variable of interest has a binomial distribution. It is therefore of interest to determine whether the response rate π differs from a fixed value π0. The following example illustrates a confidence interval based design for a one arm trial having a binomial response rate, where a single binomial proportion is tested against a fixed value. Suppose it is required to estimate the sample size to obtain a 95% two-sided confidence interval for π with a precision of 0.01 units. The sample size is determined for a specified value of π which is consistent with the alternative hypothesis, denoted π1. The design is a single-arm trial in which we wish to determine if the response rate of a new therapy is at least 15%. Thus, it is desired to test the null hypothesis H0 : π = 0.15 against the one-sided alternative hypothesis H1 : π > 0.15. Assume π = π1 = 0.25 and a type one error rate of 0.05.

In East under the Design ribbon for Discrete data, click One Sample and then click Single Arm Design: Single Proportion as shown:

This will launch the following Design Input window:

Choose Confidence Interval in the Design Type dropdown box. Consider a one sided test with 5% significance level and fixed value of π = 0.25.
Enter the following design parameters: Test Type: 1 sided Confidence Level (1 − α): 0.95 Sample Size (n): Computed (select radio button) One-sided Width (ω): 0.01 Proportion (π): 0.25 The Confidence Interval based design for this particular test also allows the user to specify whether or not a Finite population Correction is to be used for a given Population Size. For now leave this box unchecked and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output 63.4 One Sample Test for a Single Binomial Proportion 1509 <<< Contents 63 * Index >>> Confidence Interval Based Design Preview window: As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary. This design can be saved to the Library by selecting the Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a Two-sided confidence interval and with various values for ω and π, as well as any desired differences in Finite Population Correction. 1510 63.4 One Sample Test for a Single Binomial Proportion <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 Alternatively East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of population parameter where the sample size is fixed, the user has to only enter the desired value, for example n = 4000 with a finite population correction of size 8000. 
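As a rough cross-check of this fixed-sample-size case, the sketch below computes the one-sided width for a single binomial proportion with a finite population correction. It assumes a Wald-type standard error √(π(1 − π)/n) multiplied by the correction factor √((N − n)/(N − 1)); both the function name and the exact form of the correction are assumptions and may not match East's internal computation.

```python
import math
from statistics import NormalDist


def width_single_proportion(n, pi, conf_level, two_sided=False, population_size=None):
    """Width omega for a single binomial proportion (Wald-type approximation).

    If population_size is given, an assumed finite population correction
    sqrt((N - n) / (N - 1)) is applied to the standard error.
    """
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    se = math.sqrt(pi * (1 - pi) / n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return z * se


# Inputs from the example: n = 4000 drawn from a population of 8000, pi = 0.25.
omega_fpc = width_single_proportion(4000, 0.25, 0.95, population_size=8000)
omega_no_fpc = width_single_proportion(4000, 0.25, 0.95)
print(round(omega_fpc, 3), round(omega_no_fpc, 3))
```

The correction shrinks the width because half of the assumed population has been sampled; without it the same n gives a noticeably wider interval.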
Enter the following in the Design Input screen and click Compute:

Test Type: 1 sided
Confidence Level (1 − α): 0.95
Sample Size (n): 4000
One-sided Width (ω): Computed (select radio button)
Proportion (π): 0.25
Finite Population Correction: (box checked)
Population Size: 8000

For a sample size of 4000 the precision parameter ω is calculated to be 0.008. As the sample size is decreased, the resulting value of ω increases, which results in a wider confidence interval for the parameter of interest. The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon.

East also provides a very useful Sample Size vs. Width plot. This dynamic visual lets the user immediately assess how changing the sample size affects the resulting width of the confidence interval. From the Library choose Sample Size vs. Width from the Plots menu. Here, the user can move the cursor horizontally back and forth to change the interval width and immediately view the resulting sample size. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.5 Two Sample Test for the Difference of Binomial Proportions

In medical research, outcomes such as the proportion of patients responding to a therapy, developing a certain side effect or requiring specialized care arise from experiments based on binomial data.
In these situations the goal is to compare independent samples from two populations in terms of the proportion of patients presenting the characteristic or outcome. East supports a Confidence Interval based approach to the design of clinical trials, independent of the power of the test, in which treatment comparison is based on the difference of such proportions. For example, in a prospective randomized trial of placebo versus treatment for patients with a heart condition, the endpoint may be reduction in death or MI within a certain period of time after entering the study. It is of interest to detect a reduction in the event rate from 15% on the placebo arm to 10% on the treatment arm. In other words the goal is to test the null hypothesis that the treatment and placebo arms both have an event rate of 15%, versus the alternative that the treatment reduces the event rate by 5% (from 15% to 10%). Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let δ = πt − πc . The interest is therefore in testing the null hypothesis H0 : δ = 0, for a two-sided test with a type-1 error of 5%. Consider a confidence interval based design to estimate the sample size with a precision of ω = 0.05. In East under the Design ribbon for Discrete data, click Two Samples and then click Parallel Design: Difference of Proportions as shown: 63.5 Two Sample Test for the Difference of Binomial Proportions 1513 <<< Contents 63 * Index >>> Confidence Interval Based Design This will launch the following input window: Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters: Test Type: 2 sided Confidence Level (1 − α): 0.95 Sample Size (n): Computed (select radio button) Allocation Ratio (nt /nc ): 1 Prop. under Control (πc ): 0.15 Prop. under Treatment (πt ): 0.10 Diff. in Prop. 
(δ1 = πt − πc ): -0.05 (this will be calculated) Half Width (ω): 0.05 Specify Variance: Select Unpooled Estimate radio button 1514 63.5 Two Sample Test for the Difference of Binomial Proportions <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 The Allocation Ratio (nt : nc ) describes the ratio of patients to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm as opposed to 25% to the control. In binomial designs, the variance of a random variable is dependent on its mean. The maximum sample size required for a study will be affected by how the differences of binomial response rates are standardized when computing the test statistic, regardless of the other design parameters. There are two options for determining how the test statistic will be standardized, using either the Unpooled or Pooled specification for variance. The difference becomes important when planning a binomial study with unbalanced randomization. In this case, both pooled and unpooled designs should be considered and the one that produces a tighter confidence interval (measure of ω) with fewer patients should be chosen. This will depend on the response rates of the control and treatment arms as well as the value of the fraction assigned to the treatment arm. More information on this can be found in Section 23.1. For this example, keep the default settings (Allocation Ratio = 1 and Unpooled Estimate selected) and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window: As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output 63.5 Two Sample Test for the Difference of Binomial Proportions 1515 <<< Contents 63 * Index >>> Confidence Interval Based Design Summary. 
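The effect of the Unpooled and Pooled variance specifications on the half width can be sketched as follows. The formulas are the standard Wald-type expressions: unpooled variance πt(1 − πt)/nt + πc(1 − πc)/nc, and pooled variance π̄(1 − π̄)(1/nt + 1/nc) with π̄ the allocation-weighted average of the two proportions. The function name is hypothetical and East's exact computation may differ.

```python
import math
from statistics import NormalDist


def half_width_diff_props(n_total, ratio, pi_c, pi_t, conf_level,
                          pooled=False, two_sided=True):
    """Half width for a difference of two binomial proportions, ratio = nt / nc."""
    n_c = n_total / (1 + ratio)
    n_t = n_total - n_c
    if pooled:
        pi_bar = (n_t * pi_t + n_c * pi_c) / n_total      # weighted average rate
        var = pi_bar * (1 - pi_bar) * (1 / n_t + 1 / n_c)
    else:
        var = pi_t * (1 - pi_t) / n_t + pi_c * (1 - pi_c) / n_c
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    return z * math.sqrt(var)


# Balanced design with the example's response rates and a total of 500 patients.
w_unpooled = half_width_diff_props(500, 1, 0.15, 0.10, 0.95)
w_pooled = half_width_diff_props(500, 1, 0.15, 0.10, 0.95, pooled=True)

# Unbalanced 3:1 allocation, where the two specifications diverge more clearly.
w_unpooled_3 = half_width_diff_props(500, 3, 0.15, 0.10, 0.95)
w_pooled_3 = half_width_diff_props(500, 3, 0.15, 0.10, 0.95, pooled=True)
print(round(w_unpooled, 4), round(w_pooled, 4),
      round(w_unpooled_3, 4), round(w_pooled_3, 4))
```

With balanced allocation the two specifications give nearly the same half width, whereas with 3:1 allocation they differ noticeably, which is why both are worth examining when randomization is unbalanced.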
This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω, proportions of responses for treatment and control groups (πt and πc), and different specifications for variance estimates.

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example n = 500. Enter the following in the Design Input screen and click Compute:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): 500
Allocation Ratio (nt/nc): 1
Prop. under Control (πc): 0.15
Prop. under Treatment (πt): 0.10
Diff. in Prop. (δ1 = πt − πc): -0.05 (this will be calculated)
Half Width (ω): Computed (select radio button)
Specify Variance: Select Unpooled Estimate radio button

For a sample size of 500 the precision parameter ω is calculated to be 0.058. As the sample size is decreased, the resulting value of ω increases slightly. For binomial data, this results in a wider confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot, found in the plots menu. This dynamic visual shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs.
Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.6 Two Sample Test for the Ratio of Binomial Proportions

In experiments based on binomial data, independent samples from different populations are compared in terms of the proportion of participants presenting a particular trait or outcome of interest. For example, outcomes such as the proportion of patients responding to a treatment, developing an adverse reaction, or requiring specialized care could be of interest in medical research. East supports a Confidence Interval based approach to the design of clinical trials in which this comparison is based on the ratio of proportions.

For example, consider a prospective randomized trial of a standard treatment (control arm) versus a new combination treatment (therapy arm) for patients with a heart condition, where the endpoint is either death or MI within a certain period of time after randomization. Suppose it is of interest to determine the sample size required for a trial to detect a 25% decline in the rate of such outcomes. It can be assumed that the control arm has a 30% event rate. Let πc and πt denote the binomial probabilities for the control and treatment arms, respectively, and let ρ = πt/πc. Under H0, πt = πc = 0.3. A 25% decline in the event rate thus corresponds to ρ = πt/πc = 0.75. It is of interest to test the null hypothesis that ρ = 1 against one-sided or two-sided alternatives. When dealing with ratios, it is mathematically more convenient to express this hypothesis in terms of the difference of the (natural) logarithms. Defining δ = ln(πt) − ln(πc) leads to the equivalent problem of testing H0: δ = 0.
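The log transformation can be illustrated numerically. The sketch below assumes that the half-width ω is measured on the ln(ρ) scale and that the variance comes from the usual delta-method (unpooled) approximation; neither assumption is confirmed by this chapter, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_ratio_half_width(n_total, pi_c, rho, conf=0.95, ratio=1.0):
    """Half-width of a two-sided CI for ln(pi_t / pi_c), delta-method variance.

    rho is the assumed ratio pi_t / pi_c; ratio is the allocation n_t / n_c.
    """
    pi_t = rho * pi_c
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_c = n_total / (1 + ratio)
    n_t = n_total - n_c
    var = (1 - pi_t) / (n_t * pi_t) + (1 - pi_c) / (n_c * pi_c)
    return z * math.sqrt(var)


def log_ratio_sample_size(omega, pi_c, rho, conf=0.95, ratio=1.0):
    """Smallest total n whose log-scale half-width does not exceed omega."""
    pi_t = rho * pi_c
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # The variance of the estimated log ratio equals unit_var / n_total.
    unit_var = ((1 + ratio) / ratio * (1 - pi_t) / pi_t
                + (1 + ratio) * (1 - pi_c) / pi_c)
    return math.ceil(z ** 2 * unit_var / omega ** 2)


# delta = ln(pi_t) - ln(pi_c) for pi_c = 0.3 and a 25% decline (rho = 0.75).
delta = math.log(0.75 * 0.3) - math.log(0.3)  # same as ln(0.75), about -0.288
```

Working on the log scale makes the interval symmetric around the estimate of δ; exponentiating its endpoints gives an asymmetric interval for ρ itself.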
More information on this design can be found in Section 23.2.

Consider a confidence interval based design of a two-arm study that compares the control arm to the combination therapy arm, in which the sample size required to obtain a 95% two-sided confidence interval for the ratio of proportions with a precision (width) of ω = 0.35 must be determined.

In East under the Design ribbon for Discrete data, click Two Samples and then click Parallel Design: Ratio of Proportions. This launches the design input window. Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt/nc): 1
Prop. under Control (πc): 0.3
Ratio of Proportions (ρ1 = πt/πc): 0.75
Prop. under Treatment (πt): 0.225 (this will be calculated)
Half-Width (ω): 0.35
Variance of Standardized Test Statistic: Select Unpooled Estimate radio button

The Allocation Ratio (nt : nc) describes the ratio of patients assigned to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm and 25% to the control arm.

In binomial designs, the variance of a random variable depends on its mean. The maximum sample size required for a study is therefore affected by how the difference of binomial response rates is standardized when computing the test statistic, regardless of the other design parameters. There are two options for standardizing the test statistic: the Unpooled or the Pooled specification for variance. The difference becomes important when planning a binomial study with unbalanced randomization.
In this case, both pooled and unpooled designs should be considered, and the one that produces a tighter confidence interval (measure of ω) with fewer patients should be chosen. This will depend on the response rates of the control and treatment arms as well as on the fraction of patients assigned to the treatment arm. More information on this can be found in Section 23.1.

For this example, keep the default settings (Allocation Ratio = 1 and Unpooled Estimate selected) and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary. This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω, proportions of responses for treatment and control groups (πt and πc), and different specifications for variance estimates.

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example n = 500. Enter the following in the Design Input screen and click Compute:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): 500
Allocation Ratio (nt/nc): 1
Prop. under Control (πc): 0.3
Prop.
under Treatment (πt): 0.225 (this will be calculated)
Ratio of Proportions (ρ1 = πt/πc): 0.75
Half-Width (ω): Computed (select radio button)
Variance of Standardized Test Statistic: Select Unpooled Estimate radio button

For a sample size of 500 the precision parameter ω is calculated to be 0.298. As the sample size is increased, the resulting value of ω decreases slightly. For binomial data, this results in a tighter confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot, found in the plots menu. This dynamic visual shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.7 Two Sample Test for the Odds Ratio of Proportions

It is often of interest to compare two independent samples from different populations in terms of the proportion of participants presenting a particular response. For example, outcomes such as the proportion of patients responding to a therapy, developing a certain side effect, or requiring specialized care are common in clinical research. East supports a Confidence Interval based approach to the design of clinical
trials for such experiments based on binomial data, in which the odds ratio of the two populations is to be investigated.

For example, consider a prospective randomized trial where the hope is that a new experimental treatment can triple the odds of exhibiting a positive outcome. The standard treatment (control arm) is compared to the new treatment (therapy arm). Suppose the goal is for the 10% response rate of the standard treatment (control) to increase to 25% for the new therapy arm. Let πt and πc denote the two binomial probabilities associated with the treatment and the control, respectively. The odds ratio is defined as:

\[ \psi = \frac{\pi_t/(1-\pi_t)}{\pi_c/(1-\pi_c)} = \frac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)} \qquad (63.1) \]

The problem reduces to testing H0: ψ = 1 against the two-sided alternative H1: ψ ≠ 1 or against a one-sided alternative H1: ψ < 1 or H1: ψ > 1. Similar to tests dealing with the ratio of proportions, it is mathematically convenient to express the hypothesis testing of odds ratios in terms of the (natural) logarithm of ψ. Information regarding the specific details of parameter estimation for this test can be found in Section 23.3.

Consider a confidence interval based design for a study that compares the odds ratio of proportions between the control and experimental therapy arms. Use a two-sided test to determine the sample size required given πc = 0.1 and ψ1 = 3 with a precision parameter (width) of ω = 0.35.

In East under the Design ribbon for Discrete data, click Two Samples and then click Parallel Design: Odds Ratio of Proportions. This launches the design input window. Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): Computed (select radio button)
Allocation Ratio (nt/nc): 1
Prop.
under Control (πc): 0.1
Prop. under Treatment (πt): 0.25 (this will be calculated)
Odds Ratio of Proportions (ψ1): 3
Half Width (ω): 0.5

The Allocation Ratio (nt : nc) describes the ratio of patients assigned to each arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm and 25% to the control arm. For this example, keep the default Allocation Ratio = 1 and click Compute.

The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary. This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω, different proportions of responses for treatment and control groups (πt and πc), and desired odds ratios of proportions (ψ1).

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example n = 300. Enter the following in the Design Input screen and click Compute:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): 300
Allocation Ratio (nt/nc): 1
Prop. under Control (πc): 0.1
Prop.
under Treatment (πt): 0.25 (this will be calculated)
Odds Ratio of Proportions (ψ1): 3
Half Width (ω): Computed (select radio button)

For a sample size of 300 the precision parameter ω is calculated to be 0.649. As the sample size is decreased, the resulting value of ω increases. For binomial data, this results in a wider confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot, found in the plots menu. This dynamic visual shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.8 One Sample Test for McNemar’s Test for Comparing Matched Pairs

Often two binary response measurements are made on each subject, from either two different treatments or two different time points. For example, in a comparative clinical trial, subjects are matched on baseline demographics and disease characteristics and then randomized, with one subject in the pair receiving the experimental treatment and the other subject receiving the control. Another example is the crossover clinical trial, in which each subject receives both treatments. By random assignment, some subjects receive the experimental treatment followed by the control while others receive the control followed by the experimental treatment.
McNemar’s Test is used in experimental situations where such paired comparisons are observed. More specific theoretical detail about this method, with examples, can be found in Section 22.2.

The probability parameters for McNemar’s test are displayed in the following table, where πc and πt denote the response probabilities for the control and experimental treatments, respectively.

Table 63.1: A 2 x 2 Table of Probabilities for McNemar’s Test

                      Experimental
Control               No Response    Response    Total Probability
No Response           π00            π01         1 − πc
Response              π10            π11         πc
Total Probability     1 − πt         πt          1

The following example, taken from Section 22.2, illustrates how a confidence interval based approach to the trial design can be applied to McNemar’s test for comparing matched pairs of binomial responses.

Consider a trial in which we wish to determine whether a transdermal delivery system (TDS) can be improved with a new adhesive. Subjects are to wear the old TDS (control) and new TDS (experimental) in the same area of the body for one week each. A response is said to occur if the TDS remains on for the entire one-week observation period. From historical data, it is known that the control has a response rate of 85% (πc = 0.85). It is hoped that the new adhesive will increase this to 95% (πt = 0.95). Furthermore, of the 15% of the subjects who did not respond on the control, it is hoped that 87% will respond on the experimental system. That is, π01 = 0.87 × 0.15 ≈ 0.13.
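Once π01 is fixed, the remaining cells follow from the marginal totals of Table 63.1. A minimal sketch of that arithmetic (the function name is hypothetical, and π01 is rounded to two decimals to mirror the text):

```python
def mcnemar_cells(pi_c, pi_t, resp_given_control_fail):
    """Fill the 2 x 2 table of joint probabilities from the marginals.

    Cell pi_ab: a = control outcome, b = experimental outcome
    (0 = no response, 1 = response).
    """
    # 0.87 * 0.15 = 0.1305, rounded to 0.13 as in the text.
    pi01 = round(resp_given_control_fail * (1 - pi_c), 2)
    pi11 = pi_t - pi01          # experimental-response column sums to pi_t
    pi10 = pi_c - pi11          # control-response row sums to pi_c
    pi00 = (1 - pi_c) - pi01    # control-no-response row sums to 1 - pi_c
    return {"pi00": pi00, "pi01": pi01, "pi10": pi10, "pi11": pi11}


cells = mcnemar_cells(pi_c=0.85, pi_t=0.95, resp_given_control_fail=0.87)
delta1 = cells["pi01"] - cells["pi10"]  # equals pi_t - pi_c
xi = cells["pi01"] + cells["pi10"]      # proportion of discordant pairs
```

The difference πt − πc depends only on the two discordant cells, which is why the design is specified through the difference in probabilities and the proportion of discordant pairs.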
Based on these data, we can fill in all the entries of Table 63.1 as follows:

Table 63.2: McNemar Probabilities for the TDS Trial

                      Experimental
Control               No Response    Response    Total Probability
No Response           0.02           0.13        0.15
Response              0.03           0.82        0.85
Total Probability     0.05           0.95        1

Although it is expected that the new adhesive will increase the adherence rate, the comparison is posed as a two-sided testing problem, testing H0: πc = πt against H1: πc ≠ πt at the 0.05 level. We wish to determine the sample size for the values displayed in the above table using a Confidence Interval based design.

In East under the Design ribbon for Discrete data, click One Sample and then click Paired Design: McNemar’s. This launches the design input window. Choose Confidence Interval in the Design Type dropdown box. Consider a two-sided test with a 5% significance level and specify δ1 = πt − πc = 0.1 and ξ = π01 + π10 = 0.16 with a precision (width) of 0.5 units. Enter the following design parameters:

Test Type: 2 sided
Confidence Level (1 − α): 0.95
Sample Size (n): Computed (select radio button)
Half Width (ω): 0.5
Difference in Probabilities (δ1): 0.1
Proportion of Discordant Pairs (ξ): 0.16

Click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary.
This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot, found in the plots menu. This dynamic visual shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

This test can easily be repeated for a one-sided confidence interval and with various values for ω and difference in probabilities (δ1) or proportion of discordant pairs (ξ). East can also compute the precision level ω for a given fixed sample size using a confidence interval based design for McNemar’s test. Following the example above, the precision of the estimate (ω) of the population parameter can easily be determined.

63.9 Many Sample Test - One Way ANOVA

East offers the capability to design trials comparing more than two continuous means. A One-Way ANOVA tests the equality of means across R independent groups. The two sample difference of means test for independent data is a one-way ANOVA test for 2 groups. More information, including the following example, which is modified here to illustrate a confidence interval based approach to the trial design, can be found in Section 22.2.

Suppose n patients have been allocated randomly to R treatments. We assume that the data of the R treatment groups come from R normally distributed populations with the same variance σ², and with population means µ1, µ2, ..., µR. The null hypothesis H0: µ1 = µ2 = ...
= µR is tested against the alternative hypothesis H1: µi ≠ µj for at least one pair (i, j), where i, j = 1, 2, ..., R.

Consider a clinical trial with four groups of patients where the goal is to study the efficacy of a treatment protocol. Three different doses of a drug are being compared against placebo in patients with Alzheimer’s disease. Suppose, based on historical data, the expected mean responses are 0, 1.5, 2.5, and 2 for Groups 1 to 4, respectively. The common standard deviation within each group is σ = 3.5. We wish to compute the required sample size using a confidence interval based design with a type-1 error of 5% and a precision estimate of ω = 2.

In East under the Design ribbon for Continuous data, click Many Samples and then click Factorial Design: One Way ANOVA. This launches the design input window. Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters:

Number of Groups (R): 4
Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): Computed (select radio button)
One-sided Width (ω): 2
Common Standard Deviation (σ): 3.5
Group 1: Mean = 0
Group 2: Mean = 1.5
Group 3: Mean = 2.5
Group 4: Mean = 2

Leave all other Group values (Contrast Coefficients and Allocation Ratios) as defaults and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary.
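For readers who want to see how the half-width depends on the contrast coefficients and allocation, the sketch below treats the interval as a large-sample normal interval for a single linear contrast of the group means. This is an assumption, not East's documented method, and the contrast (-2, -1, 1, 2) is a hypothetical placeholder: the manual does not state East's default coefficients. Note also that the group means do not enter the calculation; only σ, the coefficients, and the allocation do.

```python
import math
from statistics import NormalDist


def contrast_half_width(n_total, sigma, contrast, weights=None, conf=0.95):
    """Half-width of a two-sided normal CI for sum(c_i * mu_i).

    weights are allocation fractions per group (default: equal allocation).
    """
    k = len(contrast)
    weights = weights or [1 / k] * k
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Var(sum c_i * ybar_i) = sigma^2 * sum(c_i^2 / n_i), with n_i = w_i * n.
    var = sigma ** 2 * sum(c ** 2 / (w * n_total)
                           for c, w in zip(contrast, weights))
    return z * math.sqrt(var)


def contrast_sample_size(omega, sigma, contrast, weights=None, conf=0.95):
    """Smallest total n whose half-width does not exceed omega."""
    k = len(contrast)
    weights = weights or [1 / k] * k
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    unit_var = sigma ** 2 * sum(c ** 2 / w for c, w in zip(contrast, weights))
    return math.ceil(z ** 2 * unit_var / omega ** 2)


hypothetical_contrast = (-2, -1, 1, 2)  # placeholder, not East's default
n_required = contrast_sample_size(2.0, 3.5, hypothetical_contrast)
```

In practice the total would be rounded up further so that it divides evenly among the groups.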
This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω and σ, as well as any desired differences in group means, contrast coefficients or group allocation ratios.

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example n = 300. In the Design Window the parameters now become:

Number of Groups (R): 4
Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): 300
Half Width (ω): Computed (select radio button)
Common Standard Deviation (σ): 3.5
Group 1: Mean = 0
Group 2: Mean = 1.5
Group 3: Mean = 2.5
Group 4: Mean = 2

Enter the above in the Design Input screen and click Compute. The precision parameter ω is calculated to be 2.505. As the sample size is decreased, the resulting value of ω increases. In other words, the precision limit increases, resulting in a wider confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot, found in the plots menu. This dynamic visual shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library.
This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.10 Many Sample Test - One Way Repeated Measures

The One Way Repeated Measures ANOVA tests for equality of means in a repeated measures setting. As the patient population is exposed to each treatment, the measurement of the dependent variable is repeated, resulting in correlation between observations from the same patient. Constant correlation assumes that the correlation between observations from the same patient is constant for all patients. This correlation parameter (ρ) needs to be specified in the one way repeated measures study design.

Consider a hypothetical longitudinal study that investigates the effect of a dietary intervention on weight loss, where the endpoint is decrease in weight (in kilograms) from baseline. Data are collected at four time points (baseline, 4 weeks, 8 weeks, and 12 weeks), with means of 0 kg, 10.5 kg, 25 kg, and 20 kg, respectively. Assume the common standard deviation within each group (i.e. at each level) is σ = 3.5 and the constant between-level correlation is ρ = 0.2. We wish to compute the required sample size for this study, using a two-sided confidence interval based design with a type-1 error of 5% and precision estimate (width) of ω = 2.
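The effect of the constant correlation on precision can be sketched for a single linear contrast of the level means under compound symmetry. As before, this is an assumed large-sample normal approximation rather than East's documented method, and the contrast (-2, -1, 1, 2) is a hypothetical placeholder for the unspecified default coefficients.

```python
import math
from statistics import NormalDist


def rm_contrast_unit_var(sigma, rho, contrast):
    """n * Var(sum c_i * ybar_i) when every subject is measured at every level.

    Under constant correlation rho between levels:
    sigma^2 * ((1 - rho) * sum(c_i^2) + rho * (sum c_i)^2).
    """
    sum_sq = sum(c ** 2 for c in contrast)
    sq_sum = sum(contrast) ** 2  # zero for a proper contrast
    return sigma ** 2 * ((1 - rho) * sum_sq + rho * sq_sum)


def rm_half_width(n, sigma, rho, contrast, conf=0.95):
    """Half-width of a two-sided normal CI for the contrast with n subjects."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * math.sqrt(rm_contrast_unit_var(sigma, rho, contrast) / n)


def rm_sample_size(omega, sigma, rho, contrast, conf=0.95):
    """Smallest number of subjects whose half-width does not exceed omega."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * rm_contrast_unit_var(sigma, rho, contrast)
                     / omega ** 2)


hypothetical_contrast = (-2, -1, 1, 2)  # placeholder, not East's default
n_required = rm_sample_size(2.0, 3.5, 0.2, hypothetical_contrast)
```

For coefficients that sum to zero, a positive ρ reduces the variance by the factor (1 − ρ), so within-subject correlation lowers the number of subjects needed for a given half-width.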
In East under the Design ribbon for Continuous data, click Many Samples and then click Factorial Design: One Way Repeated Measures (Constant Correlation) ANOVA. This launches the design input window. Choose Confidence Interval in the Design Type dropdown box and enter the following design parameters:

Number of Levels (M): 4
Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): Computed (select radio button)
Half Width (ω): 2
Between Level Correlation (ρ): 0.2
Standard Deviation at each Level (σ): 3.5
Group 1: Mean = 0
Group 2: Mean = 10.5
Group 3: Mean = 25
Group 4: Mean = 20

Leave all other Group level values (Contrast coefficients) as defaults and click Compute. The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary. This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω, ρ and σ, as well as any desired differences in group information.

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design.
Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example increasing n from 95 to 200. In the Design Window the parameters now become:

Number of Levels (M): 4
Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): 200
Half Width (ω): Computed (select radio button)
Between Level Correlation (ρ): 0.2
Standard Deviation at each Level (σ): 3.5
Group 1: Mean = 0
Group 2: Mean = 10.5
Group 3: Mean = 25
Group 4: Mean = 20

Enter the above in the Design Input screen and click Compute. The precision parameter ω is calculated to be 1.372. As the sample size is increased, the resulting value of ω decreases. In other words, the precision limit decreases, resulting in a tighter confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon. East also provides a very useful Sample Size vs. Width plot, found in the plots menu. This dynamic visual shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.11 Normal Test for Linear Regression - Single Slope

Regression models are often used to examine the relationship between a response and one or more explanatory variables. A simple linear regression model tests a single slope for one continuous covariate when the relationship with response is linear.
The assumption is that the observed value of a response variable Y is a linear function of the explanatory variable X, plus some random noise. For i = 1, ..., n subjects in a study the model can be written as:

\[ Y_i = \gamma + \theta X_i + \epsilon_i \]

where each εi is an independent normal random variable with E(εi) = 0 and Var(εi) = σε². Xi (for subject i) is a random variable with variance σx². More information on simple linear regression models, including distinctions between different types of studies and details on the calculation of the test statistic, can be found in Section 19.1.

A dose-response relationship describes the effect of an exposure on an outcome (positive or negative) and is a crucial consideration in the development of a drug or other treatment. The relationship is often determined by estimating the slope of a regression model such as the one above, where Y is the appropriate response variable and the explanatory variable X is a set of specified doses.

Consider a hypothetical clinical trial involving different doses of a medication under study. Assume that the doses and randomization of subjects across the doses have been chosen so that the standard deviation σx = 9. Based on information gained from prior studies, it can be assumed that σε = 15. When the slope of the linear regression model is 0, the relationship between the outcome and covariate is flat. In other words, there is no evidence of a dose-response relationship. It is therefore of interest to test the null hypothesis H0: θ = 0 against a two-sided alternative H1: θ ≠ 0.

Consider a confidence interval based design for the above study to determine if a dose-response relationship exists between the patient outcome and dose level of a drug. Use a two-sided test with a type-1 error rate of 5% to compute the sample size required using a precision parameter (width) of ω = 0.15.
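The link between ω, n, and the two standard deviations can be sketched with the large-sample variance of the least-squares slope, Var(θ̂) ≈ σε²/(n σx²). This normal approximation and the function names below are assumptions made for illustration; East's exact computation may differ.

```python
import math
from statistics import NormalDist


def slope_half_width(n, sd_x, sd_resid, conf=0.95):
    """Half-width of a two-sided normal CI for the slope theta."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sd_resid / (sd_x * math.sqrt(n))


def slope_sample_size(omega, sd_x, sd_resid, conf=0.95):
    """Smallest n whose half-width does not exceed omega."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sd_resid / (sd_x * omega)) ** 2)


# Example values from this section: sd_x = 9, sd_resid = 15, omega = 0.15.
n_required = slope_sample_size(0.15, sd_x=9, sd_resid=15)
```

A wider spread of doses (larger σx) tightens the interval for the slope in the same way that a larger sample does, which is one reason dose selection matters in dose-response studies.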
To illustrate this example, in East under the Design ribbon for Continuous data, click Regression and then click Single Arm Design: Linear Regression - Single Slope. This will launch the input window. Choose Confidence Interval in the Design Type dropdown box, enter the following design parameters and click Compute:

Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): Computed (select radio button)
Half Width (ω): 0.15
Standard Deviation of X (σ_x): 9
Standard Deviation of Residuals (σ_ε): 15

The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary. This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω, σ_x, and σ_ε.

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design. Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example n = 200.
In the Design Window the parameters now become:

Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): 200
Half Width (ω): Computed (select radio button)
Standard Deviation of X (σ_x): 9
Standard Deviation of Residuals (σ_ε): 15

Enter these values in the Design Input screen and click Compute. The precision parameter ω is calculated to be 0.231. When the sample size is decreased, the half-width ω increases, leading to a wider confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon.

East also provides a very useful Sample Size vs. Width plot, found in the Plots menu. This dynamic plot shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

63.12 Normal Test for Linear Regression - Difference of Slopes

Linear regression models are used to examine the relationship between a response variable and one or more explanatory variables assuming that the relationship is linear. One type of linear regression tests the equality of two slopes in a model with only one observation per subject. In such experimental situations, it is of interest to compare the slopes of two regression lines.
The regression model relates the response variable Y to the explanatory variable X using the model

$$Y_{il} = \gamma + \theta_i X_{il} + \epsilon_{il},$$

where the error $\epsilon_{il}$ has a normal distribution with mean zero and an unknown variance $\sigma_\epsilon^2$ for Subject l in Treatment i, with i = c, t and l = 1, ..., $n_i$. Let $\sigma_{xc}^2$ and $\sigma_{xt}^2$ denote the variance of the explanatory variable X for control (c) and treatment (t), respectively. More information on linear regression models for comparing two slopes and details on the calculation of the test statistic can be found in Section 19.2.

Suppose a treatment response depends on the level of a certain laboratory parameter. A new formulation is to be developed to decrease this interaction between the response and the level. The explanatory variable is the baseline value of the laboratory parameter. The study is designed with $\sigma_{xc} = \sigma_{xt} = 6$ and $\sigma_\epsilon = 10$. It is of interest to test the equality of the slopes θc and θt under the null hypothesis H0: θt = θc against the two-sided alternative H1: θt ≠ θc.

Consider a confidence interval based design for the above study to determine if there exists a difference between the slopes of the two regression lines. Use a two-sided test with a type-1 error rate of 5% to compute the sample size required using a precision parameter (width) of ω = 0.5.
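The two-slope calculation can be sketched in the same way. The sketch assumes a large-sample normal approximation in which the variance of the estimated slope difference is σ_ε² (1/(n_c σ_xc²) + 1/(n_t σ_xt²)), with the total sample size n split according to the allocation ratio n_t/n_c. The formula and function names are illustrative assumptions, not East's documented algorithm, and East may round the total differently (for example, to balance the arms).

```python
import math
from statistics import NormalDist


def slope_diff_half_width(n_total, sigma_xc, sigma_xt, sigma_eps,
                          alloc_ratio=1.0, conf_level=0.95):
    """Approximate half-width of a two-sided CI for theta_t - theta_c.

    alloc_ratio is n_t / n_c. Uses an assumed large-sample normal
    approximation; not a formula quoted from East.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_c = n_total / (1 + alloc_ratio)
    n_t = n_total - n_c
    variance = sigma_eps ** 2 * (1 / (n_c * sigma_xc ** 2)
                                 + 1 / (n_t * sigma_xt ** 2))
    return z * math.sqrt(variance)


def slope_diff_sample_size(omega, sigma_xc, sigma_xt, sigma_eps,
                           alloc_ratio=1.0, conf_level=0.95):
    """Smallest total n whose approximate half-width does not exceed omega."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    n_control = (z * sigma_eps / omega) ** 2 * (
        1 / sigma_xc ** 2 + 1 / (alloc_ratio * sigma_xt ** 2))
    return math.ceil(n_control * (1 + alloc_ratio))


# Values from the example: sigma_xc = sigma_xt = 6, sigma_eps = 10.
n_total_required = slope_diff_sample_size(0.5, 6, 6, 10)
width_at_200 = slope_diff_half_width(200, 6, 6, 10)
print(n_total_required, round(width_at_200, 3))  # prints: 171 0.462
```

With a total sample size of 200 and equal allocation, the approximation gives a half-width of about 0.462, matching the value reported for the fixed-sample-size variant of this example later in the section.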
To illustrate this example, in East under the Design ribbon for Continuous data, click Regression and then click Parallel Design: Linear Regression - Difference of Slopes. This will launch the input window. Choose Confidence Interval in the Design Type dropdown box, enter the following design parameters and click Compute:

Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): Computed (select radio button)
Allocation Ratio (nt/nc): 1
Half Width (ω): 0.5
Standard Deviation of X (σ_x): 6
Standard Deviation of Residual (σ_ε): 10

The Allocation Ratio (nt : nc) describes the ratio of patients assigned to the treatment arm relative to the control arm. For example, an allocation ratio of 3:1 indicates that 75% of the patients are randomized to the treatment arm as opposed to 25% to the control. Keep the default allocation ratio (nt/nc) = 1.

The sample size for this design is calculated and the output is shown as a row in the Output Preview window. As is standard in East, this design has the default name Des 1. To see a summary of the output of this design, click anywhere in the row and then click the icon in the Output Preview toolbar. The design details will be displayed, labeled Output Summary. This design can be saved to the Library by selecting Des 1 in the Output Preview window and clicking the icon. This test can easily be repeated for a one-sided confidence interval and with various values for ω, σ_x, and σ_ε.

Alternatively, East can compute the precision level ω given a fixed sample size using a confidence interval based design.
Following the example above, to determine the precision of the estimate of the population parameter when the sample size is fixed, the user only has to enter the desired value, for example n = 200. In the Design Window the parameters now become:

Test type: 2 sided
Confidence level (1 − α): 0.95
Sample size (n): 200
Allocation Ratio (nt/nc): 1
Half Width (ω): Computed (select radio button)
Standard Deviation of X (σ_x): 6
Standard Deviation of Residual (σ_ε): 10

Enter these values in the Design Input screen and click Compute. The precision parameter ω is calculated to be 0.462. As the sample size increases the precision limit decreases, providing a tighter confidence interval for the parameter of interest.

The output for all parameters is again shown as a row in the Output Preview window and the design can be saved to the Library using the standard method as with all tests in East. From there, a summary of the design can be generated using the details icon.

East also provides a very useful Sample Size vs. Width plot, found in the Plots menu. This dynamic plot shows immediately how changing the sample size affects the resulting width of the confidence interval. A table of Sample Size vs. Width values can be generated using the Tables menu, also found in the Library. This feature allows the user to input a range of values to generate multiple confidence intervals and the corresponding sample sizes.

64 Simulation in East

East lets you simulate studies that were created by its design module. This chapter describes the simulations that are available in East. Through these simulation capabilities, you can repeatedly generate the entire path traced out by a test statistic under user-specified assumptions about treatment effects.
Thereby you can verify various operating characteristics of your designs.

64.1 Normal Studies

To begin, let us design a study. Click Continuous: Two Samples on the Design tab and then click Parallel Design: Difference of Means. Enter the design parameters as shown below. Use the default boundary information and click Compute to create the design. The output summary is shown below.

The study is designed for up to 5 looks with the LD(OF) spending function, and a two-sided α of 0.05. At most 25 patients are needed in order to achieve 90% power with this large standardized treatment effect of 60/45 = 1.333. Save the design to the workbook and click the icon. You will be taken to the following simulation worksheet.

Notice the Test Statistic option on the right. For normally distributed data with known variance one would select the z-test option from the drop down menu. These simulations are accurate regardless of sample size. However, for normally distributed responses with unknown variance, selecting the z-test option will result in simulations that may not be valid for designs with small sample sizes. This is because with small samples, the Wald test statistic at any monitoring time-point has a Student-t distribution rather than a standard normal distribution.

The stopping boundaries in East exhibit type-1 error and power exactly as specified only if the sequentially computed test statistic is normally distributed and has independent increments. To the extent that the test statistic relies on large sample theory for its distributional behavior, there may be some loss of accuracy in the operating characteristics of the stopping boundaries. For sample sizes exceeding 100, the loss of accuracy is scarcely noticeable.
However, when the sample size is of the order of 20, there is indeed a noticeable loss of accuracy and the study must be re-designed and simulated repeatedly until, by trial and error, it possesses the required type-1 error and power.

Let us now illustrate this with an example. At the interim monitoring stage we will be tracking the Wald statistic. Thus we should simulate the behavior of this statistic ahead of time and verify that the type-1 error and power of the study are indeed as specified. Suppose we have accrued a total of $n_j$ subjects by the jth look. We have two choices for computing the test statistic and checking if it has crossed a stopping boundary.

1. Use the value of σ = 45 specified at the design stage and compute

$$Z_j = \frac{\bar{X}_{tj} - \bar{X}_{cj}}{\sqrt{4\sigma^2 / n_j}}. \tag{64.1}$$

This statistic is normally distributed with variance 1 and a known correlation structure across different values of j. Consequently, it should produce the precise type-1 error and power specified in the study design even though the maximum sample size is only 24. To do this select z-test from the drop down menu next to Test Statistic. Next, click on the Response Generation Info tab and enter the parameters as shown below. Next, click the Simulate button. The simulation intermediate window will appear as shown below.

In the actual trial we would have to know the value of σ² in order to compute $Z_j$. If the estimate σ = 45 is incorrect, the power and type-1 error of the trial will not match the simulation results. Even worse, we have no way of knowing if the simulation results are correct or not, since it is difficult to verify the value of σ from a small data set. Thus it might be preferable to use an estimate of σ in the definition of the test statistic. This is discussed next.

2. Estimate σ² by $s_j^2$ from the interim data and compute

$$T_j = \frac{\bar{X}_{tj} - \bar{X}_{cj}}{\sqrt{4 s_j^2 / n_j}}. \tag{64.2}$$

It is more common to monitor a group sequential, normal endpoints trial with the test statistic $T_j$ given by equation (64.2) than with the test statistic $Z_j$ given by (64.1). If we use (64.1) in the interim monitoring phase of a trial, we imply that we know the value of σ² with certainty, since it is needed in the computation. But the value of σ² that we use for this purpose may only be an informed guess with no data to back it up. At the interim monitoring stage, we have the opportunity to actually estimate σ² from the data and use the estimate, $s_j^2$ say, in the computation of the test statistic (64.2). This might be a more reliable approach than making a strong assumption that σ² is known with certainty.

Now the distribution of $T_j$ is only asymptotically normal. In small samples $T_j$ has a Student-t distribution under the null hypothesis. Thus use of $T_j$ does not by itself ensure that the study will have the power and type-1 error that were implied by the sample size and stopping boundaries specified in the study design. This is where the simulations can help. Since the test statistic $T_j$ is computed entirely from the data, and contains no unknown nuisance parameters, we can obtain the true power and type-1 error of any design that uses $T_j$ for the interim monitoring, by means of simulation. To do this open the simulation worksheet and select the following options:

Next, click on the Response Generation Info tab. You will notice that we do not have the option to select how the data are generated. We must use Individual Means. This is because when East calculates the t-statistic it needs to estimate the variance in each group. This is not possible if East generates the data using the Difference of Means option, since East is then simulating only differences of the means and not the actual means themselves. Thus, East cannot estimate the variance in each group. It is for this reason that the Individual
Means option is selected. Again we wish to simulate under the null hypothesis, µt − µc = 0. Enter the parameters in this tab as shown below. Next, click the Simulate button. The simulation results will appear in the Output Preview window. Save this simulation to the workbook and then double click on Sim2 in the Library. A portion of the results is shown below. We observe that this small study does not preserve the type-1 error.

64.2 Binomial Studies

When computing a design for binomially distributed responses, East relies on the normal approximation to the binomial distribution. Thus, these designs may not be as accurate for small sample sizes. As in the previous section, the study should be re-designed and simulated repeatedly until, by trial and error, it possesses the required type-1 error and power. The simulations in East, as opposed to the designs, generate data from the actual binomial model specified instead of relying on a normal approximation. Thus, the simulations might provide a more realistic assessment of power and type-1 error for designs involving binomial endpoints and small sample sizes.

To illustrate, consider the following binomial design. Click Discrete: Two Samples on the Design tab and then click Parallel Design: Difference of Proportions and enter the design parameters as shown below. Next click on the Boundary Info tab and enter the parameters as shown below. Click Compute to create the design. The output summary is shown below.

The study is designed for up to 5 looks and a two-sided α of 0.05. At most 36 patients are needed in order to achieve 90% power. Save the design to the workbook and click the icon.
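The reason that simulating from the actual binomial model can disagree with a design based on the normal approximation can be sketched for a simplified case. The sketch below considers a single-look (fixed-sample) two-arm study analysed with a two-sided pooled-variance z-test; it is not a group sequential design and does not reproduce East's algorithm. The per-arm sample size of 18 and the common response rate of 0.1 are hypothetical values chosen for illustration, not the parameters of Des1, and all function names are assumptions.

```python
import math
import random
from statistics import NormalDist


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def rejects(x_c, x_t, n_per_arm, z_crit):
    """Two-sided pooled-variance z-test for a difference of proportions."""
    pooled = (x_c + x_t) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    if se == 0:  # all successes or all failures: no evidence of a difference
        return False
    z = (x_t - x_c) / n_per_arm / se
    return abs(z) > z_crit


def exact_rejection_rate(n_per_arm, pi_c, pi_t, alpha=0.05):
    """Rejection probability obtained by enumerating every binomial outcome."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    total = 0.0
    for x_c in range(n_per_arm + 1):
        for x_t in range(n_per_arm + 1):
            if rejects(x_c, x_t, n_per_arm, z_crit):
                total += (binom_pmf(x_c, n_per_arm, pi_c)
                          * binom_pmf(x_t, n_per_arm, pi_t))
    return total


def simulated_rejection_rate(n_per_arm, pi_c, pi_t, n_sims=20000,
                             alpha=0.05, seed=1):
    """Monte Carlo estimate of the same quantity from binomial data."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        x_c = sum(rng.random() < pi_c for _ in range(n_per_arm))
        x_t = sum(rng.random() < pi_t for _ in range(n_per_arm))
        hits += rejects(x_c, x_t, n_per_arm, z_crit)
    return hits / n_sims


# Hypothetical small study simulated under the null hypothesis.
exact_alpha = exact_rejection_rate(18, 0.1, 0.1)
sim_alpha = simulated_rejection_rate(18, 0.1, 0.1)
print(f"exact type-1 error: {exact_alpha:.4f}, simulated: {sim_alpha:.4f}")
```

Comparing the printed rates with the nominal 0.05 shows how far the normal approximation is from the actual behavior of the test at this sample size; rerunning with a larger per-arm sample size shows whether the gap narrows.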
You will be taken to the following simulation worksheet. Next, click Simulate. A portion of the results is shown below. Notice that the simulated power of this design is (up to Monte Carlo accuracy) slightly lower than 90%.

It is worth noting that by the time the sample size exceeds 100, the normal approximation should be sufficiently accurate. To see this, create a new design Des2 by editing Des1 and changing the πt value to 0.35. The Des2 summary will be as follows. Save this design to the workbook and open the simulation worksheet by clicking on the icon in the Library. Again, click Simulate. A portion of the results is shown below. This confirms that the power is indeed preserved (up to Monte Carlo accuracy) for group sequential designs based on the normal approximation to the binomial with large sample sizes.

In general, whenever a small binomial study is contemplated it is a good idea to verify its operating characteristics through simulations.

64.3 Description of Simulation Output Columns

Following are the output quantities computed while simulating ‘Subject Data’.
Column Name | Description | Applicability
Scenario ID | Identification number of scenarios when multiple values are provided for a parameter(s) | All Simulations
Simulation ID | Identification number of simulations | All Simulations
Subject ID | Identification number of subjects | All Simulations
Arrival Time | Arrival Time of a particular subject | All Simulations
Treatment ID | Treatment given to a particular subject | All Simulations
Survival Time | Survival Time of a particular subject | SU-2S-LRAR, SU-2S-LRSD
DropOut Time | Time when a particular subject drops out from the study | All Simulations
Stratum Var < i > | Variable which stratifies the subjects into different levels | SU-2S-LRAR, SU-2S-LRSD
CensorInd | Indicator variable (flag) denoting whether a particular subject is censored or not | Enhanced Simulations
Response | Response corresponding to a particular subject after given a particular treatment | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
Endpoint < i > | Response of endpoint < i > | All Simulations
Survival Time Weeks | Survival Time of a particular subject in time unit weeks | Survival Designs
Site ID | Identification number of sites | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD

Following are the output quantities computed while simulating ‘Summary Statistics’.
Column Name | Description | Applicability
Scenario ID | Identification number of scenarios when multiple values are provided for a parameter(s) | All simulations
SimIndex | An identifier for the simulation | All simulations
Look Index | Identifier for the look number | Multi Look simulations
Status | Variable denoting whether a simulation was successfully executed | All simulations
BdryStopCode | Lookwise stopping decision | Multi Look simulations (0 = Continue, 1 = Lower efficacy stop, 2 = Upper efficacy stop, 3 = Futility)
Accruals 0 | Total accrued subjects under control for a particular simulation | Enhanced simulations with accrual dropout
DropOuts0 | Total dropped out subjects under control for a particular simulation | Enhanced simulations with accrual dropout
Pendings0 | Number of pending subjects under control for a particular simulation | Enhanced simulations with accrual dropout
Events0 | Total number of events happened for control | SU-2S-LRAR, SU-2S-LRSD
Accruals < i > | Total accrued subjects under treatment i for a particular simulation | Enhanced simulations with accrual dropout
DropOuts < i > | Total dropped out subjects under treatment i for a particular simulation | Enhanced simulations with accrual dropout
Column Name | Description | Applicability
Pendings < i > | Number of pending subjects under treatment i for a particular simulation | Enhanced simulations with accrual dropout
Events < i > | Total number of events happened for treatment i | SU-2S-LRAR, SU-2S-LRSD
TotAccruals | Total accruals for all the treatments together in a particular simulation | Enhanced simulations with accrual dropout
TotDropOuts | Total drop outs for all the treatments together in a particular simulation | Enhanced simulations with accrual dropout
Tot Pendings | Total pending subjects for all treatments together | Enhanced simulations with accrual dropout
Tot Events | Total events for all the treatments together in a particular simulation | SU-2S-LRAR, SU-2S-LRSD
Look Time | At what time a particular look was taken | Enhanced simulations with accrual dropout
Avg FollowUp Time | Average followup time for a particular simulation | Enhanced simulations with accrual dropout
LogRankScore | Numerator of log rank statistic | SU-2S-LRAR, SU-2S-LRSD
HRfromLRStat | HR estimated from Log Rank statistic | SU-2S-LRAR, SU-2S-LRSD
StdError | Standard Error | All simulations
LRStat | Log rank statistic | SU-2S-LRAR, SU-2S-LRSD
LwrEffBdry | Lower efficacy boundary | All 2 sided simulations
UprEffBdry | Upper efficacy boundary | All 2 sided simulations
LwrFutBdry | Lower futility boundary | All 2 sided simulations
UprFutBdry | Upper futility boundary | All 2 sided simulations
AccrDurtn | Accrual duration | Enhanced simulations with accrual dropout
HazardRate0Strat < i − 1 > | Hazard rate for control in stratum < i > | SU-2S-LRAR, SU-2S-LRSD
HazardRate < j > Strat < i − 1 > | Hazard rate for treatment < j > in stratum < i > | SU-2S-LRAR, SU-2S-LRSD
Hazard Ratio Strat < i − 1 > | Hazard ratio in stratum < i > | SU-2S-LRAR, SU-2S-LRSD
Log Rank Score Strat < i − 1 > | Numerator of log rank statistic corresponding to stratum < i > | SU-2S-LRAR, SU-2S-LRSD
Std Error Strat < i − 1 > | Standard error of log rank score corresponding to stratum < i > | SU-2S-LRAR, SU-2S-LRSD
Completers0 | Number of completers under control | All simulations
Completers < i > | Number of completers under treatment < i > | All simulations
Tot Completers | Total number of completers across all treatments | All simulations
Sum 0 | Sum of responses for control | All continuous endpoint simulations
Prop0 | Proportion of responses for control | All discrete simulations
Sum < i > | Sum of responses for treatment < i > | All continuous endpoint simulations
Prop < i > | Proportion of responses for treatment < i > | All discrete simulations
PropPld | Pooled proportion pooled across treatments | All discrete simulations
Rho | Ratio of proportions estimated from the data | All discrete simulations
HFactor | Standard error of ratio between pooled and unpooled standard errors | All simulations
Info | Fisher’s information corresponding to a particular look |
Adaptation | Whether adaptation happened for a particular simulation | All simulations with SSR option available
Zone | Sample size reestimation zones | All simulations with SSR options available
Delta Sign | Sign of delta | All simulations with SSR options available
Adapt ReEstCompleters | Reestimated number of completers after adaptation | All simulations with SSR options available
WaldStatIncr | Incremental Wald test statistic | All simulations with SSR option available
TestStat | Test statistic | All simulations with SSR option available
InterimCP | Conditional power at the adapt look before adaptation | All simulations with SSR option available
AttainedCP | Conditional power attained at the adapt look after adaptation using reestimated events | All simulations with SSR options available
AccrDurnLTAdptLkTime | Indicates whether accrual duration is less than adapt look time | CHW/CDL simulations
Mean0 | Mean of control responses | All Continuous endpoint simulations
SumOfSquares0 | Sum of squares of control responses | All Continuous endpoint simulations
StdDev0 | Standard deviation of control responses | All Continuous endpoint simulations
Mean < i > | Mean of treatment < i > responses | All Continuous endpoint simulations
SumOfSquares < i > | Sum of squares of treatment < i > responses | All Continuous endpoint simulations
StdDev < i > | Standard deviation of treatment < i > responses | All Continuous endpoint simulations
StdDevPld | Pooled standard deviation pooled across treatments | All Continuous endpoint simulations
Delta | Difference of treatment and control mean response | All Continuous endpoint simulations
Column Name | Description | Applicability
Tstat | Calculated t statistic | MN-1S-SM, MN-2S-DI, MN-2S-RA, MN-MAMS-PC, PN-MAMS-PC
Zstat | Calculated Z statistic | MN-1S-SM, MN-2S-DI, MN-2S-RA, MN-MAMS-PC, PN-MAMS-PC
CPnull | Conditional type I error | MS simulations
AdaptReEstCompleters | Reestimated completers after adaptation adjusted for upper and lower limits | MS simulations
AdaptActReEstCompleters | Actual unadjusted reestimated completers after adaptation | MS simulations
MSActReEstCompleters | Actual look position in the post adapt looks | MS simulations
EstDeltaII | Estimated delta at second stage | MS simulations
SEII | Standard error of delta at second stage | MS simulations
TStatII | t statistic at second stage | MS simulations
LwrEffBdryII | Lower efficacy boundary for second stage | MS simulations
UprEffBdryII | Upper efficacy boundary for second stage | MS simulations
LwrFutBdryII | Lower futility boundary for second stage | MS simulations
UprFutBdryII | Upper futility boundary for second stage | MS simulations
RCILowerBound | Repeated confidence interval lower bound | MS simulations
RCIUpperBound | Repeated confidence interval upper bound | MS simulations
BWCILowerBound | Backward image confidence interval lower bound | MS simulations
BWCIUpperBound | Backward image confidence interval upper bound | MS simulations
BWCIMUE | BWCI median unbiased estimate | MS simulations
LwrStgIDsgnIndx | Stage I design index at which the stage II design power is less than the stage I conditional power for lower BWCI estimates | MS simulations
LwrStgITestStat | Test statistic value which gives the desired stage II design power for lower BWCI estimates | MS simulations
UprStgIDsgnIndx | Stage I design index at which the stage II design power is less than the stage I conditional power for upper BWCI estimates | MS simulations
UprStgITestStat | Test statistic value which gives the desired stage II design power for upper BWCI estimates | MS simulations
RawPValue < i > | Raw p value corresponding to treatment < i > | MN-MAMS-PC, PN-MAMS-PC
RejectionFlag < i > | Flag indicating whether null hypothesis corresponding to treatment < i > is rejected | MN-MAMS-PC, PN-MAMS-PC
StopStatus | Status of a treatment after a look | MN-MAMS-PC, PN-MAMS-PC
Return Code | Indicator variable denoting whether a simulation ran successfully | MN-2S-ME, PN-2S-ME
Prop 0 Endpoint < i > | Observed response rate corresponding to endpoint < i > for control | PN-2S-ME
Prop < j > Endpoint < i > | Observed response rate corresponding to endpoint < i > for treatment < j > | PN-2S-ME
Delta < j > Endpoint1 | Difference of treatment < j > and control response corresponding to endpoint < i > | MN-2S-ME, PN-2S-ME
StdError < j > Endpoint < i > | Standard error of delta < j > | MN-2S-ME, PN-2S-ME
Test Stat < j > Endpoint < i > | Test statistic < j > corresponding to endpoint < i > | MN-2S-ME, PN-2S-ME
Pval < j > Endpoint1 | p value < j > corresponding to endpoint < i > | MN-2S-ME, PN-2S-ME
Maxpval Fam < i > | Maximum p value among family < i > of endpoints | MN-2S-ME, PN-2S-ME
Adjpval Endpoint < i > | Adjusted p value corresponding to endpoint < i > | MN-2S-ME, PN-2S-ME
Adjpval | Adjusted p value for last family | MN-2S-ME, PN-2S-ME
SampleSize 0 | Sample size corresponding to control | MN-2S-ME, PN-2S-ME
SampleSize < i > | Sample size corresponding to treatment < i > | MN-2S-ME, PN-2S-ME
Tot SampleSize | Total sample size | MN-2S-ME, PN-2S-ME
RejFlag 1 Endpoint < i > | Rejection flag indicating whether the hypothesis corresponding to endpoint < i > is rejected | MN-2S-ME, PN-2S-ME
IsNonNull 1 Endpoint < i > | Indicator variable denoting whether endpoint < i > is generated under null | MN-2S-ME, PN-2S-ME
FWERFlag Fam < i > | Indicator variable denoting whether a particular simulation contributes to FWER count for family < i > | MN-2S-ME, PN-2S-ME
FWERFlag | Indicator variable denoting whether a particular simulation contributes to overall FWER count | MN-2S-ME, PN-2S-ME
ConPowFlag Fam < i > | Indicator variable denoting whether a particular simulation contributes to Conjunctive Power count for family < i > | MN-2S-ME, PN-2S-ME
DisjnPowFlag Fam < i > | Indicator variable denoting whether a particular simulation contributes to Disjunctive Power count for family < i > | MN-2S-ME, PN-2S-ME
DisjnPowFlag | Indicator variable denoting whether a particular simulation contributes to overall Disjunctive Power count | MN-2S-ME, PN-2S-ME
ConPowFlag | Indicator variable denoting whether a particular simulation contributes to overall Conjunctive Power count | MN-2S-ME, PN-2S-ME
Stage | Variable indicating whether we are in interim or final stage | Predict
Column Name | Description | Applicability
FABdryStopCode | Stopping decision at final analysis (when no response is pending) | PN-2S-DI, MN-2S-DI
FAAccruals0 | Accruals for control at final analysis | PN-2S-DI, MN-2S-DI
FACompleters0 | Completers for control at final analysis | PN-2S-DI, MN-2S-DI
FAAccruals1 | Accruals for control at final analysis | PN-2S-DI, MN-2S-DI
FAPendings1 | Pendings for control at final analysis | PN-2S-DI, MN-2S-DI
FACompleters1 | Completers for control at final analysis | PN-2S-DI, MN-2S-DI
FATotAccruals | Total accruals for control at final analysis | PN-2S-DI, MN-2S-DI
FATotPendings | Total pendings for control at final analysis | PN-2S-DI, MN-2S-DI
FATotCompleters | Total completers for control at final analysis | PN-2S-DI, MN-2S-DI
FALookTime | Look time at final analysis | PN-2S-DI, MN-2S-DI
FAAvgFollowUpTime | Average followup time at final analysis | PN-2S-DI, MN-2S-DI
FASum0 | Sum for control at final analysis | PN-2S-DI, MN-2S-DI
FAProp0 | Prop for control at final analysis | PN-2S-DI, MN-2S-DI
FASum1 | Sum for treatment at final analysis | PN-2S-DI, MN-2S-DI
FAProp1 | Prop for treatment at final analysis | PN-2S-DI, MN-2S-DI
FAPropPld | Pooled proportion at final analysis | PN-2S-DI
FADelta | Delta at final analysis | PN-2S-DI, MN-2S-DI
FAHFactor | HFactor at final analysis | PN-2S-DI, MN-2S-DI
FAStdError | Standard error at final analysis | PN-2S-DI, MN-2S-DI
FAInfo | Information at final analysis | PN-2S-DI, MN-2S-DI
FAWaldTestStat | Wald test statistic at final analysis | PN-2S-DI
FATStat | T test statistic at final analysis | MN-2S-DI
FAZStat | Z test statistic at final analysis | MN-2S-DI

Following are the output quantities computed while simulating ‘Sitewise Summary Statistics’.
Column Name | Description | Applicability
SiteID | Identification number of sites | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
AvgInitiationTime | Sitewise average initiation time averaged over simulations | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
AvgLastSubjArrTime | Sitewise average last subject arrival time averaged over simulations | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
AvgNumOfSubj | Sitewise average number of subjects averaged over simulations | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
AvgAccrualDuration | Sitewise average accrual duration averaged over simulations | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
AvgAccrualRate | Sitewise average rate of accrual averaged over simulations | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SiteOpenedSimCount | Number of simulations in which a particular site is opened | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD

Following are the output quantities computed while simulating ‘Site Parameters’.
Column Name | Description | Applicability
SimulationID | Identification number of simulations | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SiteOpenFlag | Flag indicating whether a particular site is opened in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SiteAlreadyOpened | Flag indicating whether a particular site is already opened at the time of prediction in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SiteID | Identification number of sites | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SiteInitiationTime | Time when a particular site is initiated in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SiteAccrRate | Accrual rate corresponding to each site in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
SubjectsAccrued | Number of subjects accrued at a particular site in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
LastSubjectRand | Time when the last subject was randomized for a particular site | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
AccrualDuration | Duration of accrual corresponding to a particular site in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD
ObsrvdAccrualRate | Observed accrual rate corresponding to a site in a particular simulation | MN-2S-DI, PN-2S-DI, PN-2S-RA, SU-2S-LRAR, SU-2S-LRSD

65 Predictive Interval Plots

65.1 Predicting the Future Course of a Trial with Predictive Interval Plots (PIPs)

At the design stage of a clinical trial, when no data are available, one relies on initial assumptions about the efficacy of the treatment arms to perform power calculations. Once the trial is underway, however, data begin to accrue and can be utilized to make predictions about the future course of the trial.
These predictions fall into two categories: predictions from data pooled by treatment arm and predictions from unpooled data. For the trial sponsor, who must remain blinded to the results while the trial is ongoing, predictions from pooled data are the only option. A data monitoring committee, on the other hand, does have access to data broken out by treatment arm and is thus in a position to make predictions about the future course of the trial in an unblinded manner. In this chapter, we focus only on predictions from unblinded data.

A popular way to make such predictions is through the use of conditional power. We have provided numerous examples of conditional power throughout this manual and hence will not dwell on it here. In this chapter we present an alternative graphical approach to prediction, utilizing predictive interval plots (PIPs) proposed by Evans, Li and Wei (2007) and Li, Evans, Uno and Wei (2009). These plots provide a visual display of the possible future outcomes for the trial by generating a series of repeated confidence intervals for future time points that are conditional on the current data. Conditional power is an automatic by-product of these plots, which provide additional insights about the magnitude of the treatment effect and its associated uncertainty. Please see Appendix L for details on input, output, and formulas relating to Predictive Interval Plots.

65.2 Example 1: PIP for Time to Event Data

A clinical trial of non-small cell lung cancer was designed for 80% power to detect a hazard ratio of 0.8 at α = 0.05 (two-sided) with three equally spaced looks using a Lan-DeMets O’Brien-Fleming type (LD(OF)) spending function. The primary endpoint was overall survival (OS). With these inputs, 641 OS events are needed to achieve 80% power. The median OS for the control arm was assumed to be 10 months.
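As a rough cross-check on the event count quoted above, the fixed-sample number of events can be approximated with Schoenfeld's formula for 1:1 randomization. The sketch below is not East's algorithm, and the function name is hypothetical. It returns about 631 events, somewhat fewer than the figure East reports, which is expected because a group sequential design generally requires more maximum information than the corresponding fixed-sample design.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate events for a two-sided logrank test with 1:1 allocation.

    Fixed-sample approximation only; it ignores interim looks.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


events = schoenfeld_events(0.8)
print(math.ceil(events))  # fixed-sample approximation, before any inflation for interim looks
```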
Based on 18 months of enrollment and an additional 12 months of follow-up this 30-month trial requires 639 events from a sample size of 897 patients.

The workbook PIP-survival containing this design, named lung, is already available to you in the sub-folder Samples in the East 6.4 installation folder on your computer. A typical path for this sub-folder is: C:\Program Files (x86)\Cytel\Cytel Architect\East 6.4\Samples. Open this workbook from the File or Home menu. The Library nodes will appear as shown below. Click on the design node lung and click on the icon to get the details of the design, as shown below.

The First Interim Analysis

Although the first interim analysis was planned after 213 events, due to rapid enrollment it occurred earlier, after only 119 events. The dataset containing the data from the first interim analysis is saved in a .csv file named PIP-Lung-Look01.csv in the Samples folder. To illustrate the role of the PIPs at this interim analysis, you need this dataset. While you are on the design node lung, bring this dataset into the workbook by choosing File > Import or Home > Import, locating the Samples sub-folder, and clicking on the dataset name PIP-Lung-Look01.csv. You will be presented with the following dialog box.

Keep the default choices selected, click OK, and keep the imported dataset in the workbook PIP-survival. A new node with the name PIP-Lung-Look01.cydx will now appear under the design node lung. The data will also be displayed in the right side window. The dataset is saved in the library as a Cytel file with extension .cydx.
Examine this dataset. It contains five variables: TrtmntID (1=control, 2=experimental); SRVMON (time since entering the trial in months); ArrivalTime (time of entry into the trial); Censor1 (1=alive, 0=dead, -1=lost to follow up); Censor2 (1=alive, 0=dead or lost to follow up). Note the presence of two censor variables. Censor1, indicating drop-outs with -1, is utilized by the program that generates the PIPs. Censor2, indicating either drop-outs or administratively censored patients, is utilized by the Analysis program computing the Logrank test. This can be seen in the choice of the variables in the Analysis dialog box and PIP dialog box detailed below.

Before we can perform the first interim analysis, we must estimate the hazard ratio and its standard error from this interim analysis dataset. To that end, select Two Samples > Logrank from the Analysis tab. Enter the appropriate variables into the input dialog box and click on the OK button at the bottom right side of the screen. You will get the analysis results with a summary as shown below.

The hazard ratio is 0.919 and the total number of events is 119. These are the summary statistics we need to perform the first interim analysis. With the design node lung selected in the library, bring up the Interim Monitoring worksheet by clicking on the icon on the library toolbar. East lets you choose the columns to display in the IM sheet by clicking on the show/hide icon and choosing from the list displayed.
For convenient entry of the summary data into the interim monitoring worksheet, you can display the Interim Monitoring worksheet and the logrank analysis side by side in two windows, with the use of the menu item Home > Arrange > View Selected Windows.

Click on the Enter Interim Data button and the Test Statistic Calculator will appear. Here you have two options: either you can read and transfer the values directly from the results of the analysis node, or you can enter the estimate and SE of delta manually. Let us follow the first option. Click on the Recalc button and it will transfer the results from the analysis node to the test statistic calculator. If you had chosen the second option in the test statistic calculator, you would enter 119 for the cumulative number of events, ln(0.919) for δ̂ and 2/sqrt(119) for the standard error of δ̂.

Click OK to enter the first-look data into the interim monitoring worksheet. The results at this first interim analysis are not very promising. The conditional power under the current trend (HR=0.919) is only 0.156 and the predictive power is only 0.403. The predictive interval plots (PIPs) can provide some additional insights by simulating the future course of the trial conditional on the data already obtained and assumptions about the hazard rates of the two survival curves. To generate these plots select the Look # 1 row. Then click on the PIP icon to open the PIP dialog box. Enter the inputs into the left panel of the dialog box as shown below. The information in the PIP-Lung-Look01.cydx dataset is now available to East.
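For reference, the values typed into the test statistic calculator under the manual-entry option can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the function name is hypothetical, and 2/sqrt(events) is the large-sample approximation to the standard error of the log hazard ratio that the manual entry itself uses.

```python
import math


def logrank_summary(hazard_ratio, events):
    """Return (delta_hat, std_error, z) on the log hazard ratio scale."""
    delta_hat = math.log(hazard_ratio)   # estimate of delta = ln(HR)
    std_error = 2 / math.sqrt(events)    # approximate SE used for manual entry
    return delta_hat, std_error, delta_hat / std_error


delta_hat, std_error, z = logrank_summary(0.919, 119)
print(round(delta_hat, 4), round(std_error, 4), round(z, 3))
```

A Wald statistic this close to zero is in line with the low conditional power reported at this look.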
Entries into the right hand panel of the PIP inputs dialog box may either be user specified or estimated directly from the PIP-Lung-Look01.cydx dataset. To begin with, let us estimate the entries from the data. Accordingly click on the Optional: Estimate Parameters from Data button. The right hand panel fills up as shown below.

We are now in a position to generate the predictive interval plots. As stated earlier these are repeated confidence intervals based on the data already observed and estimates of the hazard ratio for future looks. Since the first interim look was taken earlier than planned, there are still three additional looks (looks 2, 3, and 4) to be encountered. The boundaries for these future looks have been re-computed based on the specified error spending function. To view the re-computed 4-look design, click on the icon at the top right of the input dialog box.

Suppose we wish to generate 1000 PIPs for look 4, ignoring the intermediate looks. Select Final Look from the drop-down box and press the Simulate button. The following plot is generated.

One thousand repeated confidence intervals (RCIs) are generated for look 4, sorted in increasing order of their corresponding estimated hazard ratios, and stacked on top of each other. Save this PIP in the library by clicking on the Save in Workbook button on the bottom right of the plot. The library should now look as shown below.

Let us examine the generated PIP. The black horizontal line is the RCI for the current look (look 1).
Notice how much narrower the RCIs for look 4 are compared to the current RCI. By default, the vertical cursor is positioned at HR=1 on the X-axis. In this position it is seen that 19.1% of the RCIs have upper bounds that are less than 1, suggesting that under the current trend with HR=0.919, the probability of a successful outcome for this trial at look 4 (ignoring all intermediate looks) is 0.191. One can drag the vertical cursor to the right or left to see what percentage of the RCIs have upper bounds below hazard ratios other than 1. For the present let us leave the vertical cursor at HR=1.

Notice the thick vertical bar with colored bands near the Y-axis. This bar displays quantiles of the distribution of the hazard ratios generated by the simulations. Each color on either side of the median contains 5% of the generated hazard ratios. Thus, for example, the lowest five bands on the bar, ending at HR=0.871, represent 25% of the generated hazard ratios. In other words, the 25th percentile of the hazard ratios is 0.871.

Since only 19.1% of the RCIs in this PIP resulted in a statistically significant outcome (upper bound of RCI less than 1), one might weigh the option of terminating the trial for futility. The above PIP was, however, generated under the assumption that the hazard ratio of 0.919, estimated from the look 1 data, is the actual hazard ratio. There is uncertainty associated with this estimate. Thus it would be desirable to take a conservative approach to futility termination and re-run the PIPs under the assumption, made at the design stage, that the underlying HR=0.8. To that end, we retrieve the input dialog box that was used for the current PIP by clicking on PIP1 in the library and then clicking on the Edit tool in the library toolbar. In the ensuing dialog box, change the value of the hazard rate for the Treatment arm from the current λ(Treatment) value to 0.8 × λ(Control).
Now generate a new PIP by clicking on the Simulate button and save it in the workbook.

In this PIP, 73.2% of the RCIs have upper bounds that exclude HR=1. Therefore, given the uncertainty about the true value of the HR, it is premature to terminate this trial for futility and the trial continues to the next interim analysis.

The Second Interim Analysis

The dataset for the second interim analysis is contained in a .csv file named PIP-Lung-02.csv on your computer. Import this .csv into East as shown below. Next perform the logrank test on the look 2 data by invoking it from the Analysis tab in the same manner as you did for look 1. The results will appear as shown below.

At this look, taken after 258 events, the hazard ratio estimate is 1.019. We must enter this information into the interim monitoring worksheet. Select the node Interim Monitoring from the library and click on the icon in the library toolbar. You will see the IM worksheet as shown below.

With Look #2 selected, click on the Enter Interim Data button and choose the option to read values from the look 2 analysis node. Click on the Recalc button to see the test statistic calculator computations as shown below. Alternatively, enter in the Test Statistic Calculator the values Cumulative Events = 258, Estimate of delta = ln(1.019), and Standard Error of Estimate of delta = 2/sqrt(258). Then click the OK button.

The interim monitoring worksheet gets updated. Now the conditional power under the current trend is only 0.014 and the predictive power is only 0.108. It is very unlikely that the trial will succeed and termination for futility appears to be a reasonable option.
Before taking a final decision, however, it may be advisable to obtain a PIP for the future course of the trial under the assumption that HR=0.8 is still correct and the observed value HR=1.019 is due to variability in the data. Accordingly we invoke the PIP dialog box, enter a value of 0.8 for the hazard ratio, and simulate the remainder of the trial 1000 times.

We observe that 32.2% of the RCIs have upper bounds that are below 1. This suggests that if the trial continued and the true hazard ratio were indeed 0.8, the chance of a successful trial is 0.322. But how many of these successful outcomes would be considered clinically meaningful? Suppose that trials with observed values of HR that exceed 0.85 are not of any interest to the sponsor since there are other compounds on the market for this therapeutic area that have had smaller hazard ratios. Then the question becomes how many of the 1000 RCIs would have upper bounds that are below 0.85. To answer this question, move the vertical cursor to 0.85 on the X-axis. This can be done either by dragging the cursor or (more conveniently) by entering the value 0.85 in the edit box at the top of the Read-offs panel of the PIP.

It is seen that 0.2% of the RCIs have upper bounds that are below 0.85 even though we generated the PIP under the optimistic assumption that the true HR=0.8. It is clearly desirable to terminate the trial for futility. This example has shown that the RCIs provide more information than can be obtained from a conditional power calculation. The PIP may be used to determine whether a clinically meaningful treatment effect can be ruled out.
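The read-offs used throughout this example amount to simple counting: the percentage shown at a cursor position is the share of simulated RCIs whose upper bound falls below that position. A minimal sketch follows, using made-up intervals rather than East output; the function name and the data are hypothetical.

```python
def fraction_upper_below(intervals, cursor):
    """Share of (lower, upper) intervals whose upper bound is strictly below the cursor."""
    return sum(1 for _, upper in intervals if upper < cursor) / len(intervals)


# Hypothetical RCIs on the hazard ratio scale, for illustration only.
rcis = [(0.62, 0.84), (0.70, 0.97), (0.75, 1.04), (0.81, 1.12)]

print(fraction_upper_below(rcis, 1.0))   # 0.5
print(fraction_upper_below(rcis, 0.85))  # 0.25
```

Moving the cursor from 1 to 0.85 can only shrink the fraction, which is why a PIP that looks acceptable for statistical significance can still look poor for a clinically meaningful effect.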
65.3 Example 2: PIP for Binomial Data

CAPTURE (Lancet, 1997; 349: 1429-35) was a randomized placebo-controlled trial of abciximab before and during coronary intervention in refractory unstable angina. After angiography, patients received a randomly assigned infusion of abciximab or placebo followed by percutaneous transluminal coronary angioplasty (PTCA). The primary endpoint was death from any cause within 30 days after the PTCA. The planned enrollment was 1400 patients with four equally spaced looks and stopping boundaries generated by the LD(OF) spending function. This study has 80% power to detect a 5% difference in mortality rates, from 15% on the placebo arm to 10% on the abciximab arm, at two-sided α = 0.05.

The workbook PIP-binomial containing this design, named capture, is already available to you in the sub-folder Samples in the East 6.4 installation folder on your computer. Open the PIP-binomial workbook in the East library. The design details are shown below.

The table below displays the results observed at each interim look.

Table 65.1: Results Observed At Each Interim Look

Look Number | Sample Size | Placebo | Abciximab | p-value
1 | 350 | 30/175 (17.2%) | 14/175 (8%) | 0.010
2 | 700 | 55/353 (15.6%) | 37/347 (10.6%) | 0.047
3 | 1050 | 84/532 (15.8%) | 55/518 (10.6%) | 0.010

At look 3 the stopping boundary was crossed and the Data Monitoring Committee stopped the trial.

Let us enter the data from the first two looks into the interim monitoring worksheet. Select the CAPTURE design in the workbook library and click on the tool from the library toolbar.

Now enter the data for the first two looks into the IM dashboard. For each look you will have to click on the Enter Interim Data button to invoke the test statistic calculator and enter the data look by look as described below.
For the first look, enter the data in the test statistic calculator, then click on the Recalc and OK buttons. Similarly post the data for the second look into the IM worksheet. You will now see the computed results posted into the IM worksheet as shown below.

It is evident from the above results that the new drug looks promising. The conditional power is 0.858 and the predictive power is 0.778. It might be instructive at this stage to run a PIP for the future course of the trial. The data for the first two looks are stored on your computer in a .csv file named PIP-Capture-Look02.csv. Import this file into East. It will be added to the library with the name PIP-Capture-Look02.cydx. Now return to the IM dashboard by selecting the Interim Monitoring node in the library.

To produce the PIP for the next look (look 3), select the Look 2 row on the IM dashboard and click on the PIP button. Complete the Input dialog box as shown below. (Remember to click on the Optional: Estimate Parameters from Data button if you want East to compute the sample size and estimate the event rates from the look 2 dataset and post these parameters directly into the dialog box.)

Click on the Simulate button to generate the PIP for look 3 with 1000 repeated confidence intervals.

Observe that 49.8% of the RCIs have upper bounds that exclude 0. Thus, conditional on current data and the current estimates of the event rates, there is a 49.8% chance of crossing the early-stopping boundary at the very next look. Save this PIP in the library by clicking on the Save in Workbook button at the bottom right of the screen.
Suppose we wish to generate a PIP for any future look, not simply the next one. With the cursor on the PIP1 node in the library, click on the edit tool, and specify in the resulting input dialog box that you wish to create a PIP for "any future look". Upon clicking the Simulate button the requested PIP is generated.

Overall, 86.6% of the RCIs have upper confidence bounds that are less than 0. The wider intervals are generated at Look 3 and the narrower ones are generated at Look 4. This PIP shows that the overall probability that this trial will be a success, conditional on current trends, is 0.866. The vertical rectangle with the colored bands displays the distribution of the estimated risk reductions. From this PIP we see that only 5% of the estimated risk reductions will be less than 0.029.

We now return to the IM dashboard and enter the data for look 3. Click OK on the test statistic calculator to post the computed values for the third look. Now the boundary is crossed and you are presented with a message on boundary crossing. You now have to decide whether to stop or continue the trial.

In the actual trial the data monitoring committee (DMC) recommended that the trial be terminated and the sponsor agreed with the recommendation. Thus abciximab was declared to be superior to placebo in this class of patients with respect to all-cause mortality, at the two-sided 5% level of significance. Notice, however, that the stopping boundary was barely crossed. The efficacy boundary is -2.359 while the corresponding test statistic is -2.485.
Had there been one less event on the Control arm and one more on the Treatment arm, the efficacy boundary would not have been crossed and the study would have continued to the final look after enrolling 1400 patients. Now the DMC is charged with examining the totality of evidence, including safety issues and consistency across secondary endpoints, before recommending that a trial be terminated. Therefore, in close situations like this one, the DMC might well recommend that the trial not be terminated prematurely but rather that it continue to the end so as to achieve a robust result that can alter medical practice. In such a situation the DMC might find a PIP for the final look to be a valuable additional piece of information to help it with the decision making.

To illustrate this, first stop the trial. Next, click on the PIP button while on the third look in the IM worksheet. You will be presented with the PIP dialog box, where you fill in the details. The dialog box will look as shown below.

Click on the Simulate button. The PI plot will be generated as shown below.

We see that 95.8% of the RCIs have upper confidence bounds that are below zero. Thus, if the trial were to continue to look 4, there is only about a 4% chance that it would fail to achieve statistical significance. Moreover, the vertical bar near the Y-axis displaying the distribution of the estimates of treatment effect shows that in 95% of the simulations the absolute risk reduction is at least -0.037. This is the type of robust result that the trial needs to obtain in order to alter medical practice. Thus the DMC might weigh the trade-off between terminating the trial immediately with a relatively marginal result or proceeding to take one more look with a high probability of achieving a stronger result.
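The claim that the look 3 boundary was barely crossed can be checked by hand from the counts in Table 65.1. The sketch below assumes a Wald statistic for the difference of proportions with an unpooled variance estimate. That choice is an inference on our part: it reproduces the reported value of about -2.485, but it is not taken from East's documented formulas, and the function name is hypothetical.

```python
import math


def unpooled_z(events_t, n_t, events_c, n_c):
    """Wald statistic for (treatment - control) event rates, unpooled variance."""
    p_t, p_c = events_t / n_t, events_c / n_c
    std_error = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return (p_t - p_c) / std_error


boundary = -2.359  # look 3 efficacy boundary quoted in the text

z_observed = unpooled_z(55, 518, 84, 532)  # look 3 data as observed
z_shifted = unpooled_z(56, 518, 83, 532)   # one event moved from control to treatment

print(round(z_observed, 3), z_observed < boundary)
print(round(z_shifted, 3), z_shifted < boundary)
```

Under this assumption the observed statistic lies beyond the boundary, while shifting a single event between arms brings it back inside, which is the sensitivity the text describes.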
65.4 Example 3: PIP for Continuous Outcome Data

We thank the AIDS Clinical Trials Group (ACTG) for permitting us to use this dataset. NARC 009 was a prospective, randomized, double-blind, placebo-controlled, multicenter, clinical trial of Prosaptide (PRO) conducted by the Neurologic AIDS Research Consortium for the treatment of HIV-associated neuropathic pain. Subjects were randomized to a daily dose of 2, 4, 8 or 16 mg PRO or placebo via subcutaneous injection. The primary endpoint was the 6-week reduction from baseline in the weekly average of random daily Gracely pain scale prompts, collected using an electronic diary. The trial randomized a total of 390 subjects in equal proportion to the five treatment arms. With 78 patients/arm the trial is capable of detecting a difference (treatment minus control) in the change from baseline of δ = −0.2 Gracely units with 93% power, for each dose versus placebo comparison, assuming a common σ = 0.35, two-sided α = 0.05, and no correction for multiplicity. One interim analysis was planned when about half the patients had enrolled.

In this example, we will only consider the comparison of the 2 mg dose and placebo. The design is saved in the East workbook named PIP-normal. Please bring up this workbook into your East library.

At the time of the interim analysis a total of 65 patients had enrolled to the two arms of the trial. The interim analysis data are stored in a .csv file named NARC 02mg.csv. Import this dataset into East. Then perform the Two Samples > Difference of Means test on the imported dataset. Click on OK at the bottom of the Analysis input dialog box. You will get the following analysis results.

The observed value of δ is only -0.019 with a standard error of 0.087.
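To put the interim estimate in perspective, the Wald statistic and a nominal 95% confidence interval can be computed directly from the two numbers above. This is an illustrative sketch with hypothetical names. The interval is a naive fixed-sample interval, not one of East's repeated confidence intervals, so it ignores the group sequential adjustment.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Return the Wald statistic and a nominal two-sided confidence interval."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z_stat = estimate / std_error
    half_width = z_crit * std_error
    return z_stat, (estimate - half_width, estimate + half_width)


z_stat, (lower, upper) = wald_summary(-0.019, 0.087)
print(round(z_stat, 3), round(lower, 3), round(upper, 3))
```

The interval straddles zero, consistent with an interim estimate that shows little evidence of a treatment effect.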
We will enter these results into the interim monitoring worksheet. Select the NARC009 node in the library and click on the tool in the library toolbar. Then click on the Enter Interim Data button, enter the sample size, estimate of δ and standard error into the test statistic calculator, and press OK.

The computed values will now be posted in the IM worksheet.

These interim results are rather poor. With a conditional power of only 0.015 and a predictive power of only 0.091 under the current trend, this trial is likely to fail. Before terminating the trial for futility, however, it would be useful to generate a PIP with 1000 RCIs generated under the design assumption that the true value of δ = −0.2. With the Look 1 row selected on the Interim Monitoring worksheet, click on the PIP button, click on the Optional: Estimate Parameters from Data button, and complete the dialog box as shown.

Notice that the PIP for the final look is generated for look 3, not for look 2. Since look 1 was taken earlier than scheduled, after 65 subjects, the look that was originally designated as look 1, with 73 subjects, becomes look 2. Thus the final look, with 146 subjects, becomes look 3. Also note that µt = µc − 0.2 = −0.431.

Click on the Simulate button to generate the PIP.

Under the optimistic assumption that the true value of δ = −0.2 we see that 50.8% of the RCIs have upper bounds that are less than 0. This would suggest that the trial continue to the next look. It is important to point out, however, that the smallest value of δ that would be considered clinically meaningful is δ = −0.1.
Accordingly, drag the cursor to -0.1 on the X-axis (or type -0.1 in the edit box at the top of the Read-offs panel).

It is now seen that only 1% of the RCIs have upper bounds that are less than -0.1. Moreover, these RCIs were generated under the optimistic assumption that the true δ = −0.2. We may thus feel confident that terminating the trial for futility is the correct decision.

66 Enrollment/Events Prediction - At Design Stage (By Simulation)

EastPredict is an enrollment/events prediction procedure that models the subject enrollment process. In general, the enrollment rate for a specific trial can be estimated based on past experience and any relevant information on that trial. However, this rate is only an estimate and the actual enrollment in a period needs to be treated as a random variable with a certain probability distribution. The EastPredict module models this uncertainty in enrollment through the assumption that the subject arrival pattern follows a known probability distribution. In this chapter, we demonstrate the features of EastPredict (henceforth ‘East’) using examples of studies with normal, binomial, and survival endpoints.

Important Note: In this chapter, we will use four examples for three endpoints: normal (Orlistat trial), binomial (Capture trial), and survival (Rales trial and Oncox trial) to illustrate enrollment/events prediction procedures. The main purpose of these procedures is to predict, at any time point of the study, the likely cumulative enrollment/completers/dropouts for normal and binomial studies and enrollment/events/dropouts for survival studies. A study may be terminated at a particular time point because of a decision made under the group sequential procedure. In that case, any prediction made for a subsequent time point will have no meaning.
So the procedures described in this and the next chapter predict what would materialize if the study reaches any particular time point, ignoring the possibility of earlier termination by crossing a group sequential boundary. In this way, the predictive procedures cover all possible scenarios, whether the study is likely to terminate earlier or later.

66.1 Normal Design

This section uses inputs from the Orlistat trial described in Chapter 10 and extends the example by adding site information and accrual information to the simulation design.

66.1.1 The Orlistat Trial: Initial Design

The drug Orlistat was developed to treat obesity by promoting weight loss. Its efficacy was tested by randomizing patients into the treatment group or the control group according to the ratio 3:1, and comparing the resulting weight loss of the two groups after one year. The following assumptions were made:

Expected mean weight loss in the treatment group: 9 kg
Expected mean weight loss in the control group: 6 kg
Standard deviation of weight change: 8 kg

Eighteen sites participated in the trial. The accrual rate was expected to be 100 subjects per year with a dropout rate of 10% and a response lag of 1 year.

To design this trial navigate to the Design ribbon and select Two Samples under the Continuous tab and then Difference of Means, the first option under Parallel Designs. This will open an input dialog box, where you enter the following design parameters of the Orlistat trial in the corresponding fields:

Design Type: Superiority
Number of Looks: 3
Test Type: 1-Sided
Type-1 Error: 0.05
Power: 0.9
Allocation Ratio (nt/nc): 3
Mean Control (µc): 6
Mean Treatment (µt): 9
Std. Deviation (σ): 8

Click on the Include Options button in the top right-hand corner and select Accrual/Dropouts, which opens a third tab of the same name. The design window then appears as follows:

In the Boundary tab, we specify the details for the Efficacy boundary and the spacing of the looks. We keep the spending function as the default Lan-DeMets (OF) function. The spacing of the looks is defined in the column Info. Fraction, which has a range of (0, 1.000]. When set to Equal, the looks are distributed equidistantly across this range. Setting the spacing of looks to Unequal allows us to define at which points the looks occur by changing the corresponding information fractions. Let us assume that in the Orlistat trial all three looks are equally spaced, so that the dialog box will appear as shown below.

The final step is to add the accrual/dropout information. Click on the Accrual/Dropouts tab and set Accrual Rate to 100, Response Lag to 1 and Probability of Dropout to 0.1. Note that East does not require the unit of time to be specified explicitly as long as consistency is maintained in the parameters given. In other words, we may choose the unit of analysis to be years, months or weeks as long as all time-related data (overall accrual rate, response lag, dropout rate, individual site accrual rates, etc.) are expressed in terms of the same unit. Later examples will demonstrate the use of months and weeks as units of analysis.

We have entered all the parameters required for East to determine the sample size. Click Compute in the bottom right-hand corner of the design window. The following output preview is displayed in the lower panel when the computation is complete.
East has determined that a sample size of 368 subjects is required to attain a power of 0.9. The trial is expected to be around 4.68 years long. In the next section we introduce the simulation feature to explore the enrollment process of this trial given information about the sites over which it will be conducted.

Rename the design ‘ORLISTAT’ using the rename button in the Output Preview pane and then save it using the save button. It will then appear in the Library pane on the left-hand side of the East interface in a workbook named ‘Wkbk1’, which you can rename as ‘Orlistat’.

66.1.2 Simulating the Orlistat Trial

The primary input in the simulation is the enrollment plan, which contains the following information for each site:

Site initiation period: the time period over which the site is expected to be initialized so that it is ready to begin enrolling subjects
Site accrual rate: the number of subjects expected to arrive at the site over the unit of time chosen (in this case, ‘year’)
Enrollment cap: the maximum number of subjects that may be enrolled at the site. This enrollment cap also applies to the entire study. This means that neither a single site nor all the sites put together can enroll more than this enrollment cap.

The table below shows a sample enrollment plan for Orlistat. Recall that all parameters in this example are in annual terms; thus a site initiation end time of 0.25 for Sites 2 to 18 indicates that these sites must be initiated within 3 months. In the case of Site 1, the start and end times of ‘0’ indicate that the site is ready to begin enrolling subjects immediately.
In addition, note that the individual site accrual rates must sum up to the overall accrual rate specified at design time, as in the plan above where all the site accrual rates sum to 100. Lastly, the enrollment cap for each site is generally set to the sample size.

Let us simulate the enrollment process of the Orlistat trial under this enrollment plan. Select ‘ORLISTAT’ in the Library pane and click the simulation button. This opens the simulation input dialog box containing four tabs: Test Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. Select the Test Statistic Z. The first three tabs contain the trial details we had entered in the initial design phase. In the Simulation Controls tab we can specify the number of simulation runs we wish to make as well as general output options.

The enrollment plan is to be specified in the Accrual/Dropouts tab. Click on Include Options in the upper right-hand corner and select Site. The Accrual/Dropouts tab then provides an option to select the accrual model and a grid in which the enrollment plan can be filled in:

Accrual Model

East models the variation in accrual rate by assuming that subjects arrive according to one of two probability distributions: Uniform or Poisson. Under the uniform model, the arrival times of subjects are sampled from a uniform distribution over the given time interval. The Poisson model assumes that subjects arrive according to a Poisson process and thus their inter-arrival times are sampled from an exponential distribution. Experience suggests that arrivals follow a Poisson process and so for all examples in this chapter we select the Poisson accrual model.

Enrollment Plan

When entering the enrollment plan we must select whether we will specify it by region or by site.
When we select Sites by Region it is assumed that all sites within a region have the same parameters (site initiation periods, accrual rates and enrollment caps), while selecting Sites allows us to specify enrollment parameters individually by site. The enrollment plan of Orlistat shown above was specified by site, thus we select Sites.

The site parameters can be entered manually in the grid after specifying Number of Sites. Alternatively, you may create a spreadsheet such as the one shown below, save it as a comma-separated values (CSV) file and then import it.

In the above data, ‘SiteID’ corresponds to the site name, ‘SIPstart’ refers to the Site Initiation Start and ‘SIPend’ to Site Initiation End. ‘Arate’ and ‘Ecap’ refer to the site accrual rate and enrollment cap respectively. For your convenience, this CSV file is already created and stored in the Samples subfolder in your East installation folder, under the name EnrollmentPlan ORLISTAT yearly.csv. You may import this CSV file by clicking on the Home --> Import menu item and choosing the CSV file from the Samples subfolder. The imported file will appear as a node under the ORLISTAT workbook with the extension .cydx, which is the format for East data files. Click on the Specify Enrollment Plan... button and select the workbook and the imported file, now with the extension .cydx. Next, use the dropdown boxes in the Choose Variable panel to match the header names in your .cydx file to the column names shown in the East interface.
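Before importing an enrollment plan it can help to check it against the rules stated in this section: the five columns are present, each initiation window is ordered, and the site accrual rates sum to the overall accrual rate. The following Python sketch is one way to do that outside East. The three-site plan and the function names are hypothetical and are not the contents of the sample file shipped with East.

```python
import csv
import io

# Hypothetical three-site plan in the column layout described above.
# These rows are illustrative only; they are not East's sample file.
PLAN_CSV = """SiteID,SIPstart,SIPend,Arate,Ecap
Site1,0,0,10,368
Site2,0,0.25,40,368
Site3,0,0.25,50,368
"""

NUMERIC_COLUMNS = ["SIPstart", "SIPend", "Arate", "Ecap"]


def load_plan(text):
    """Parse CSV text into a list of dicts with numeric columns as floats."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in ["SiteID"] + NUMERIC_COLUMNS if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for col in NUMERIC_COLUMNS:
            row[col] = float(row[col])
        rows.append(row)
    return rows


def check_plan(rows, overall_rate, tol=1e-9):
    """Return a list of problems; an empty list means the plan passed."""
    problems = []
    for row in rows:
        if row["SIPstart"] > row["SIPend"]:
            problems.append(f"{row['SiteID']}: initiation start is after end")
        if row["Arate"] <= 0:
            problems.append(f"{row['SiteID']}: accrual rate must be positive")
    total = sum(row["Arate"] for row in rows)
    if abs(total - overall_rate) > tol:
        problems.append(f"site rates sum to {total}, expected {overall_rate}")
    return problems


plan = load_plan(PLAN_CSV)
print(check_plan(plan, overall_rate=100))
```

An empty list means the plan is consistent with the overall accrual rate entered at design time; any other result lists what to correct before importing.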
In our example the final Import Enrollment Plan window would appear as follows:

When all these inputs are entered correctly the Accrual/Dropouts tab appears as shown below:

As a final step, let us navigate to the Simulation Controls tab and set the number of simulations to 1000. Choose the Fixed seed as 12345. We can now simulate this design by clicking the Simulate button in the lower right-hand corner. East displays the following window after it carries out the required simulation runs:

Once the specified number of simulations has been run we can close the simulating design window and see a one-line summary of the output in the Output Preview pane with the ID Sim1:

Then save Sim1 by clicking on the save button. It will appear as a sub-node of the design in ‘Orlistat’ in the Library pane on the left-hand side of the East interface along with four spreadsheets containing detailed information from the simulation runs. Click on the Sim1 node and rename it ‘ORLISTAT’ using the rename button. Note that East uses the blue icon to denote designs and the brown icon to denote simulations.

66.1.3 Output

All outputs from the simulation can be accessed from the Library pane. Double-clicking on the ‘ORLISTAT’ simulation opens a general summary page containing four output tables.
The first output table, Average Sample Size, Dropouts and Look Times, displays the average over all 1000 simulations of the sample size (the number of subjects enrolled in the study), completers (subjects who completed the one-year period till follow-up), dropouts (subjects who dropped out of the study) and pipeline (subjects who enrolled but have neither completed nor formally dropped out). The table also contains the average look time for all three looks; for instance, we observe that on average the first look took place at 2.336 years.

The table Simulation Boundaries and Boundary Crossing Probabilities displays the efficacy boundary at each look and the number of simulations in which the boundary was crossed. In total, the trial was stopped for efficacy in 878 simulations, resulting in an average power at termination of 0.878. The third table summarizes the enrollment plan, and the final table, Overall Look-Wise Output, shows the number of completers, accruals and dropouts over a range of percentiles at each look.

In addition, East generates a series of plots depicting the timelines of enrollment, completers and dropouts. These plots can be accessed by selecting the ‘ORLISTAT’ simulation in the Library pane and clicking the plots button.

The Enrollment Prediction Plot displays the number of enrollments against time. It shows the predicted median and average enrollments across all simulations as well as the 95% confidence interval. For instance, at time 2.651, indicated by the vertical marker, the number of enrollments was at most 253 in 97.5% of the simulations, while in 2.5% of the simulations it was below 224. Overall, East predicts a maximum accrual duration of around 4.2 years to enroll 368 subjects.
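To make the idea behind an enrollment prediction band concrete, the Python sketch below simulates the accrual duration of a multi-site trial many times and reads off the median and the 2.5th and 97.5th percentiles. It is an illustration under stated assumptions, not East's algorithm: each site is assumed to open at a time drawn uniformly from its initiation window, arrivals at an open site follow a Poisson process, and the 18 sites are assumed to share the overall rate of 100 per year equally (the actual per-site rates of the sample plan are not reproduced here). Site enrollment caps are ignored because each cap equals the total sample size.

```python
import heapq
import random
import statistics


def time_to_target(sites, target, rng):
    """Simulated time at which the target-th subject is enrolled.

    sites: list of (sip_start, sip_end, rate) tuples.
    Assumptions (not taken from East): a site opens at a time drawn
    uniformly from its initiation window, then enrolls subjects with
    exponential inter-arrival times of mean 1/rate.
    """
    heap = []
    for sip_start, sip_end, rate in sites:
        opened = rng.uniform(sip_start, sip_end)
        heapq.heappush(heap, (opened + rng.expovariate(rate), rate))
    t = 0.0
    for _ in range(target):
        t, rate = heapq.heappop(heap)  # earliest pending arrival over all sites
        heapq.heappush(heap, (t + rng.expovariate(rate), rate))
    return t


# Hypothetical plan: Site 1 opens at once, 17 sites open within 0.25 years,
# and all 18 sites share the overall rate of 100 subjects per year equally.
SITES = [(0.0, 0.0, 100 / 18)] + [(0.0, 0.25, 100 / 18)] * 17

rng = random.Random(12345)
times = [time_to_target(SITES, 368, rng) for _ in range(1000)]

cuts = statistics.quantiles(times, n=40)  # 39 cut points, one every 2.5%
low, median, high = cuts[0], cuts[19], cuts[38]
print(f"accrual duration: median {median:.2f}, 95% band {low:.2f} to {high:.2f} years")
```

The numbers produced by this sketch will not match East's output exactly, since the site rates and the initiation model are assumptions, but the band has the same interpretation as the one drawn on the Enrollment Prediction Plot.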
The Completers Prediction Plot displays the number of completers over time in terms of the 95% confidence interval, mean and median.

In the case of normal and binomial designs the shape of the Completers Prediction Plot resembles that of the Enrollment Prediction Plot, with the main difference being that it is offset to the right by the length of the response lag (one year, in the case of Orlistat). In addition, the prediction lines of the completers are slightly lower than those of the enrollments due to the number of dropouts.

The Dropout Prediction Plot shows the fairly steady increase in dropouts as the trial progresses. The median number of dropouts by the end of the study is 36, as we would expect given the 10% dropout rate.

Lastly, there are four output files nested below the ‘ORLISTAT’ simulation node containing the full details of all the simulation runs. These files, named SummaryStat, SubjectData, SiteSummary, and SiteData, are the source of the data displayed in the tables and plots described above.

66.2 Binomial Design

In the following section we simulate the CAPTURE trial introduced in Chapter 3. It is an example of a binomial design, where the aim was to compare two independent samples in terms of the difference of proportions in event rate.

66.2.1 The CAPTURE Trial: Initial Design

The CAPTURE trial compared the performance of the drug Abciximab and a placebo on event rate.
The null hypothesis H0 stated that both the drug and the placebo had an event rate of 15%, versus the alternative hypothesis H1 that Abciximab reduces the event rate from 15% to 10%. The study was 2-sided with a power of 0.8 and an α of 0.05. The accrual rate was 12 subjects/week, the probability of dropout was 5% and the response lag was 4 weeks.

To design this trial, click on the Design ribbon and select ‘Two Samples’ under the Discrete tab and then click on ‘Difference of Proportions’:

This opens an input dialog box:

In the relevant fields of the dialog box, fill in the design parameters of the CAPTURE trial that are summarized below:

Design Type: Superiority
Number of Looks: 3
Test Type: 2-Sided
Type-1 Error: 0.05
Power: 0.8
Prop. Under Control (πc): 0.15
Prop. Under Treatment (πt): 0.1
Allocation Ratio: 1

Next, click on the Include Options button in the top right-hand corner and select Accrual/Dropouts. This opens an additional tab in which we can specify the accrual rate, response lag and the probability of subjects dropping out of the trial. When the design parameters are filled in correctly the Test Parameters window appears as follows:

In the Boundary tab we specify the details for the Efficacy boundary, the spacing of the looks, the boundary families and spending functions. We keep the default spending function of Lan DeMets (OF) for this design. The spacing of the looks is defined in the column Info. Fraction, which has a range of (0, 1.000]. When the spacing of the looks is set to Equal the values of the information fraction are distributed equally across the range. If we wish to specify when each interim look will be taken we can set the spacing of looks to Unequal and then enter the desired information fractions corresponding to the time points at which the interim looks shall occur.
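As an illustration of what the Info. Fraction column and the spending function represent, the Python sketch below computes equally spaced information fractions and the cumulative type-1 error spent at each of them under the O'Brien-Fleming-type spending function of Lan and DeMets, in its usual textbook form α(t) = 2 − 2Φ(z_{α/2}/√t). The function names are hypothetical, and this chapter does not describe how East evaluates the function internally or how it splits α for a 2-sided test, so treat the output as a sketch rather than a reproduction of East's boundaries.

```python
from statistics import NormalDist


def equal_info_fractions(looks):
    """Information fractions for equally spaced looks: 1/K, 2/K, ..., 1."""
    return [k / looks for k in range(1, looks + 1)]


def ld_of_alpha_spent(t, alpha):
    """Cumulative type-1 error spent at information fraction t (0 < t <= 1).

    Textbook Lan-DeMets O'Brien-Fleming-type spending function:
    alpha(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))


fractions = equal_info_fractions(3)
spent = [ld_of_alpha_spent(t, 0.05) for t in fractions]
for t, a in zip(fractions, spent):
    print(f"info fraction {t:.3f}: cumulative alpha spent {a:.5f}")
```

Very little of the error is spent at the early looks, and the full α is spent at information fraction 1, which is why this family is a conservative choice for early stopping.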
For this example let us assume all three looks are equally spaced.

In the Accrual/Dropouts tab, set Accrual Rate to 12, Response Lag to 4 and Probability of Dropout to 0.05. We have entered all the parameters required for East to determine the sample size. Click Compute to obtain a preview of the output.

East has determined that a sample size of 1456 subjects is required to attain a power of 0.8. The trial is expected to be approximately 125 weeks long. Let us simulate this trial to explore its enrollment timeline. Rename the design ‘CAPTURE’ using the rename tool in the Output Preview pane. It will then appear in the Library pane on the left-hand side of the East interface in a workbook named ‘Wkbk1’, which you can also rename as ‘CAPTURE’.

66.2.2 Simulating the CAPTURE Trial

The primary input in the simulation is the enrollment plan, which contains the following information for each site:

Site initiation period: the time period over which the site is expected to be initialized so that it is ready to begin enrolling subjects
Site accrual rate: the number of subjects expected to arrive at the site over the unit of time chosen (in this case, ‘week’)
Enrollment cap: the maximum number of subjects that may be enrolled at the site. This enrollment cap also applies to the entire study. This means that neither a single site nor all the sites put together can enroll more than this enrollment cap.

The table below shows a sample enrollment plan for the CAPTURE trial.

Under this enrollment plan, Site1 initiates immediately and the remaining 19 sites must initiate within 10 weeks of the start time. The accrual rates are given per site per week and sum up to the overall accrual rate of 12.
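Taking the overall accrual rate as constant, the design-stage durations quoted in this chapter can be checked by hand: the time needed to enroll the full sample at that rate, plus the response lag for the last subject. The Python sketch below does this arithmetic for both the Orlistat and CAPTURE designs. This is an observation about the reported numbers, not a statement of how East computes the duration, and the function name is hypothetical.

```python
def expected_duration(sample_size, accrual_rate, response_lag):
    """Enrollment time at a constant accrual rate plus the response lag.

    All three arguments must use the same unit of time.
    """
    return sample_size / accrual_rate + response_lag


orlistat_years = expected_duration(368, 100, 1)  # rate per year, lag in years
capture_weeks = expected_duration(1456, 12, 4)   # rate per week, lag in weeks

print(f"Orlistat: {orlistat_years:.2f} years")
print(f"CAPTURE:  {capture_weeks:.1f} weeks")
```

Both values agree with the durations reported in the respective output previews (around 4.68 years and approximately 125 weeks). The simulated durations come out somewhat longer because sites do not all start enrolling at time zero.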
The enrollment cap of each site is set to the estimated total sample size of the study.

We shall simulate the CAPTURE trial using this enrollment plan. To access the simulation tool select ‘CAPTURE’ in the Library pane and click the simulation button. This opens the simulation input dialog box containing four tabs: Test Parameters, Response Generation, Accrual/Dropouts and Simulation Controls. The Simulation Controls tab is where we specify the number of simulation runs. The remaining three tabs contain the trial details we had entered in the initial design phase. Click on Include Options in the upper right-hand corner and select Site to add information about the number of sites and their enrollment parameters.

Accrual Model

We have the choice to specify whether the arrival times of subjects are to be sampled under a uniform model or a Poisson model. Under the uniform model, the arrival times of subjects are sampled from a uniform distribution over the given time interval. The Poisson model assumes that subjects arrive according to a Poisson process and thus their inter-arrival times are sampled from an exponential distribution. Let us use the Poisson accrual model as it is known to be a more realistic representation of the subject arrival process.

Enrollment Plan

We must choose whether to specify the enrollment plan by region or by site. Under Sites by Region East assumes that all sites within a region have the same parameters (site initiation periods, accrual rates and enrollment caps), while selecting Sites allows us to specify enrollment parameters individually. Let us specify the CAPTURE enrollment plan by Sites.
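The difference between the two accrual models described above can be seen in a few lines of Python. The sketch below generates arrival times over a 10-week window at 12 subjects per week under each model. The Poisson version follows the description directly (exponential inter-arrival times). For the uniform version the sketch assumes a fixed number of arrivals, rate × window, placed uniformly at random; the chapter does not spell out East's exact uniform scheme, so that part is an assumption, as are the function names.

```python
import random


def poisson_arrivals(rate, horizon, rng):
    """Arrival times in [0, horizon) from a Poisson process.

    Gaps between consecutive arrivals are exponential with mean 1/rate,
    so the number of arrivals varies from run to run.
    """
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def uniform_arrivals(count, horizon, rng):
    """A fixed number of arrivals placed uniformly at random on [0, horizon].

    Assumption for illustration: the count is fixed in advance.
    """
    return sorted(rng.uniform(0, horizon) for _ in range(count))


rng = random.Random(12345)
rate, horizon = 12, 10  # subjects per week, weeks

poisson_counts = [len(poisson_arrivals(rate, horizon, rng)) for _ in range(500)]
uniform_times = uniform_arrivals(rate * horizon, horizon, rng)

print("Poisson model: arrivals per run range from",
      min(poisson_counts), "to", max(poisson_counts))
print("Uniform model: arrivals per run fixed at", len(uniform_times))
```

Under the Poisson model the number of subjects enrolled in a given window is itself random, which is the source of the spread seen in the prediction plots.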
Enter the parameters in the enrollment plan grid either manually or by creating a spreadsheet such as the one shown below, saving it as a CSV file, importing it using the Home --> Import menu item so that it appears as a node with the extension .cydx, and then selecting it using the Specify Enrollment Plan... button.

For your convenience this CSV file is already created and stored in the Samples subfolder in your East installation folder, under the name EnrollmentPlan CAPTURE 3 weekly.csv. In this CSV file, ‘SiteID’ corresponds to the site name, ‘SIPstart’ refers to the Site Initiation Start and ‘SIPend’ to Site Initiation End. ‘Arate’ and ‘Ecap’ refer to the site accrual rate and enrollment cap respectively. You may import this CSV file by clicking on the Home --> Import menu item and choosing the CSV file from the Samples subfolder. The imported file will appear as a node under the CAPTURE workbook with the extension .cydx, which is the format for East data files. Click on the Specify Enrollment Plan... button and specify the workbook and the imported file, now with the extension .cydx. After selecting the .cydx file, use the dropdown boxes in the Choose Variable panel to match the header names in your .cydx file to the column names shown in the East interface. Using the names in our .cydx file the final Specify Enrollment Plan window would appear as follows:

After clicking OK the grid should contain the CAPTURE enrollment plan and the complete Accrual/Dropouts tab should appear as shown below:

Set the number of simulations to 1000 in the Simulation Controls tab; select Random Number Seed as Fixed equal to 12345 and then simulate the design by clicking the Simulate button in the lower right-hand corner.
Once the simulation is complete and we close the simulating window a one-line summary of the output is shown in the Output Preview pane:

Click on the summary, rename it ‘CAPTURE’ by clicking on the rename button and then save it by clicking on the save button. It will appear as a sub-node of the design in ‘Wkbk1’ in the Library pane on the left-hand side of the East interface along with four spreadsheets containing detailed information from the simulation runs. Note that East uses the blue icon to denote designs and the brown icon to denote simulations.

66.2.3 Output

The Library pane contains all the output from the simulation of the CAPTURE trial. The general summary is accessed by double-clicking on the ‘CAPTURE’ simulation.

The first table, Average Sample Size, Dropouts and Look Times, shows us the average over all 1000 simulations of the sample size (the number of subjects enrolled in the study), completers (subjects who completed the 4-week response period till follow-up), dropouts (subjects who dropped out of the study) and pipeline (subjects who enrolled but have neither completed nor formally dropped out). In addition it provides the average look time for all three looks. For instance, the first look took place on average at around 49.0 weeks, the second look at 89.4 weeks and the final look at 129.4 weeks. These looks are approximately 40 weeks apart, reflecting the equally spaced looks we specified in the Boundary tab.

In the table Simulation Boundaries and Boundary Crossing Probabilities we can see the efficacy boundary and number of completers at each look. By the end of the study the null hypothesis was rejected in 818 out of 1000 simulations, resulting in a power of 0.818.

East aggregates the data contained in the simulation output files to generate plots showing the enrollment process over time.
These plots can be accessed by clicking the plots button in the Library pane.

The Enrollment Prediction Plot displays the number of enrollments against time and shows us how long it is expected to take for the target number of enrollments to be reached. In this case the predicted median enrollment of 1456 was completed at around 129 weeks, close to the initial computation of 125 weeks.

In the Completers Prediction Plot we can see the number of completers over time. While the number of subjects is lower due to dropouts, the plot itself is very similar to the Enrollment Prediction Plot owing to the relatively short response lag of 4 weeks.

Lastly, the Dropout Prediction Plot shows the cumulative number of dropouts over the accrual duration and indicates that the median number of dropouts at the end of the trial was about 73.

For all the plots, the axes and labels can be adjusted using the chart settings button, which invokes the Chart Settings menu:

Finally, East produces four files containing the full data generated in the simulations. These files, named SummaryStat, SubjectData, SiteSummary, and SiteData, are the source of the data displayed in the tables and plots described above and can be accessed from the Library.

66.3 Survival Design-Example 1

The next example is based on the RALES trial described in Chapter 43. The aim of this trial was to compare survival in two groups: a treatment group receiving Aldactone for heart failure, and a control group.
As in the previous examples, we extend the RALES simulation to incorporate accrual information and study the enrollment and events prediction.

66.3.1 The RALES Trial: Initial Design

Aldactone was developed to treat patients with severe heart failure. The randomized aldactone evaluation study (RALES) was a six-year double-blind trial comparing survival rates of a treatment group that was administered Aldactone and a control group that received a placebo. The placebo group was known to have a mortality rate of 38%, and the aim of RALES was to ascertain with a power of 0.9 whether Aldactone was successful in reducing that mortality rate by 17% (from 38% to 31.54%) in the treatment group. The study was a two-sided test with α = 0.05 and an expected dropout rate of 5% in both groups. Subjects were enrolled over a period of 1.7 years and there were 6 interim looks scheduled over the duration of the study.

Suppose we wish to design this trial using ‘months’ as our unit of analysis instead of ‘years’. In that case, the relevant parameters would be adjusted as follows:

Accrual rate: 960/12 = 80 subjects/month
Accrual duration: 1.7 × 12 = 20.4 months
Study duration: 6 × 12 = 72 months
Hazard rate (treatment): 0.3154/12 = 0.0263
Hazard rate (control): 0.38/12 = 0.0317

Let us implement this design in East. Click on the Two Sample button in the Survival category on the Design ribbon and select Logrank Test Given Accrual Duration and Study Duration. This opens the survival design dialog box with default values.

Enter the following design parameters of the RALES trial in the corresponding fields:

Design Type: Superiority
Number of Looks: 6
Test Type: 2-Sided
Type-1 Error: 0.05
Power: 0.9
Allocation Ratio: 1

The next step is to enter the survival information in the right-hand portion of the Test Parameters tab.
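The change of time unit from years to months described above is a matter of dividing every rate by 12 and multiplying every duration by 12. The Python sketch below redoes that arithmetic with the figures given for RALES; the variable names are hypothetical. It also shows that the ratio of the two hazard rates does not depend on the unit chosen, since the factor of 12 cancels.

```python
MONTHS_PER_YEAR = 12

# Annual figures for RALES as given in the text.
annual_rates = {
    "accrual_rate": 960,         # subjects per year
    "hazard_control": 0.38,      # per year
    "hazard_treatment": 0.3154,  # per year
}
durations_years = {"accrual_duration": 1.7, "study_duration": 6}

# Rates are divided by 12; durations are multiplied by 12.
monthly_rates = {k: v / MONTHS_PER_YEAR for k, v in annual_rates.items()}
durations_months = {k: v * MONTHS_PER_YEAR for k, v in durations_years.items()}

hazard_ratio = monthly_rates["hazard_treatment"] / monthly_rates["hazard_control"]

for name, value in monthly_rates.items():
    print(f"{name}: {value:.4f} per month")
for name, value in durations_months.items():
    print(f"{name}: {value:.1f} months")
print(f"hazard ratio (unit-free): {hazard_ratio:.2f}")
```

The same pattern applies when working in weeks, as in the CAPTURE example, with 52 in place of 12, provided every time-related input is converted consistently.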
Set # of Hazard Pieces to ‘1’ and let Input Method be ‘Hazard Rates’. Fill in Hazard Rate (Control) as ‘0.0317’ and Hazard Rate (Treatment) as ‘0.0263’. The Hazard Ratio is then automatically computed as 0.83:

In the Boundary tab we specify the details for the Efficacy boundary and the spacing of the interim looks. We keep the default spending function of Lan DeMets (OF). When set to Equal the looks are distributed equidistantly across the (0, 1.000] range of the Info. Fraction. Setting the spacing of looks to Unequal allows us to choose when the interim looks take place by setting the information fractions accordingly. For this example let us assume all looks are equally spaced.

In the final tab we can enter the accrual and dropout information. Recall that RALES had an accrual duration of 20.4 months (1.7 years) and a total study duration of 72 months (6 years). Enter these values in their respective fields while leaving # of Accrual Periods as ‘1’. Also, in the RALES trial 5% of the subjects are expected to drop out. This can be specified in the Piecewise Dropout Information panel either in terms of hazard rates or probability. Achieving the 5% dropout is a trial-and-error process as described in Chapter 50. Set # of Pieces to ‘1’ and Input Method to ‘Prob. of Dropout’. Set both Prob. of Dropout (Control) and Prob. of Dropout (Treatment) to 0.05 and initially set By Time to 12 (months). The final Accrual/Dropout tab should appear as follows:

Click Compute to determine the required accruals and events for this trial. Rename this design ‘RALES’ using the rename button and then save it in the library using the save button. Let us simulate this trial to study its enrollment process.

66.3.2 Simulating the RALES Trial

The primary input in the simulation is the enrollment plan, which contains the following information for each site:

Site initiation period: the time period over which the site is expected to be initialized so that it is ready to begin enrolling subjects
Site accrual rate: the number of subjects expected to arrive at the site over the unit of time chosen (in this case, ‘month’)
Enrollment cap: the maximum number of subjects that may be enrolled at the site. This enrollment cap also applies to the entire study. This means that neither a single site nor all the sites put together can enroll more than this enrollment cap.

The table below shows a sample enrollment plan for the RALES trial.

We see from this enrollment plan that there are 20 sites participating in the study and each site may enroll a maximum of 1000 subjects. Site 1 initiates immediately and the remaining 19 sites must initiate within 1 month of the start of the study. The accrual rates are given in terms of subjects arriving per site per month and sum up to the overall monthly accrual rate of 80.

We shall use this enrollment plan in our simulation of the RALES trial. Select ‘RALES’ in the Library pane and click the simulation button to open the simulating design window containing the tabs Simulation Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. The first three tabs contain the trial details we had entered in the initial design phase. The Simulation Controls tab is where we specify the number of runs. Click on the Include Options box to add Site:

The main inputs we must provide in the Accrual/Dropouts tab are the accrual model and the enrollment plan.
Accrual Model

We have the choice to specify whether the arrival times of subjects are to be sampled from a uniform distribution or from an exponential distribution under the Poisson process. Let us use the Poisson accrual model as it is known to be a more realistic representation of the subject arrival process. Furthermore, let us specify the enrollment plan in terms of Sites; when we specify in terms of Sites by Region it is assumed that all sites within a region have the same parameters, which is not the case in our enrollment plan.

Enrollment Plan

Enter the parameters of the RALES enrollment plan in the grid manually. Alternatively, create a spreadsheet such as the one shown below and save it as a CSV file, import it using the menu item Home --> Import to add it as a node with the extension .cydx, and then select it using the Specify Enrollment Plan... button.

For your convenience this CSV file is already created and stored in the Samples subfolder in your East installation folder, under the name EnrollmentPlan RALES.csv. In this CSV file, ‘SiteID’ corresponds to the site name, ‘SIPstart’ refers to the Site Initiation Start and ‘SIPend’ to Site Initiation End. ‘Arate’ and ‘Ecap’ refer to the site accrual rate and enrollment cap respectively. Click the Specify Enrollment Plan... button to load the file into the enrollment plan grid using Browse:

Ensure that the header names in your CSV file match the column names indicated in the Specify Enrollment Plan window by selecting the corresponding variable names in the dropdown boxes:

Click OK.
When the final Accrual/Dropouts tab appears as displayed below, we can set the number of simulations to 1000 in the Simulation Controls tab and then simulate the design by clicking Simulate. East displays the following window as it carries out the simulation runs:

Once the specified number of simulations has been run, we can close the simulating design window and see a one-line summary of the output in the Output Preview pane. Save the output from the Output Preview pane to the library. Note that East uses the blue icon to denote designs and the brown icon to denote simulations.

66.3.3 Output

Double-click on the ‘RALES’ simulation node in the Library pane to open the output summary. Here we can see data such as the estimates of the average sample size, number of events and dropouts at each look. In the table Simulation Boundaries and Boundary Crossing Probabilities we observe that by the end of the trial, in 906 out of 1000 simulations, we are able to reject the null hypothesis that the hazard rates of the treatment and control groups are equal. In other words, if Aldactone reduces the mortality rate by 17% as hypothesized, the simulated trial detects this effect about 90.6% of the time.

Click on the plot button in the Library pane and select Enrollment Prediction Plot. The Enrollment Prediction Plot displays the cumulative enrollments over time. It shows the predicted median and average enrollments along with the 95% confidence interval over all simulations. From our simulation of the RALES trial, it is expected that the full sample size will be enrolled by about 20 months at the earliest and by about 22 months at the latest.
Furthermore, the confidence interval band is fairly narrow, indicating that little variation is expected in the predicted enrollment.

In the Events Prediction Plot we can observe the timeline of the events throughout the study period of around 72 months and beyond, whereas the Enrollment Prediction Plot covers only the accrual duration of about 20 months. From the graph, we may conclude that the study is likely to take a median of about 73 months and a maximum of about 79 months.

Next, the Dropouts Prediction Plot shows the progression of dropouts over the study duration. The predicted median number of dropouts by the end of the study period of 72 months is about 182, with the 95% confidence interval spanning a range of 158 to 208.

Lastly, four output files nested below the ‘RALES’ simulation node in the Library pane contain the full details of all the simulation runs. These files, named SummaryStat, SubjectData, SiteSummary, and SiteData, are the source of the data displayed in the tables and plots described above.

SummaryStat contains the look-wise details of each of the 1000 simulation runs, including the number of accruals, completers, dropouts, look times, average follow-up times and so on.

The SubjectData sheet displays the following data for each subject:

ScenarioID: it is possible to simulate a design under different scenarios by entering multiple parameter values in certain fields (refer to Section 3.7). East then assigns an identification number to each scenario. In this example we simulated a single design without varying the parameters and so the ScenarioID is always ‘1’.
SimulationID: the identification number of the simulation.
PatientID: a unique identification number assigned to each subject.
SiteID: the identification number of the site at which the subject arrived.
ArrivalTime: the time at which the subject arrived.
TreatmentID: the type of treatment the patient received.
SurvivalTime: the observed survival time of the subject over the course of the study duration.
DropOutTime: the time at which the subject dropped out of the study.
CensorInd: this variable corresponds to censoring information. ‘1’ represents completers and ‘0’ represents dropouts and subjects in the pipeline.

SiteSummary contains the site-level data:

SiteID: the identification number of the site.
RegionID (if applicable): the ID of the region to which the site belongs. In this example there is no RegionID column because we chose to define the enrollment plan by individual sites.
AvgInitiationTime: the average initiation time of this site over all simulations in which this site was opened.
AvgLastSubjectArrTime: the average time at which the last subject was enrolled at this site over all simulations in which this site was opened.
AvgNumOfSubj: the average number of subjects enrolled at this site over all simulations in which this site was opened.
AvgAccrualDuration: the average of the accrual duration for the site computed in every simulation in which the site was opened. The accrual duration is calculated as the last subject randomization time minus the site initiation time of that site.
AvgAccrualRate: the average of the observed accrual rate computed in each simulation in which the site was opened.
SiteOpenedSimCount: the number of simulations in which the site was opened.

The final output file, SiteData, contains the following information for each site:

SimulationID: the identification number of the simulation.
SiteOpenFlag: indicates whether the site has been initiated.
The flag is ‘1’ if the site has been initiated and ‘0’ if it has not.
SiteID: the identification number of the site.
RegionID (if applicable): the ID of the region to which the site belongs. In this example there is no RegionID column because we chose to define the enrollment plan by individual sites.
SiteReadyTime: the site initiation time generated as part of the simulations.
SiteAccrRate: the site accrual rate specified in the enrollment plan.
SubjectsAccrued: the number of subjects accrued at the site.
LastSubjectRand: the randomization time of the last subject arriving at the site.
AccrualDuration: if SiteOpenFlag = 1 for the ith site, the accrual duration is computed as AccrualDuration = (maximum of the LastSubjectRand times across all sites) - (SiteReadyTime of the ith site). If SiteOpenFlag = 0 for the site, the AccrualDuration field will be blank.
ObsrvdAccrualRate: the observed accrual rate for the site, computed as ObsrvdAccrualRate = SubjectsAccrued/AccrualDuration.

66.4 Survival Design-Example 2

The final example is based on the ONCOX time-to-event trial. The aim of this trial was to compare survival in two groups: a treatment group receiving a new drug for cancer, and a control group. As in the previous examples, we extend the ONCOX simulation to incorporate accrual information and study the enrollment and events prediction.

66.4.1 The ONCOX Trial: Initial Design

The randomized ONCOX study was a 30-month, double-blind efficacy and futility trial comparing survival in a treatment group and a control group, with one interim look. The control group was known to have a median survival of 5 months, and the aim of ONCOX was to detect, with a power of 0.9, a longer median survival of 7 months in the treatment group.
The study was a one-sided test with α = 0.025 and an expected annualized dropout rate of 4% in both groups. The efficacy and futility boundaries were to be based on the γ(−5) spending function. Subjects were enrolled over a period of 24 months. The sample size was fixed at 460.

Let us implement this design in East. Click on the Two Sample button in the Survival category on the Design ribbon and select Logrank Test Given Accrual Duration and Study Duration. This opens the survival design dialog box with default values. Enter the following design parameters of the ONCOX trial in the corresponding fields of the Design Parameters tab:

Design Type: Superiority
Number of Looks: 2
Test Type: 1-Sided
Type-1 Error: 0.025
Sample Size: 460
Power: (to be computed)
No. of Events: (to be computed)
# of Hazard Pieces: 1
Input Method: Median Survival Time
Median Survival Time (Control): 5
Median Survival Time (Treatment): 7
Allocation Ratio: 1

The dialog box will appear as shown below. Notice that the hazard ratio is computed as 0.714.

Next, in the Boundary tab we specify the details for the Efficacy boundary, the Futility boundary, and the spacing of the interim looks. As indicated at the beginning of this chapter, we change the spending function from the default Lan DeMets (OF) to Gamma family (-5). Set the spacing of looks to Equal. For the futility boundary, choose the non-binding Gamma family with parameter -5. Now the Test Parameters tab will appear as shown below.

In the final tab, we can enter the accrual duration (24 months), the study duration (30 months) and the dropout information (probability of dropout of 0.04 in a 12-month period).
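The hazard ratio of 0.714 and the dropout input can be checked by hand. The sketch below assumes exponentially distributed survival and dropout times, so that a hazard rate equals ln(2) divided by the median and a dropout probability p over a period T corresponds to a hazard of -ln(1 - p)/T. It also evaluates Schoenfeld's fixed-sample approximation for the number of events in a 1:1 logrank test; this is a rough cross-check only and need not equal the event count East computes for the two-look design. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def hazard_from_median(median):
    """Exponential hazard rate implied by a median survival time."""
    return math.log(2) / median


def dropout_hazard(prob, period):
    """Exponential dropout hazard implied by a dropout probability over `period`."""
    return -math.log(1.0 - prob) / period


def schoenfeld_events(hazard_ratio, alpha, power):
    """Approximate events for a one-sided, fixed-sample, 1:1 logrank test."""
    z = NormalDist().inv_cdf
    return 4.0 * (z(1.0 - alpha) + z(power)) ** 2 / math.log(hazard_ratio) ** 2


# ONCOX inputs from the text: medians of 5 (control) and 7 (treatment) months.
hr = hazard_from_median(7) / hazard_from_median(5)
print(f"hazard ratio   = {hr:.3f}")
print(f"dropout hazard = {dropout_hazard(0.04, 12):.6f} per month")
print(f"approx. events = {schoenfeld_events(hr, alpha=0.025, power=0.9):.1f}")
```

Under the exponential assumption the hazard ratio reduces to the ratio of the medians, 5/7, which agrees with the value shown in the dialog box.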
Now the Accrual/Dropouts tab should appear as follows:

Click Compute to determine the required events and the power attained for this trial. Rename this design ‘ONCOX’ and then save it in the library. Let us simulate this trial to study its enrollment process.

66.4.2 Simulating the ONCOX Trial

The primary input in the simulation is the enrollment plan, which contains the following information for each site:

Site initiation period: the time period over which the site is expected to be initialized so that it is ready to begin enrolling subjects.
Site accrual rate: the number of subjects expected to arrive at the site per unit of time (in this case, per month).
Enrollment cap: the maximum number of subjects that may be enrolled in the entire country as well as at each site within the country. Thus, one site in a country can enroll a number of subjects equal to the cap, provided all other sites in the country enroll none.

The table below shows a sample enrollment plan for the ONCOX trial. We see from this enrollment plan that 14 countries, each with a different number of sites, participate in the study, and each site may enroll at most the number of subjects specified as ‘Enrollment Cap’. Sites in the US initiate immediately and the sites in the remaining 13 countries must initiate within a maximum of 8 months of the start of the study. The accrual rates are given in terms of subjects arriving per site per month. We shall use this enrollment plan in our simulation of the ONCOX trial.

Select the ‘ONCOX’ design in the Library pane and open the simulating design window, which contains the tabs Test Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. The first three tabs contain the trial details we had entered in the initial design phase.
The Simulation Controls tab is where we specify the number of runs. Click on the Include Options box to add Site Info. The main inputs we must provide in the Accrual/Dropouts tab are the accrual model and the enrollment plan.

Accrual Model

We can choose whether the arrival times of subjects are sampled from a uniform distribution or from an exponential distribution under the Poisson process. Let us use the Poisson accrual model, as it is known to be a more realistic representation of the subject arrival process. Furthermore, let us specify the enrollment plan in terms of Sites by Region; when we specify it in terms of Sites by Region, it is assumed that all sites within a region have the same parameters.

Enrollment Plan

Enter the parameters of the ONCOX enrollment plan in the grid manually. Alternatively, create a spreadsheet such as the one shown below and save it as a CSV file so that it can be imported using the menu item Home-->Import to appear as a node with extension .cydx. For your convenience this CSV file is already created and stored in the Samples subfolder in your East installation folder, under the name EnrollmentPlan ONCOX.csv. In this CSV file, the column titles are self-explanatory. Click the Specify Enrollment Plan... button to select the .cydx file and load it into the enrollment plan grid. Ensure that the header names in your CSV file, which is now a .cydx file, match the column names indicated in the Specify Enrollment Plan dialog box by selecting the corresponding variable names in the dropdown boxes. Click OK.

When the final Accrual/Dropouts tab appears as displayed below, we can set the number of simulations to 1000 in the Simulation Controls tab. Fix the seed at 12345 and then simulate the design by clicking Simulate. East displays the following window as it carries out the simulation runs:

Once the specified number of simulations has been run, we can close the simulating design window and see a one-line summary of the output in the Output Preview pane. Save the simulation output from the Output Preview pane to the library. Note that East uses the blue icon to denote designs and the brown icon to denote simulations.

66.4.3 Output

Double-click on the ‘ONCOX’ simulation node in the Library pane to open the output summary. Here we can see data such as the estimates of the average sample size, number of events and dropouts at each look. In the table Simulation Boundaries and Boundary Crossing Probabilities we observe that by the end of the trial, in 900 out of 1000 simulations, we are able to reject the null hypothesis that the hazard rates of the treatment and control groups are equal, matching the 90% power of the study.

Click on the plot button in the Library pane and select Enrollment Prediction Plot. The Enrollment Prediction Plot displays the cumulative enrollments over time. It shows the predicted median and average enrollments along with the 95% confidence interval over all simulations.
From our simulation of the ONCOX trial, it is expected that the full sample size will be enrolled by about 105 months at the earliest and by about 125 months at the latest.

In the Events Prediction Plot, we may observe that the study is likely to reach the targeted 374 events at a median of about 105 months, and by 112 months at the latest with 95% confidence.

Next, the Dropouts Prediction Plot shows the progression of dropouts over the study duration.

Lastly, four output files nested below the ‘ONCOX’ simulation node in the Library pane contain the full details of all the simulation runs. These files, named SummaryStat, SubjectData, SiteSummary, and SiteData, are the source of the data displayed in the tables and plots described above.

67 Conditional Simulation

During the design stage in the previous chapter we used simulation to explore the enrollment timeline and event prediction. The inputs of the design stage simulations were based on estimates of accrual rates and other parameters. In the case of Survival designs, once the trial begins and we obtain data on the realized enrollments, we can use the interim monitoring (IM) feature of EastPredict to update the parameters and generate new predictions about the enrollment and event timelines.

67.1 Survival Design-Example 1

The simulation of the RALES trial in the previous chapter indicated a required sample size of 1638 subjects with an expected accrual period of around 20 months. The total duration of the study was around 72 months. This example continues from the unconditional simulation performed in the previous chapter and assumes that the Rales.cywx workbook, which is available in the Samples folder, is open in East.
In this section we perform a conditional simulation at the first interim look.

67.1.1 Interim Data Preparation

Data preparation for conditional simulation involves compiling the required data from various sources at a certain cut-off point, as described below.

Subject Data

Subject data refers to information collected about each subject accrued so far, namely:

Arrival time: the time at which the subject arrived at the site.
Censor information: whether the subject is a completer, a dropout or still in the pipeline.
Treatment information: whether the subject was randomized to the treatment arm or the placebo arm.
Survival information: the survival time of the subject.

For our example, we prepare the data on the basis of a simulated trial that was the output of our design-time (unconditional) simulation. The file RALES iLook1 SubjectData contains a list of subjects accrued so far and the following data for each subject:

ArrivalTime: the time at which the subject arrived.
TreatmentID: a variable indicating which group the subject was randomized to (‘1’ for treatment, ‘0’ for placebo).
TimeOnStudy: the length of time the subject has been in the study, corresponding to survival time.
CensorIndicator: a variable indicating whether the subject is a completer (‘1’), a dropout (‘-1’) or in the pipeline (‘0’).
CensorInd: a variable indicating whether the subject is a completer (‘1’) or a non-completer (‘0’). A non-completer can be either a dropout or in the pipeline.

A portion of the RALES iLook1 SubjectData file is shown below:

For your convenience, the file Rales ilook1 subjectdata.csv has already been created and is available in the Samples subfolder in your East installation folder.
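The relationship between the three-level CensorIndicator and the two-level CensorInd can be made concrete with a small sketch. The column names follow the description above, but the four data rows are made up for illustration and are not taken from the sample file, and the helper name is hypothetical.

```python
import csv
import io
from collections import Counter

# Coding of the three-level status variable as described in the text.
STATUS_LABELS = {1: "completer", 0: "pipeline", -1: "dropout"}


def to_censor_ind(censor_indicator):
    """Collapse CensorIndicator (1, 0, -1) into CensorInd (1 = completer, 0 = otherwise)."""
    if censor_indicator not in STATUS_LABELS:
        raise ValueError(f"unexpected CensorIndicator value: {censor_indicator}")
    return 1 if censor_indicator == 1 else 0


# Made-up rows in the layout of an interim subject data file.
SAMPLE = """ArrivalTime,TreatmentID,TimeOnStudy,CensorIndicator
0.12,1,15.44,0
0.30,0,9.81,1
0.57,1,4.02,-1
1.05,0,14.51,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
status_counts = Counter(STATUS_LABELS[int(r["CensorIndicator"])] for r in rows)
censor_ind = [to_censor_ind(int(r["CensorIndicator"])) for r in rows]
print(dict(status_counts))
print(censor_ind)
```

A check of this kind before import helps catch status codes outside the expected set, which would otherwise be misread when the variables are matched in the dialog boxes.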
This data file can be imported into East using the Import button in the Home ribbon. Once imported, the file will appear in the Library pane as a node in the active workbook, with extension .cydx.

67.1.2 Interim Analysis

Click on the node RALES iLook1 SubjectData.cydx and choose the menu item Analysis>Two Samples>Logrank. In the resulting dialog box, select the variables as shown in the screen shots below. Click OK to see the following output. We will enter these output values into the Test Statistic Calculator.

67.1.3 Simulation

To open the IM design window, select the ‘RALES’ design (represented by the blue icon) in the Library pane and click the interim monitoring icon. This opens the IM dashboard:

Click on the first blank row in the upper panel corresponding to Look #1 and invoke the Test Statistic Calculator to recalculate the test statistic value based on the interim data. We saw from the interim analysis that the first look was taken at 206 events and that the data were as follows: Estimate of δ = ln(0.743) = -0.29706 and Standard Error of Estimate of δ = sqrt(4/206) = 0.139347. Enter these results in the relevant fields and then click on Recalc to obtain the updated test statistic. The test statistic is updated to -2.132. After clicking OK, the table in the dashboard is updated according to the new information.

The next step is to enter the observed data for the first look. In the IM Dashboard select the first row corresponding to Look #1 and click the button that opens the input dialog window.

Specify Subject Info

In this pane, use the drop-down menus next to Select Workbook and Select Subject Data to select the active workbook and the RALES iLook1 SubjectData file we imported earlier. Next, in the Choose Variables tab, match the variable names to the corresponding headers in RALES iLook1 SubjectData using the drop-down menus. In our example the matching would appear as follows:

Population ID = TreatmentID
Control = 0
Treatment = 1
Status Indicator = CensorIndicator
Arrival Time = ArrivalTime
Time on Study = TimeOnStudy

Click on OK to obtain the input dialog window for conditional simulation. The simulation input dialog window consists of four tabs: Test Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. The first three tabs contain the parameters specified in the previous step. Navigate to the Accrual/Dropouts tab. Note that the parameters are estimated from the subject data RALES iLook1 SubjectData.cydx.

Lastly, in the Simulation Controls tab set the number of simulations to 1000, select the Fixed Random Seed 12345, and check all the output options to save the data. Click Simulate. After the simulation is complete, the results appear in the Library as a sub-node of the RALES design. Under this sub-node, there is a snapshot of the initial interim data entered (Snap 1.1), an output summary of the conditional simulation (CS:Sim1) and the updated versions of the output files generated during the initial (unconditional) simulation.

67.1.4 Output

Double-click ‘CS:Sim1’ in the Library to open a detailed summary of the conditional simulation.
The first and third tables, Actuals: Sample Size and Look Times and Actuals: Events and Boundaries, contain the interim data pertaining to the first look (1205 subjects accrued; 206 events, of which 116 were in the control group and 90 in the treatment group) and the time of the interim look (15.561 months). The second table, Conditional Simulation: Average Sample Size and Look Times, displays the projections of these parameters for the remaining five looks. From the fourth table, we can see the boundary crossing probabilities in the remaining five looks. For instance, by the 3rd look, 619 events have been observed and the efficacy boundary has been crossed in 671 out of 1000 simulations.

67.2 Survival Design-Example 2

The ONCOX trial in the previous chapter was designed with a sample size of 460 subjects with an expected accrual period of around 24 months and targeted 374 events within a study period of around 30 months. This example continues from the unconditional simulation performed in the previous chapter and assumes that the workbook OncoX.cywx is open in East. You may open it from the Samples folder. In this section we perform a conditional simulation at the first interim look.

67.2.1 Interim Data Preparation

Data preparation for conditional simulation involves compiling the required data from various sources at a certain cut-off point. The data required is Subject Data, which consists of the following information.

Subject Data

Subject data refers to information collected about each subject accrued so far, namely:

Country: Country ID.
Arrival time: the time at which the subject arrived at the site.
Censor information: whether the subject is a completer, a dropout or still in the pipeline.
Treatment information: whether the subject was randomized to the treatment arm or the control arm.
Survival information: the survival time of the subject.

The file ONCOX iLook1 SubjectData contains a list of subjects accrued so far and the following data for each subject:

Country: Country ID.
ArrivalTime: the time at which the subject arrived.
TreatmentID: a variable indicating which group the subject was randomized to (‘1’ for treatment, ‘0’ for control).
TimeOnStudy: the length of time the subject has been in the study, corresponding to survival time.
Status: a variable indicating whether the subject is a completer (‘1’), a dropout (‘-1’) or in the pipeline (‘0’).
Censor: a variable indicating whether the subject is a completer (‘1’) or a non-completer (‘0’). A non-completer can be either a dropout or in the pipeline.

A portion of the ONCOX iLook1 SubjectData file is shown below:

The data file can be imported into East using the Import button in the Home ribbon. Once imported, the file will appear in the Library pane as a node in the active workbook, with extension .cydx.

67.2.2 Simulation

To open the IM design window, select the ‘ONCOX’ design (represented by the blue icon) in the Library pane and click the interim monitoring icon. This opens the IM dashboard:

First we need to compute the hazard ratio from the interim subject data. To do this, click on the ‘ONCOX iLook1 SubjectData.cydx’ node in the library and then choose the menu item Analysis > Two Samples > Parallel Design > Logrank. In the resulting dialog box, fill in the items as shown below and click OK. The results show that the number of events is 187 and the estimated hazard ratio is 0.743.
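The quantities entered into the Test Statistic Calculator follow directly from the number of events and the estimated hazard ratio. The sketch below uses the expressions given in this chapter, δ = ln(hazard ratio) with standard error sqrt(4/events), and forms the test statistic as their ratio. The function name is hypothetical, and the standard error expression is the approximation used in the text for equal allocation.

```python
import math


def logrank_calculator_inputs(hazard_ratio, events):
    """Return (delta, standard_error, test_statistic) from interim results.

    delta is the log hazard ratio and sqrt(4 / events) is the approximate
    standard error used in the text; their ratio is the Wald-type statistic.
    """
    delta = math.log(hazard_ratio)
    standard_error = math.sqrt(4.0 / events)
    return delta, standard_error, delta / standard_error


# ONCOX first look: 187 events, estimated hazard ratio 0.743.
delta, se, z = logrank_calculator_inputs(0.743, 187)
print(f"delta = {delta:.5f}, SE = {se:.6f}, Z = {z:.3f}")
```

The same function applied to the RALES first look (206 events, hazard ratio 0.743) gives a statistic that rounds to -2.132, in agreement with the value reported for that example.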
Now go to the IM dashboard, click on the first blank row in the upper panel corresponding to Look #1, and invoke the Test Statistic Calculator to recalculate the test statistic value based on the interim data. Enter the cumulative events as 187, the Estimate of δ as ln(0.743), and the Standard Error as sqrt(4/187). Click Recalculate. You will see the Test Statistic computed as -2.031. Click OK and the results will be posted in the IM dashboard as shown below.

In the IM Dashboard select the first row corresponding to Look #1 and click the button that opens the input dialog window.

Specify Subject Info

In this pane, use the drop-down menus next to Select Workbook and Select Subject Data to select the active workbook and the ONCOX iLook1 SubjectData file we imported earlier. Next, in the Choose Variables tab, match the variable names to the corresponding headers in ONCOX iLook1 SubjectData using the drop-down menus. In our example the matching would appear as follows:

Population ID = TreatmentID
Control = 0
Treatment = 1
Status Indicator = Status
Arrival Time = ArrivalTime
Time on Study = TimeOnStudy

Click on OK to obtain the input dialog window for conditional simulation. The simulation input dialog window consists of four tabs: Test Parameters, Response Generation, Accrual/Dropouts, and Simulation Controls. The first three tabs contain the parameters specified in the previous step. Navigate to the Accrual/Dropouts tab. Note that the parameters are estimated from the subject data.
Lastly, in the Simulation Controls tab set the number of simulations to 1000, select the Fixed Random Seed 12345, and check all the output options to save the data. Click Simulate. After the simulation is complete, the results appear in the Library as a sub-node of the ONCOX design. Under this sub-node, there is a snapshot of the initial interim data entered (Snap 1.1), an output summary of the conditional simulation (CS:Sim1) and the updated versions of the output files generated during the initial (unconditional) simulation.

67.2.3 Output

Double-click ‘CS:Sim1’ in the Library to open a detailed summary of the conditional simulation. The first table, Actuals: Sample Size and Look Times, contains the interim data pertaining to the first look (402 subjects accrued; 187 events, of which 106 were in the control group and 81 in the treatment group) and the time of the interim look (20.724 months). The second table, Conditional Simulation: Average Sample Size and Look Times, displays the projections of these parameters for the second and last look.

The table Simulation Boundaries and Boundary Crossing Probabilities shows the number of simulations in which the efficacy boundary is crossed at each look. For instance, by the second look, 374 events have been observed and the efficacy boundary has been crossed in 864 out of 1000 simulations.

68 Enrollment/Events Prediction - Analysis

Prediction is useful even in fixed sample trials, that is, trials in which the user is not interested in stopping early for efficacy or futility.
Even in such trials, the user or other authorized persons may have access to the interim subject and site data, or at least to the summarized trial data, and may want to predict the future enrollment and event milestones in the trial. There may also be situations where a group sequential trial was not designed using East, or where the user does not have access to the Interim Monitoring module of East, yet the investigator is still interested in predicting the Accrual Duration and Study Duration based on interim subject data. To cater to the needs of all such studies, the Predict feature was developed in the current version of East. The prediction functionality is available through the Analysis menu.

During the design stage in chapter 66 we used simulation to explore the enrollment timeline and event prediction of four trials: Orlistat (normal design), CAPTURE (binomial design), and RALES and ONCOX (survival designs). The inputs for the design stage simulations were based on estimates of accrual rates and other parameters. Once the trial begins and we obtain data on the realized enrollment, we can use the Predict module to update the parameters and generate new predictions about the enrollment and event timelines.

In this chapter, we introduce the Predict feature available in the Analysis menu of East 6.4 and demonstrate its use for normal, binomial and survival designs, considering data arising from the respective studies.

The Predict feature in Analysis can play a vital role in assisting the Data Monitoring Committee (DMC) statistician as well as the sponsor statistician in the following manner. A DMC statistician typically has access to unblinded trial data. With this, she can use the Predict feature to forecast how long the subject enrollment is likely to take and how long the study will take to complete, by predicting the time by which the required number of events would be achieved separately on the treatment and control arms.
The sponsor statistician generally has access to the blinded trial data. She can use the Predict feature to forecast enrollment duration as well as study duration based on the available blinded subject and/or events data.

The option of providing summary data as input makes it possible to use the Predict feature whenever individual subject data are not available. The Summary Data may consist of information on the number of subjects enrolled, the number of events that have occurred, the number of dropouts observed so far, and so on. In addition to these, estimates of parameters such as hazard rates for events and hazard rates for dropouts might also be available based on interim or prior data.

In this chapter, we will use four examples for three endpoints, normal (Orlistat trial), binomial (Capture trial), and survival (Rales trial and Oncox trial), to illustrate enrollment/events prediction procedures. The main purpose of these procedures is to predict, at any time point of the study, the likely cumulative enrollment for normal, binomial and survival studies and the events/dropouts for survival studies.

68.1 Enrollment Only

Suppose we have partial data on enrollments of subjects for the Orlistat trial described in chapter 11. The trial is still ongoing and we want to predict the time when the target enrollment would be complete. The enrollment data up to the current calendar time are stored in the ORLISTAT iLook1 SubjectData.csv file, which is available in the Samples folder of the East 6.4 installation.

68.1.1 Enrollment Only: Subject-level Data

Data preparation for the Enrollment Only menu of Predict involves compiling the required data from various sources at a certain cut-off point. The enrollment can be across a number of sites or at a single center. For the Enrollment Only feature, arrival times of the subjects are required.
In this illustration, we assume that only Subject data are available, comprising the following variables.

Subject Data

Subject data refers to information collected about each subject accrued so far, namely:
PatientID: Subject ID of the patient.
Arrival Time: The time at which the subject arrived.

For our example we prepare the data on the basis of the subjects enrolled so far. The data in ORLISTAT iLook1 SubjectData.csv contain PatientID and Arrival Time. Note that the data contain some additional variables which are not required for this illustration but will be required later.

Import the ORLISTAT iLook1 SubjectData.csv file into East using the Import button in the Home ribbon. Once imported, the file will appear in the Library pane as a node in the active workbook, with extension .cydx. Click on the node ORLISTAT iLook1 SubjectData.cydx and choose the menu item Analysis>Predict>Enrollment Only. In the resulting dialog box, select Arrival Time as shown below.

Since the default value for Input is Subject-level Data, we leave it as it is. You may click the View Dataset button to view the data. Click Hide Dataset to restore the dialog. Since we are not considering any Site information, leave the check box Include Site-specific information blank. Click Next. This will invoke the next input dialog, Accrual/Dropouts. You will see some default values already filled in.

The Current Sample Size is the number of records (number of arrivals) in the data file, which is 212 in this case. The Target Sample Size default value is 318, which is 1.5 * CurrentSampleSize. You may change the Target Sample Size value. This is the value of targeted enrollment in the trial.
The objective is to find out how long, on average, the trial will take to enroll this many subjects. The Current Calendar Time is the accrual time of the last subject in the data, which is 2.224. The Accrual Information input is meant for simulating the additional 318 − 212 = 106 accruals.

There are two options for Input Method. For the Accrual Rates option, East treats the accrual process as comprising two periods. The first period is assumed to be the one presented in the data. The starting time for this period is assumed to be 0, whereas the accrual rate for this period is computed as (CurrentSampleSize)/(CurrentCalendarTime). In this case it is 212/2.2243 = 95.31. Both the Starting Time and Accrual Rate fields are uneditable as these are estimated from the data. The second period is the one which starts after the last accrual in the first period. As a result, the Starting At time default value is the last arrival time in the data, which is also the Current Calendar Time. For the second period, the default accrual rate is computed as (TargetSampleSize − CurrentSampleSize)/(CurrentCalendarTime), which is (318 − 212)/2.2243 = 47.65499 for the current example. You can edit both the Starting Time and Accrual Rate for the second period. Accrual may vary over time. To reflect this assumption, one can specify the number of time periods, each having a different accrual rate.

An alternate way to give accrual input is Cum Accrual %. If you choose this option, the corresponding input dialog is displayed. As before, East treats the accruals in two pieces. The default value of By Time for the first period is the CalendarTime, while for the second it is 2 * CalendarTime. Default values of Accr % for Period 1 and Period 2 are 100 * (CurrentSampleSize/TargetSampleSize) and 100% respectively. Both these values are uneditable.
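The default values described above follow from simple arithmetic on the observed arrival times. The following Python sketch reproduces them outside East; it is an illustration only, not East's implementation. The function name and the returned keys are hypothetical, and the rounding of 1.5 * CurrentSampleSize to a whole number is an assumption (the manual only shows a case where the product is already an integer).

```python
def default_accrual_inputs(arrival_times, target_factor=1.5):
    """Reproduce the default dialog values described in the text.

    arrival_times: observed arrival times of the subjects enrolled so far.
    target_factor: multiplier for the default Target Sample Size (1.5 in the text).
    """
    current_n = len(arrival_times)          # Current Sample Size
    current_time = max(arrival_times)       # Current Calendar Time (last arrival)
    # Assumption: round to the nearest integer; East's rounding rule is not documented here.
    target_n = round(target_factor * current_n)

    return {
        "current_sample_size": current_n,
        "target_sample_size": target_n,
        "current_calendar_time": current_time,
        # Period 1: rate estimated from the observed data (not editable in East).
        "period1_accrual_rate": current_n / current_time,
        # Period 2: default rate for the remaining accruals (editable in East).
        "period2_accrual_rate": (target_n - current_n) / current_time,
        # Cum Accrual % input method: By Time and Accr % defaults for the two periods.
        "by_time": [current_time, 2 * current_time],
        "cum_accrual_pct": [100.0 * current_n / target_n, 100.0],
    }


# Illustrative data: 212 arrivals with the last one at time 2.2243.
example_times = [0.01 * i for i in range(211)] + [2.2243]
defaults = default_accrual_inputs(example_times)
```

With 212 arrivals and a last arrival at 2.2243, the first-period rate is 212/2.2243 and the second-period rate is 106/2.2243, in line with the 95.31 and 47.65 quoted above.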
If you choose more than two accrual periods, the table expands and allows you to specify the values of By Time and Accr %, with the Accr % for the last period fixed at 100%. For this study, let us use the Accrual Rates option with the default values.

Go to the Simulation Controls tab. The default number of simulation runs is 10000. Set the number of simulations to 1000 and the Random Number Seed to 12345. The simulation output can be saved either in a .csv file or as Case Data. You can save Summary Statistics for every simulation run and the Subject-level data for a few simulation runs. Suppose we want to save the Summary Statistics and the Subject-level data for, say, 5 simulation runs. Check both the check boxes and specify 5 simulation runs. You can also modify the percentile values available in the Output for All Trials table. For now, let us keep them as they are.

Click the Simulate button available at the bottom. East simulates the arrival of subjects according to the Poisson Arrival process. After a few seconds East will display the message "Simulations complete. Waiting for User's action." Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim1 with sub-nodes for SummaryStat and SubjectData in the Library. Open the SummaryStat data by double clicking the sub-node. You will see the following display of data.
Observe that for every simulation East labels the available data at the Current Sample Size as First Interim and the Target Sample Size as Final. The column SUCCESS indicates that the simulation was successful. The variable TotEvents is synonymous with Sample Size. The last column, AccrDurtn, specifies the accrual duration required to enroll the 318 subjects in the respective simulation run. For instance, in the first simulation, the 318th subject arrived at time 4.35719, and so on.

Now double click the SubjectData sub-node in the Library. You will see the following display of data. It shows the arrival times of every subject in the study for five simulation runs, because we asked East to save the data for five runs. The first simulation id is 1. If you scroll down you will see that East has also chosen simulation runs 4, 9, 12 and 17 to save the data. This selection is arbitrary on the part of the software. Naturally, if we had asked to save data for 1000 runs, East would have chosen all the SimulationIds for saving the data (with the restriction that East can store at most 100,000 records).

To view the detailed summary output of the simulations, double click the node PredictSim1 in the Library. The following output is displayed. The table at the left describes the Simulation scenario. This summary contains an overview of the actual data we observed and the simulated results for the remaining arrivals in the second period. The table Overall Output presents the percentiles of the (simulated) total accrual duration. For example, about 50% of the simulations were completed by 4.447 units of time. The mean accrual duration of all the simulations is 4.448.
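The reported mean can be sanity-checked by hand: with 106 remaining accruals at the second-period rate of about 47.655 per unit time, the expected accrual duration is 2.2243 + 106/47.655 ≈ 4.449. The following Python sketch simulates the same setting with a homogeneous Poisson arrival process (exponential inter-arrival times). It is a minimal illustration under that assumption, not East's algorithm; the function name is hypothetical, and because the random number generator differs from East's, the individual simulated values will not match the East output even with the same seed.

```python
import random
import statistics


def simulate_accrual_durations(current_n, target_n, current_time, rate,
                               n_sims=1000, seed=12345):
    """Simulate total accrual duration under a Poisson arrival process.

    Each remaining subject arrives after an exponential waiting time with
    mean 1/rate, starting from the current calendar time.
    """
    rng = random.Random(seed)
    durations = []
    for _ in range(n_sims):
        t = current_time
        for _ in range(target_n - current_n):
            t += rng.expovariate(rate)   # waiting time until the next arrival
        durations.append(t)              # arrival time of the last (target) subject
    return durations


durations = simulate_accrual_durations(212, 318, 2.2243, 47.655)

mean_duration = statistics.mean(durations)
# Cut points at 5%, 10%, ..., 95%; index 9 is the median, index 14 the 75th percentile.
cuts = statistics.quantiles(durations, n=20)
print(f"mean={mean_duration:.3f} median={cuts[9]:.3f} "
      f"p75={cuts[14]:.3f} p95={cuts[18]:.3f}")
```

The simulated mean and median should both lie close to 4.45, which is consistent with the Overall Output table described above.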
In addition, we can view the enrollment prediction plot using the tool provided in the Library pane. The Enrollment Prediction Plot displays the timeline of the observed accruals until 2.234, by which time 212 subjects in all have been enrolled. This is as per the observed data. After that point it displays the projected enrollments based on the observed accrual data we specified and the revised Accrual Rate in the second period. For example, at year 3 the predicted median enrollment reaches 249 subjects, with a 95% Confidence Interval of (237.399, 261.098).

For the targeted 318 subjects it will take 4.48 units of time, as is clear from the plot. The problem here is that the median and the upper limit coincide at 318. To see the true situation we suggest a workaround. You can rerun the simulations with a target sample size sufficiently greater than the true target sample size. For instance, if you set the target sample size to, say, 350 and simulate keeping everything else as before, you get a new Enrollment Prediction Plot. On this plot, find the Time at which the 95% lower limit is around 318. This gives the latest time to reach 318 accruals. From the plot, one can say that the latest time by which 318 accruals will happen is 4.916.

In the earlier plot (for a target sample size of 318), suppose you want to find out how long it will take to enroll, say, 285 subjects. Select the Input > Enrollments option on the plot and type 285 in the Enrollments textbox as shown in the following plot.
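The two kinds of read-off from the plot (enrollment at a given time, and time to reach a given enrollment) can be approximated from simulated arrival paths. The sketch below is an illustration under stated assumptions rather than a description of how East builds the plot: arrivals follow a homogeneous Poisson process at the second-period rate, and the "95% confidence interval" is taken to be the 2.5th to 97.5th percentile range across simulations. All function names are hypothetical.

```python
import random
import statistics
from bisect import bisect_right


def simulate_future_arrivals(n_future, start_time, rate, n_sims=1000, seed=12345):
    """Return one sorted list of future arrival times per simulation run."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_sims):
        t, path = start_time, []
        for _ in range(n_future):
            t += rng.expovariate(rate)   # exponential gap between arrivals
            path.append(t)
        paths.append(path)
    return paths


def median_and_interval(values):
    """Median with the 2.5th and 97.5th percentiles of the simulated values."""
    cuts = statistics.quantiles(values, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(values), cuts[0], cuts[-1]


def enrollment_at(paths, current_n, t):
    """Distribution of cumulative enrollment at calendar time t."""
    counts = [current_n + bisect_right(path, t) for path in paths]
    return median_and_interval(counts)


def time_to_enroll(paths, current_n, n_subjects):
    """Distribution of the arrival time of subject number n_subjects."""
    times = [path[n_subjects - current_n - 1] for path in paths]
    return median_and_interval(times)


paths = simulate_future_arrivals(n_future=106, start_time=2.2243, rate=47.655)
print("enrollment at time 3:", enrollment_at(paths, 212, 3.0))
print("time to enroll 285:  ", time_to_enroll(paths, 212, 285))
```

As a rough check, the expected enrollment at time 3 is 212 + 47.655 × (3 − 2.2243) ≈ 249, and the expected time to enroll 285 subjects is 2.2243 + 73/47.655 ≈ 3.756, both in line with the values East reports for this example.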
From the read-offs it is clear that the median accrual duration for accruing 285 subjects would be 3.755 with 95% confidence interval (3.438, 4.098).

68.1.2 Enrollment Only: Subject-level Data with Site-specific Information

In a multi-center trial, accrual rates vary across sites. Incorporating this information yields a better estimate of the total accrual duration for the trial as a whole. East provides a way to use this information by accepting the following information on the different sites. Suppose that for the Orlistat trial we also have the site data stored in a .csv file named ORLISTAT iLook1 SiteData.csv, which is available in the Samples folder of the East installation directory. Import this file into East 6. Once imported, the file will appear in the Library pane as a node in the active workbook, with extension .cydx. The Site data comprise the following variables.

Site Data

Site data refers to information collected about each subject accrued so far, at each of the sites in a multi-center trial.
Site ID: Site ID of the site.
Site Accrual Rate: Site-specific enrollment rate.
Enrollment Cap: The maximum number of subjects the site can enroll.
Site Initiation: Unopened Sites
  Start Time: The time at which the unopened site will open and start accepting accruals.
  End Time: The time at which the site will stop accepting accruals and close.
Site Initiation: Opened Sites
  Site Initiation Time: The time at which the site was opened and started accepting accruals.

Click on the node ORLISTAT iLook1 SubjectData.cydx and choose the menu item Analysis>Predict>Enrollment Only.
In the resulting dialog box, select the variable ArrivalTime in the drop-down for Arrival Time. Since we want to include the Site information for this study, check the Include Site-specific Information check box. As soon as you check this option, the Input screen enables input of the Site ID for the Subject data as well as further information about the Site data, such as the workbook, the dataset and some variables. Scroll down to see the complete Input dialog. Select Site ID for the subject-level data and the data set ORLISTAT iLook1 SiteData.cydx for the input of Site-level Data, and map the variables from the Site data to the respective inputs as shown in the following screen, which shows the necessary part of the input dialog.

Click the View Dataset button to view the Site Data. Click Hide Dataset. Click Next. This will invoke the Accrual/Dropouts Information dialog. The Current Sample Size is 212, which is equal to the number of subjects who have arrived as per the subject data. East gives two options for generating arrivals: a Poisson process or Uniform. Let us select the option of Poisson arrivals.

Go to the Simulation Controls tab. Choose the Random Number Seed as Fixed with the value 12345. Check the options for saving simulation outputs in files. Click Simulate. After a few seconds East will display the message "Simulations complete. Waiting for User's action." Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim3 with sub-nodes for the simulation outputs.
To view the detailed output, double click the node PredictSim3 in the Library. You will see the following output. The Average Accrual Duration across all the simulations is 3.265. The Accrual Duration column in the Overall Summary table gives the percentiles of the Accrual Duration. Accordingly, the median accrual duration is 3.261, whereas 75% of the simulations have a total Accrual Duration of at most 3.335.

To view the Enrollments Simulation Plot, click the PredictSim3 node in the Library and use the tool provided in the Library pane. The Enrollment Prediction Plot displays the timeline of the observed accruals until 2.226, by which time 212 subjects in all have been enrolled. This is as per the observed data. After that point it displays the projected enrollments based on the observed accrual data we specified and the revised Accrual Rate in the second period. For example, at year 2.4 the predicted median enrollment reaches 229 subjects, with a 95% Confidence Interval of (221.438, 238.438).

Along similar lines, suppose you want to find out how long it will take to enroll, say, 250 subjects. Select the Input > Enrollments option on the plot and type 250 in the Enrollments textbox as shown in the following plot. From the read-offs it is clear that the median accrual duration for accruing 250 subjects would be 2.604 with 95% confidence interval (2.502, 2.73).

The Predict feature of East can also handle situations where some sites are initially closed but open later and start accruing subjects. We will illustrate this feature now.
Import the files ORLISTAT EnrollmentOnly SubjectData.csv and ORLISTAT EnrOnly SiteData.csv, which will create data nodes in the Library. As before, choose the menu item Analysis>Predict>Enrollment Only. In the resulting dialog box, give the inputs as shown below.

Click View Site Dataset. Note that the Site Ready Time values for sites 4 and 10 are missing. These two sites can be opened at any time during the interval (2, 9). Accordingly, the SIP Start and SIP End values are 2 and 9 respectively. Click Hide Dataset. Click Next. This will invoke the Accrual/Dropouts Information dialog. The Current Sample Size is 183, as no subjects have yet been enrolled at sites 4 and 10. East gives two options for generating arrivals: Poisson or Uniform. Select the option of Poisson arrivals. Scroll down a little in the middle table to view the Unopened Sites information. For the sites that are already open, the values of Site Initiation Period Start and End are NA, whereas for sites 4 and 10 the Start and End of the period are specified and will be used to generate the Site Initiation Times for these two sites.

The column Accrual Rate/Site shows the accrual rates calculated from the existing data. These will be used to simulate the remaining 274 − 183 = 91 accruals. You may change the values of Accrual Rate/Site. Suppose that from now on sites 13 and 17 are expected to enroll subjects quite fast, and we want to change the accrual rates for sites 13 and 17 to 20 and 40 respectively. Change the corresponding values. The Planned Accrual Rate values are read from the variable SiteAcrrRate in the data. These values cannot be edited.
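The interplay between site-specific accrual rates and site initiation periods can be illustrated with a small simulation. The Python sketch below is a simplified model built on assumptions that the manual does not confirm: each site accrues according to its own Poisson process, the initiation time of an unopened site is drawn uniformly from its (SIP Start, SIP End) interval, and enrollment caps are ignored. The function name and the per-site rate of 9.0 are hypothetical; only the edited rates of 20 and 40 for sites 13 and 17, the interval (2, 9), and the sample sizes 183 and 274 come from the example above.

```python
import random


def simulate_multisite_accrual(open_sites, unopened_sites, current_n, target_n,
                               current_time, rng):
    """Simulate one run of multi-site accrual.

    open_sites:     {site_id: accrual_rate} for sites already enrolling.
    unopened_sites: {site_id: (accrual_rate, sip_start, sip_end)}.
    Returns (accrual_duration, accepted_arrivals, sites_opened_in_time).
    """
    remaining = target_n - current_n
    initiation = {}
    streams = [(site, rate, current_time) for site, rate in open_sites.items()]
    for site, (rate, sip_start, sip_end) in unopened_sites.items():
        # Assumption: initiation time is uniform over the site initiation period.
        initiation[site] = rng.uniform(sip_start, sip_end)
        streams.append((site, rate, max(initiation[site], current_time)))

    arrivals = []   # (arrival_time, site_id) candidates from every site
    for site, rate, t in streams:
        if rate <= 0:
            continue                      # a site with zero rate never enrolls
        # No site can contribute more than `remaining` of the accepted arrivals.
        for _ in range(remaining):
            t += rng.expovariate(rate)
            arrivals.append((t, site))

    arrivals.sort()
    accepted = arrivals[:remaining]       # the trial stops at the target sample size
    duration = accepted[-1][0]
    opened_in_time = [s for s, t0 in initiation.items() if t0 <= duration]
    return duration, accepted, opened_in_time


# Hypothetical inputs: 16 open sites at rate 9.0, except the edited sites 13 and 17.
open_sites = {s: 9.0 for s in range(1, 19) if s not in (4, 10)}
open_sites[13], open_sites[17] = 20.0, 40.0
unopened_sites = {4: (9.0, 2.0, 9.0), 10: (9.0, 2.0, 9.0)}

rng = random.Random(12345)
duration, accepted, opened = simulate_multisite_accrual(
    open_sites, unopened_sites, current_n=183, target_n=274,
    current_time=2.23, rng=rng)
print(f"accrual duration: {duration:.3f}, unopened sites that opened in time: {opened}")
```

In a model of this kind, an unopened site contributes subjects only when its initiation time happens to fall before the trial reaches its target, which is why such sites may enroll nobody in many simulation runs.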
The lower part of the table, scrolled down, now looks as shown below.

Go to the Simulation Controls tab. You can change the simulation parameters here as you wish. Suppose you want to have the outputs stored in .csv format. Select the Output Type as CSV file. You can either create the files in local folders beforehand and select them using the Browse button, or create the files while you are browsing. Suppose Summary.csv, SubjectData.csv, SitewiseSummary.csv and SitewisePara.csv are the files which will store the Summary statistics for every simulation run, the Subject-level data for 1 simulation run, the Sitewise summary for every simulation run and the Sitewise parameter data for 1 simulation run respectively. All these files are to be stored on, say, the local drive G. Choose the Random Number Seed as Fixed with the value 12345. The input screen for simulation is shown below.

Click Simulate. After a few seconds East will display the message "Simulations complete. Waiting for User's action." Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim1. This time no sub-nodes will be created, as we have asked for the output to be saved in .csv files at the specified locations on the machine. To view the detailed output, double click the node PredictSim1 in the Library. You will see the following output. The Average Accrual Duration across all the simulations is 2.866.
The Accrual Duration column in the Overall Summary table gives the percentiles of the Accrual Duration. Accordingly, the median accrual duration is 2.864, whereas 75% of the simulations have a total Accrual Duration of at most 2.91.

To view the Enrollments Simulation Plot, click the PredictSim1 node in the Library and use the tool provided in the Library pane. The Enrollment Prediction Plot displays the timeline of the observed accruals until 2.23, by which time 183 subjects in all have been enrolled. This is as per the observed data. After that point it displays the projected enrollments based on the observed accrual data we specified and the revised Accrual Rate in the second period. For example, at year 2.4 the predicted median enrollment reaches 208 subjects, with a 95% Confidence Interval of (198.077, 218.102).

Let us investigate the output of the simulation runs further. The Overall Output from the detailed output is repeated below. If you look closely at the column No of Sites Opened, you will notice that up to the 75th percentile of the Accrual Duration, that is by 2.91, only 16 sites were open, as was the situation at the beginning. One more site opened between 2.91 and 2.979. Let us see what happened during this time period. We have saved the outputs in .csv files. Open the file Summary.csv, which stores summary statistics for every simulation run. You will see the data as shown below. Observe that for every simulation East labels the available data at the Current Sample Size as Interim and the Target Sample Size as Final.
The column SUCCESS indicates that the simulation was successful. The variable TotEvents is synonymous with Sample Size. The last column, AccrDurtn, specifies the accrual duration required to enroll the 274 subjects in the respective simulation run. For instance, in the first simulation, the 274th subject arrived at time 2.80408, and so on.

Now open the SubjectData.csv file, which stores the arrival times of each subject for one simulation. You will see the following display of data. Note that the data are ordered according to the arrival times of all the subjects across all the sites. The first three subjects arrive at Site 1 in succession, whereas the fourth subject arrives at Site 5, and so on. You can sort the data on sites and see that sites 4 and 10 are not opened in this particular simulation, that is simulation 4. The last subject arrived in Simulation 1 at time point 2.8603 at Site 12. The question is when sites 4 and 10 finally opened and started accepting subjects. Since these sites do not occur in the saved subject data, they must have opened after the last arrival. To verify this, open the file SitewisePara.csv, which stores the site parameter data for one simulation run.

This file gives site-wise details of the accrual process. The columns SiteInitiationTime and SubjectsAccrued specify the time at which the site was opened and the number of subjects it accrued in that simulation. The column LastSubjectRand shows the time at which the last subject arrived at that site. Since the accrual rates differ across sites, sites with high accrual rates accrue more subjects than those with low accrual rates.
The SiteInitiationTime also affects how many subjects a site accrues. Note from the last two rows that the SiteInitiationTime for Site 4 is 3.5685 and for Site 10 it is 7.9575. However, the last subject in the study arrived at 2.8603 at Site 12. As a result, both sites 4 and 10 opened after the accruals in the study as a whole were complete. The columns Accrual Duration and ObsrvdAccrualRate specify the site-wise accrual duration and the rate at which the site accrued subjects.

Now open the SitewiseSummary.csv file. It displays, for each individual site, the averages across all the simulations of the Initiation Time, Last Subject Arrival Time, Number of Subjects Enrolled, Accrual Duration and Accrual Rate. From the last two rows, note that the two sites 4 and 10 opened during the accrual duration in only 97 simulations out of 1000. We suggest that you try different input subject data and/or site data with varying accrual rates, site initiation intervals and enrollment caps to develop better insight into the accrual process.

68.1.3 Enrollment Only: Summary Data

In the earlier section we saw how to simulate the accruals and estimate the average accrual duration when interim subject-level data are available. Sometimes, however, subject-level data may not be available. What may be available is a summary of the accruals that have happened to date. For example, in the case of the Orlistat trial considered above, the DMC statistician may know that 212 subjects have been accrued so far and that the last subject arrived at time 2.224. The DMC statistician is interested in knowing the total accrual duration for, say, 318 accruals.
Through its Predict feature, East can still produce an estimate of the average accrual duration based on arrival simulations for the additional 318 − 212 = 106 arrivals. To see this, choose the menu item Analysis>Predict>Enrollment Only. In the ensuing dialog box, select the Input option Summary Data. Fill in the Sample Size as 212 and the Current Calendar Time as 2.224. The screen will look as shown below.

Click Next. The next input dialog appears. As before, the default values for the Current Sample Size and the Target Sample Size are 212 and 318 respectively, the default for the Target Sample Size being 1.5 * CurrentSampleSize. You may change the Target Sample Size value. This is the value of targeted enrollment in the trial. The objective is to find out how long, on average, the trial will take to enroll this many subjects. The Current Calendar Time is the accrual time of the last subject in the data, which is 2.224. The Accrual Information input is meant for simulating the additional 318 − 212 = 106 accruals.

There are two options for Input Method. For the Accrual Rates option, East treats the accrual process as comprising two periods. The first period is assumed to be the one presented in the data. The starting time for this period is assumed to be 0, whereas the accrual rate for this period is computed as (CurrentSampleSize)/(CurrentCalendarTime). In this case it is 212/2.2243 = 95.31. Both the Starting Time and Accrual Rate fields are uneditable as these are estimated from the data. The second period is the one which starts after the last subject in the first period has arrived. As a result, the Starting At time default value is the last arrival time in the data, which is also the Current Calendar Time.
For the second period, the default accrual rate is computed as (TargetSampleSize − CurrentSampleSize)/(CurrentCalendarTime), which is (318 − 212)/2.2243 = 47.65499 for the current example. You can edit both the Starting Time and Accrual Rate for the second period. Accrual may vary over time. To reflect this assumption, one can specify the number of time periods, each having a different accrual rate.

Go to the Simulation Controls tab. Let us fix the Random Number Seed to 12345. Check the Output options Save the summary statistics for every simulation run and Save subject-level data for 1 simulation run. The input screen will be as shown below.

Click the Simulate button. East simulates the arrival of subjects according to the Uniform Arrival process. After a few seconds East will display the message "Simulations complete. Waiting for User's action." Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim1 with sub-nodes for SummaryStat and SubjectData in the Library.

To view the detailed summary output of the simulations, double click the node PredictSim1 in the Library. The following output is displayed. The table at the left describes the Simulation scenario. This summary contains an overview of the actual data we observed and the simulated results for the remaining arrivals in the second period. The table Overall Output presents the percentiles of the (simulated) total accrual duration. For example, about 50% of the simulations were completed by 4.433 units of time.
The mean accrual duration of all the simulations is 4.427. Open the SummaryStat data by double clicking the sub node. You will see the following display of data. Observe that for every simulation East calls the Current Sample Size available data as First Interim while the Target Sample Size as Final. The column SUCCESS indicates that the simulation was successful. The variable TotEvents is synonymous to Sample Size. The last column AccrDurtn specifies the accrual duration required to enroll the 318 subjects in the respective simulation run. For instance, in the first simulation, the 318th subject arrived at the time epoch 4.34938 and so on. Now double click the SubjectData sub node in the library. You will see the following display of data. 1710 68.1 Enrollment Only – 68.1.3 Summary Data <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 It shows arrival times of each and every subject in the study for one simulation. This is because we have asked to save the data for one run. To view the detailed summary output of the simulations, Open the SummaryStat data by double clicking the sub 68.1 Enrollment Only – 68.1.3 Summary Data 1711 <<< Contents 68 * Index >>> Enrollment/Events Prediction - Analysis node. You will see the following display of data. Observe that for every simulation East calls the Current Sample Size available data as First Interim while the Target Sample Size as Final. The column SUCCESS indicates that the simulation was successful. The variable TotEvents is synonymous to Sample Size. The last column AccrDurtn specifies the accrual duration required to enroll the 318 subjects in the respective simulation run. For instance, in the first simulation, the 318th subject arrived at the time epoch 4.35001 and so on. Now double click the SubjectData sub node in the library. You will see the following display of data. 1712 68.1 Enrollment Only – 68.1.3 Summary Data <<< Contents * Index >>> R East 6.4 c Cytel Inc. 
Note that the first subject ID is 213, because the summary data input covered 212 subjects. If you scroll down, you can see that the last subject ID is 318, which is the target sample size. This subject was accrued at time 4.34938.

In addition, you can view the Enrollment Prediction Plot using the tool in the Library pane. The plot displays the timeline of the observed accruals until 2.234, by which time 212 subjects in all have been enrolled. This is as per the observed data. After that point it displays the projected enrollments, based on the observed accrual data we specified and the revised Accrual Rate in the second period. For example, at year 3.2 the predicted median enrollment reaches 258 subjects, with a 95% confidence interval of (246.372, 271.372). Please see the plot below.

Similarly, to find out how long it will take to enroll, say, 275 subjects, select the Input > Enrollments option on the plot and type 275 in the Enrollments textbox, as shown in the following plot. From the read-offs, the median accrual duration for accruing 275 subjects is 3.551, with a 95% confidence interval of (3.249, 3.867).

68.2 Events and Enrollment - Unblinded Data

A DMC statistician typically has access to unblinded trial data. In survival studies, the prediction of enrollments as well as of events is of interest.
Just as it is important to know how long accrual will take to complete, it is equally important to know how long it will take to observe the required number of events. With the Predict feature, by providing inputs such as the accrual rate, hazard rates and drop-out rates for the treatment and control arms, you can forecast how long subject enrollment is likely to take and how long the trial is likely to take to complete. Predict simulates the accrual process and the follow-up time, so as to predict the average accrual duration, average follow-up time and average study duration (by predicting when the required number of events is likely to be reached).

We will treat the cases of unblinded and blinded data separately. In the unblinded situation, the user is expected to know the subject data or summary data for the control and treatment arms separately. For instance, the control and treatment arms may have different hazard and drop-out rates, and this information can be provided to East by giving different inputs for the two arms. In the blinded case, the user is expected to know only the common hazard rate, which is used to generate the events for both control and treatment. We will illustrate the feature with the help of the Oncox (for unblinded) and Rales (for blinded) trials explained in chapter 44.

68.2.1 Events and Enrollment - Unblinded Data: Accrual Complete

Subject-level Data

Assume that the study has already accrued all its subjects and we are interested in forecasting only the follow-up time and study duration. The trial has accrued 402 subjects in all and has stopped accruing. The subject data are available in the file ONCOX iLook1 SubjectData.csv in the Samples folder of the East installation directory. The data file can be imported into East using the Import button in the Home ribbon:

Once imported, the file will appear in the Library pane as nodes in the active workbook, with extension .cydx.
Choose the menu item Analysis>Predict>Events and Enrollment-Unblinded Data. In the ensuing dialog box, select the Accrual option Complete. Select the data set ONCOX iLook1 SubjectData.cydx. Map the variables from the data to the ones shown below:

Click Next. The next input dialog appears as shown here:

The default values for Hazard Rate - Control and Hazard Rate - Treatment are estimated from the subject data. You can verify this by running the LogRank Test from Analysis>Events>Two Samples>LogRank[SU-2S-LR]. The input dialog for the same would be:

Choose the variables as shown in the dialog. Click OK. A partial output is shown below:

Observe that the Hazard Ratio is 0.743 and the Estimated Hazard Rates table gives the hazard rate for Control as 0.10847 and that for Treatment as 0.08062. These are the same as the ones East chose while predicting the events. Please refer to the second input dialog for Predict.

Continuing the Predict for the Oncox subject data, go to the Accrual/DropOuts tab. In the ensuing dialog, you will see almost all values filled in. The Current Calendar Time is the accrual time of the last subject in the data, which is 20.724. The drop-out hazard rates for Control and Treatment are estimated from the subject data. The Target Sample Size is disabled because we have chosen the option Accruals complete. The Target Number of Events default value is 402, the same as the number of subjects in the data. This value can be edited.
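As a quick consistency check on the figures quoted above, the ratio of the two reported hazard rates reproduces the reported hazard ratio of 0.743. The Python sketch below also shows one common way of estimating a constant hazard under an exponential model (number of events divided by total time at risk). Whether East uses exactly this estimator is an assumption, and the toy inputs are made up rather than taken from the ONCOX data.

```python
def exponential_hazard(n_events, total_time_at_risk):
    """Maximum-likelihood hazard estimate under an exponential model.

    Assumption: this is a textbook estimator shown for illustration only;
    the manual does not state which estimator East applies.
    """
    if total_time_at_risk <= 0:
        raise ValueError("total time at risk must be positive")
    return n_events / total_time_at_risk


# Hazard rates reported in the text for the ONCOX subject data.
hazard_control = 0.10847
hazard_treatment = 0.08062

hazard_ratio = hazard_treatment / hazard_control
print(f"hazard ratio: {hazard_ratio:.3f}")

# Toy illustration of the estimator (made-up numbers, not ONCOX data).
toy_rate = exponential_hazard(n_events=10, total_time_at_risk=125.0)
print(f"toy hazard estimate: {toy_rate:.3f}")
```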
You can edit the values of the hazard rates, target number of events and so on. You can also choose a specific follow-up period by selecting For Fixed Period in the Subjects are followed textbox. The number of hazard pieces in the drop-out information can also be increased, to specify different hazard rates for different time periods. Setting the Number of pieces to 0 assumes that there will be no drop-outs. For now, let us proceed with all the default values.

Go to the Simulation Controls tab. Check the Output Options for saving the outputs. Click Simulate. East simulates the arrival of subjects according to a Poisson arrival process with inter-arrival times following an exponential distribution. After a few seconds East will display the message Simulations complete. Waiting for User’s action. Click Close. This saves the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, East creates a node PredictSim1 in the Library, with sub-nodes for SummaryStat and SubjectData. To view the detailed summary output of the simulations, double-click the node PredictSim1 in the Library. The following output is displayed.

The table at the left describes the simulation scenario. This summary contains an overview of the actual data observed. Since the accruals were complete, the Target Sample Size and the Target Number of Events are the same and equal to 402. The table Actuals from Interim Trial Data: Sample Size and Events presents detailed information on the subject data, such as events on the control and treatment arms, drop-outs, average follow-up and so on. Observe that at the current time there are 212 subjects in the pipeline.
These are followed till the end of the study. The study is complete when all the subjects in the pipeline have either experienced an event or dropped out. The table Average Sample Size and Events provides information about the average study duration, average number of events on control and treatment, average drop-outs, average follow-up time and so on. From this table, note that it will take on average 86.765 units of time to complete the study. The number of events on the control and treatment arms would be around 201 and 194 respectively. The average follow-up time for an individual is 10.609.

The Overall Output table describes the distribution of the Average Study Duration across 1000 simulations. Note that the column No of Accruals has the value 402 throughout, since the accruals were complete and only events are being forecast. Since there are a few drop-outs, we expect fewer events, say around 395, to occur among the 402 subjects. It is worth noting that at the 5th percentile of the Average Study Duration, 392 events are reached quite early, by time 67.981. The 95th percentile is 111.235, the duration within which 95% of the simulated studies are complete. You could change the percentiles input if you want to be more specific. For instance, you can input 100% to get the maximum study duration.

Let us have a look at the individual files stored in the Library. Open the SummaryStat data by double-clicking the sub-node. You will see the following display of data (shown in parts).

Accruals0 and Accruals1 specify the total subjects accrued on the Control and Treatment arms respectively. A similar convention is used for naming the various quantities for Control and Treatment.
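The completion rule described above (the study ends when every pipeline subject has either had an event or dropped out) can be sketched in Python as follows. This is a simplified illustration, not East's algorithm. The per-arm pipeline counts of 96 and 116 are derived from the summary figures for the same trial quoted later in this chapter (203 and 199 accrued, 106 and 81 events, 1 and 2 drop-outs); the drop-out hazard of 0.001 per arm is a hypothetical value chosen only to make the example run, and all function and variable names are hypothetical.

```python
import random
import statistics


def simulate_completion(pipeline, event_hazard, dropout_hazard,
                        current_time, n_sims=1000, seed=12345):
    """Follow every pipeline subject until event or drop-out.

    pipeline, event_hazard, dropout_hazard are dicts keyed by arm.
    Returns one (study_duration, new_events, new_dropouts) tuple per run.
    Assumption: exponential event and drop-out times, so the residual
    times of subjects already on study are again exponential.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        last_exit, events, dropouts = 0.0, 0, 0
        for arm, n in pipeline.items():
            for _ in range(n):
                t_event = rng.expovariate(event_hazard[arm])
                t_drop = rng.expovariate(dropout_hazard[arm])
                if t_event <= t_drop:
                    events += 1
                else:
                    dropouts += 1
                last_exit = max(last_exit, min(t_event, t_drop))
        results.append((current_time + last_exit, events, dropouts))
    return results


pipeline = {"control": 96, "treatment": 116}               # 212 in all
event_hazard = {"control": 0.10847, "treatment": 0.08062}  # from the text
dropout_hazard = {"control": 0.001, "treatment": 0.001}    # hypothetical

runs = simulate_completion(pipeline, event_hazard, dropout_hazard,
                           current_time=20.724)
print(f"mean study duration: {statistics.mean(r[0] for r in runs):.3f}")
print(f"mean total events:   {187 + statistics.mean(r[1] for r in runs):.1f}")
```

Each run's duration corresponds to the LookTime of the Final stage in the SummaryStat data; the exact values depend on the hypothetical drop-out hazards and on the random number stream.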
As before, Interim refers to the available data whereas Final refers to the simulated data. Observe that although the TotAccruals for every simulation is 402, the TotEvents may differ from simulation to simulation. The total number of events in simulations 1, 2 and 3 is 398, 395 and 396 respectively. This is because of the varying number of drop-outs, which are 4, 7 and 6 respectively for these three runs. The column AvgFollowUp indicates the average follow-up time of subjects by this stage, Interim or Final. It is worth noting that the LookTime corresponding to the Final stage is essentially the study duration observed in that particular simulation. In other words, all 402 subjects were accrued and followed in a period of 118.44 time units in simulation 1, 78.207 in simulation 2 and so on.

Open the Subject Data file, which stores detailed information about one simulation. TreatmentID equal to 0 means the subject is on Control, while 1 indicates Treatment. Arrival Time is the calendar time of arrival, and Survival Time is the duration for which the subject was alive in the study. DropOutTime is the duration of time the subject was present in the study before dropping out. These are generated using the specified drop-out hazard rates for control and treatment. For the first subject in the data, the DropOutTime is 237.9819, which is greater than the survival time. The time on study is the time for which the subject was present in the study, which is the minimum of Accrual Time plus Survival Time and Accrual Time plus Drop Out Time. In this case it is 28.645. This means that the subject will not drop out before the study is completed. Accordingly, the value of CensorInd 1 is 1, as it results in a complete observation for survival. On the other hand, observe that for SubjectID 9 the DropOutTime is 14.7039 and the survival time is not displayed.
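The rule for the time on study and the censoring indicator described above can be written as a short Python sketch. The class and function names are hypothetical, and the arrival and survival values in the usage lines are made up for illustration; only the two drop-out times (237.9819 and 14.7039) come from the text.

```python
from dataclasses import dataclass


@dataclass
class SubjectOutcome:
    exit_time: float  # arrival time plus the smaller of survival and drop-out time
    censor_ind: int   # 1 = event observed, 0 = censored by drop-out


def resolve_subject(arrival_time, survival_time, dropout_time):
    """Apply the whichever-is-minimum rule to one simulated subject."""
    if survival_time <= dropout_time:
        # The event occurs first: a complete observation.
        return SubjectOutcome(arrival_time + survival_time, 1)
    # The drop-out occurs first: a censored observation.
    return SubjectOutcome(arrival_time + dropout_time, 0)


# Event before drop-out (arrival and survival times are hypothetical).
first = resolve_subject(arrival_time=5.0, survival_time=10.0,
                        dropout_time=237.9819)
# Drop-out before event (arrival and survival times are hypothetical).
ninth = resolve_subject(arrival_time=2.0, survival_time=30.0,
                        dropout_time=14.7039)

print(first)
print(ninth)
```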
For SubjectID 9, this means that the generated survival time was greater than the drop-out time. As a result, the drop-out happens before the event, and the subject drops out, resulting in a censored observation (CensorInd 1 = 0).

The Events Prediction Plot (invoked using the tool in the Library pane) shows that the median number of events, 395, is reached in a duration of 90.599 units of time. The earliest time to reach this target may be 59.264, read off from the upper 95% confidence limit at this time. The screenshots shown below illustrate this result.

In order to find the study duration for reaching the median number of events, select the Input option Events in the plot. Enter 395 for Events. The median study duration is 113.31. Since the 95% upper limit does not exist, one cannot forecast the latest time to reach the target.

Invoke the Dropout Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 90.87 the median number of drop-outs in the control and treatment arms combined would be 6, with a 95% confidence interval of 3 to 11. If you select Show Predicted Avg. Dropouts, the predicted drop-outs will be added to the plot.

To find the time by which there will be a specified number of drop-outs, select the Input option Dropouts. Suppose we are interested in knowing by what time there will be 4 drop-outs. Enter this value and click Enter. Note that the predicted median time for reaching 4 drop-outs is 26.762.

Summary Data

In the earlier section we saw how to generate the events and follow all subjects until they either experience an event or drop out. We estimated the study duration with the help of the Predict feature in East.
For this, we assumed that interim subject-level data were available, with information on individual arrival time, status and so on. However, subject-level data may often be unavailable. What may be available is a summary of the accruals to date. For example, in the case of the Oncox trial considered above, the DMC statistician may have the information that 402 subjects have been accrued so far, 203 on Control and 199 on Treatment. The numbers of events so far on Control and Treatment are 106 and 81 respectively. The subjects who dropped out are 1 on the Control arm and 2 on the Treatment arm. The last subject arrived at time 20.7236. The DMC statistician is interested in knowing the total study duration when all 402 accrued subjects are followed till the end. Through its Predict feature, East still makes it possible to estimate the average study duration, by simulating events from a Poisson process based on the specified or default hazard rates.

To see this, choose the menu item Analysis>Predict>Events and Enrollment-Unblinded Data. Select the Input Summary Data and Accruals Complete. Enter the above-mentioned inputs for the quantities required.

Click Next. The next input dialog appears as shown here:

The default values for Hazard Rate - Control and Hazard Rate - Treatment are shown in the dialog. Note that these are not estimated from any data, as we do not have the individual subject data as input. Nonetheless, use the same hazard rates as in the previous section: 0.10847 for Control and 0.08062 for Treatment. Can we set the Target No. of Events to 500? If you give this input, you will get the error Value range for target no.
of events should be [188,402]. This is because the accruals are complete, so no further subjects will be accrued, and 187 events have already occurred. Suppose you are interested in 385 events. Input this value for Target No. of Events.

Go to the Accruals/DropOuts tab. Suppose that, instead of drop-out hazard rates, information is available on the probabilities of drop-out. Suppose the probability of drop-out is 0.5% for a subject receiving Control and 0.6% for a subject receiving Treatment, and that these apply from the current calendar time, 20.724, onwards. Give all these inputs.

Go to the Simulation Controls tab. Give a fixed seed of 12345. Save the outputs for Summary and Subject data. Click Simulate. East simulates the events according to a Poisson arrival process with inter-arrival times following an exponential distribution. The parameters are derived from the specified hazard rates for Control and Treatment. For details refer to Appendix M.

After a few seconds East will display the message Simulations complete. Waiting for User’s action. Click Close. This saves the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, East creates a node PredictSim2 in the Library, with sub-nodes for SummaryStat and SubjectData. To view the detailed summary output of the simulations, double-click the node PredictSim2 in the Library. The following output is displayed.

The table at the left describes the simulation scenario. This summary contains an overview of the actual data observed. Since the accruals were complete, the Target Sample Size and the Target Number of Events are 402 and 385 respectively.
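The two rules at work in this scenario can be sketched in Python: the admissible range for the target number of events, and following the pipeline subjects only until the target is reached. The bounds (observed events plus one, up to the number accrued) are inferred from the [188,402] message in the example and are not a documented East rule. The text gives drop-out probabilities without stating the period they apply to, so no conversion is attempted; the drop-out hazards below are hypothetical placeholders, as are all names in the code.

```python
import random


def target_events_range(accrued, observed_events):
    """Admissible targets once accrual is complete (inferred from the example)."""
    return observed_events + 1, accrued


def check_target_events(target, accrued, observed_events):
    """Validate the target and return the number of further events required."""
    low, high = target_events_range(accrued, observed_events)
    if not low <= target <= high:
        raise ValueError(
            f"Value range for target no. of events should be [{low},{high}]")
    return target - observed_events


def follow_until_target(pipeline, event_hazard, dropout_hazard,
                        events_needed, current_time, rng):
    """One run: follow pipeline subjects until events_needed further events.

    Returns (study_end, new_events, new_dropouts, pending).
    """
    exits = []  # (time from now until exit, True if the exit is an event)
    for arm, n in pipeline.items():
        for _ in range(n):
            t_event = rng.expovariate(event_hazard[arm])
            t_drop = rng.expovariate(dropout_hazard[arm])
            exits.append((min(t_event, t_drop), t_event <= t_drop))
    exits.sort()

    elapsed, events, dropouts = 0.0, 0, 0
    for elapsed, is_event in exits:
        if is_event:
            events += 1
        else:
            dropouts += 1
        if events == events_needed:
            break  # target reached; remaining subjects stay pending
    pending = len(exits) - events - dropouts
    return current_time + elapsed, events, dropouts, pending


needed = check_target_events(385, accrued=402, observed_events=187)
rng = random.Random(12345)
study_end, events, dropouts, pending = follow_until_target(
    pipeline={"control": 96, "treatment": 116},
    event_hazard={"control": 0.10847, "treatment": 0.08062},
    dropout_hazard={"control": 0.001, "treatment": 0.001},  # hypothetical
    events_needed=needed, current_time=20.724, rng=rng)

print(f"further events needed: {needed}")
print(f"study end: {study_end:.3f}, pending subjects: {pending}")
```

Calling `check_target_events(500, 402, 187)` raises a `ValueError` carrying the same range message as the one quoted in the text.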
The table Actuals from Interim Trial Data: Sample Size and Events presents detailed information on the subject data, such as events on the control and treatment arms, drop-outs, average follow-up and so on. Observe that at the current time there are 212 subjects in the pipeline. These are followed till the end of the study. The study is complete when 385 events in all have occurred. A few simulations may give fewer events, as there can be more drop-outs. The table Average Sample Size and Events provides information about the average study duration, average number of events on control and treatment, average drop-outs, average follow-up time and so on. From this table, note that it will take on average 50.946 units of time to complete the study. The number of events on the control and treatment arms would be around 197 and 187 respectively.

The Overall Output table describes the distribution of the Average Study Duration across 1000 simulations. Note that the column No of Accruals has the value 402 throughout, since the accruals were complete and only events are being forecast. Since the targeted number of events was 385, the No of Events column shows the value 385 in almost all cases. It is worth noting that at the 5th percentile of the Average Study Duration, 381 events are reached quite early, by time 52.755. The 95th percentile is 98.407, by which time the target would be achieved in almost all cases.

Let us have a look at the individual files stored in the Library. Open the SummaryStat data by double-clicking the sub-node. You will see the following display of data (shown in parts).

Accruals0 and Accruals1 specify the total subjects accrued on the Control and Treatment arms respectively.
A similar convention is used for naming the various quantities for Control and Treatment. As before, Interim refers to the available data whereas Final refers to the simulated data. Observe that the TotAccruals for every simulation is 402 and the TotEvents is 385. The TotPending values for the Final look are the subjects who have neither experienced an event nor dropped out by the end of the study. Pending subjects arise because the study is concluded after 385 events and does not proceed until all subjects experience the event, as was the case in the previous section. It is worth noting that the LookTime corresponding to the Final stage is essentially the study duration observed in that particular simulation.

Open the Subject Data file, which stores detailed information about one simulation. The SubjectID starts from 191, as 187 events had occurred earlier and 3 subjects had dropped out. For the initial 190 subjects, detailed information such as arrival time, drop-out time and so on is not available. TreatmentID 0 means the subject is on Control, while 1 indicates Treatment. From subject 191 onwards, the survival times and drop-out times are generated. Survival Time is the duration for which the subject was alive in the study. DropOutTime is the duration of time the subject was present in the study before dropping out. These are generated using the specified drop-out probabilities for control and treatment. Note that the data are sorted on Survival Time.

Key points to observe:
Since 187 of the targeted 385 events were observed earlier, the required number of further events is 198.
Subject 289 drops out, as its generated survival time is greater than its drop-out time.
The subjects with SubjectID 401 and 402 are not followed, as the requirement of 385 events has been satisfied.
They simply form part of the group of pending subjects, which are 4 in number. For subjects who have either dropped out or are pending, the value of CensorInd 1 is 0.

The Events Prediction Plot (invoked using the tool in the Library pane) shows that the median number of events, 385, is reached in a duration of 68.129 units of time.

Invoke the Dropout Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 80.751 the median number of drop-outs in the control and treatment arms combined would be 15, with a 95% confidence interval of 8 to 22. If you select Show Predicted Avg. Dropouts, the predicted drop-outs will be added to the plot.

68.2.2 Events and Enrollment - Unblinded Data: Accrual Ongoing

Subject-level Data

The ONCOX trial in the earlier chapters was designed with a sample size of 460 subjects, an expected accrual period of around 24 months, and a target of 374 events within a study period of around 30 months. Assume that an interim look has been taken and subject data are available at this time point. The trial is still accruing subjects and we are interested in forecasting the accrual duration as well as the study duration. The trial has accrued 402 subjects so far. The subject data are available in the file ONCOX iLook1 SubjectData.csv in the Samples folder of the East installation directory. The file ONCOX iLook1 SubjectData contains a list of the subjects accrued so far and the following data for each subject:
Country: Country ID.
SiteID: the site at which the subject arrived.
ArrivalTime: the time at which the subject arrived.
TreatmentID: a variable indicating which group the subject was randomized to (‘1’ for treatment, ‘0’ for placebo).
TimeOnStudy: the length of time the subject has been in the study, corresponding to survival time.
Status: a variable indicating whether the subject is a completer (‘1’), a dropout (‘-1’) or in the pipeline (‘0’).
Censor: a variable indicating whether the subject is a completer (‘1’) or a non-completer (‘0’). A non-completer can be either a dropout or in the pipeline.

The trial is assumed to enroll subjects from several sites, information about which is provided in the file ONCOX iLook1 SiteData, also available in the Samples folder. The file contains the following data for each site:
Country: Country ID.
SiteID: the identification number of the site.
SiteReadyTime: the time at which the site was initiated.
SiteAccrRate: the site accrual rate specified in the enrollment plan.
SubjectsAccrued: the number of subjects accrued at the site.
LastSubjectRand: the randomization time of the last subject arriving at the site.
ObsrvdAccrualRate: the observed accrual rate at the site.
PosteriorAccrualRate: the updated site accrual rate.
SIP Start: the start of the initiation period of the site.
SIP End: the end of the initiation period of the site.
Ecap: the enrollment cap, representing the maximum number of subjects that can be enrolled at the site.

Both these files can be imported into East using the Import button in the Home ribbon:

Once imported, the files will appear in the Library pane as nodes in the active workbook, with extension .cydx.

Choose the menu item Analysis>Predict>Events and Enrollment-Unblinded Data. In the ensuing dialog box, select the Accrual option Ongoing.
Select the data set ONCOX iLook1 SubjectData.cydx. Tick the check box Include Site-specific Information. Map the variables from the data to the ones shown below:

Click Next. The next input dialog appears as shown here:

The default values for Hazard Rate - Control and Hazard Rate - Treatment are estimated from the subject data. You can verify this by running the LogRank Test from Analysis>Events>Two Samples>LogRank[SU-2S-LR]. The input dialog for the same would be:

Choose the variables as shown in the dialog. Click OK. A partial output is shown below:

Observe that the Hazard Ratio is 0.743 and the Estimated Hazard Rates table gives the hazard rate for Control as 0.10847 and that for Treatment as 0.08062. These are the same as the ones East chose while predicting the events. Please refer to the second input dialog for Predict.

Continuing the Predict for the Oncox subject data, go to the Accrual/Dropouts tab. In the ensuing dialog, you will see almost all values filled in. Change the Accrual Model to Poisson.

The Current Calendar Time is the accrual time of the last subject in the data, which is 20.724. The drop-out hazard rates for Control and Treatment are estimated from the subject data. The Target Sample Size is 603, which is 1.5 × SampleSize (1.5 × 402). You are free to change the Target Sample Size, as we are assuming that the study is still accepting enrollments. The Target Number of Events default value is 402, the same as the number of subjects in the data. This value can be edited. You can edit the values of the hazard rates, target number of events and so on.
You can also choose a specific follow-up period by selecting For Fixed Period in the Subjects are followed textbox. The number of hazard pieces in the drop-out information can also be increased, to specify different hazard rates for different time periods. Setting the Number of pieces to 0 assumes that there will be no drop-outs. For now, let us proceed with all the default values.

Go to the Simulation Controls tab. Give a fixed seed of 12345. Check the Output Options for saving the outputs. Click Simulate. East simulates the arrival of subjects according to a Poisson arrival process with inter-arrival times following an exponential distribution. After a few seconds East will display the message Simulations complete. Waiting for User’s action. Click Close. This saves the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, East creates a node PredictSim1 in the Library, with sub-nodes for SummaryStat, SubjectData, SiteSummary and SitePara. To view the detailed summary output of the simulations, double-click the node PredictSim1 in the Library. The following output is displayed.

The table at the left describes the simulation scenario. This summary contains an overview of the actual data observed. The table Actuals from Interim Trial Data: Sample Size and Events presents detailed information on the subject data, such as events on the control and treatment arms, drop-outs, average follow-up and so on. The table Average Sample Size and Events provides information about the average study duration, average number of events on control and treatment, average drop-outs, average follow-up time and so on.
From this table, note that it will take on average 30 units of time to complete the study. The number of events on the control and treatment arms would be around 219 and 183 respectively. The average follow-up time for an individual is 7.102. The Average Accrual Duration is 26.

The Overall Output table describes the distribution of Accrual Duration and Study Duration across 1000 simulations. It is worth noting that at the 5th percentile of the Average Study Duration, 402 events are reached by time 29.109. The 95th percentile is 30.931, the duration within which 95% of the simulated studies are complete. You could change the percentiles input if you want to be more specific. For instance, you can input 100% to get the maximum study duration.

Let us have a look at the individual files stored in the Library. Open the SummaryStat data by double-clicking the sub-node. You will see the following display of data (shown in parts).

Accruals0 and Accruals1 specify the total subjects accrued on the Control and Treatment arms respectively. A similar convention is used for naming the various quantities for Control and Treatment. As before, Interim refers to the available data whereas Final refers to the simulated data. Observe that the TotAccruals for every simulation is 603 and the TotEvents is 402. The TotPending values for the Final look are the subjects who have neither experienced an event nor dropped out by the end of the study. Pending subjects arise because the study is concluded after 402 events and does not proceed until all subjects experience the event. Note that the LookTime corresponding to the Final stage is essentially the study duration observed in that particular simulation.
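The ongoing-accrual scenario summarized above can be sketched in Python under simplifying assumptions: new subjects arrive with exponential inter-arrival times, are assigned to the two arms with equal probability, and the study ends at the event that brings the total to the target. This is not East's algorithm and ignores the site-level information. The accrual rate of 40 per unit time and the drop-out hazards are hypothetical values, the pipeline counts per arm and the 187 observed events come from the summary figures for the same trial quoted in this chapter, and all names are hypothetical.

```python
import random
import statistics


def simulate_ongoing(current_time, pipeline, accrued, target_n, events_needed,
                     accrual_rate, event_hazard, dropout_hazard, rng):
    """One run with accrual ongoing.

    Returns (accrual_end, study_end, total_accrued), or None if the
    required number of further events cannot be reached in this run.
    """
    event_times = []  # calendar times of events; drop-outs contribute none

    def follow(arm, start):
        t_event = rng.expovariate(event_hazard[arm])
        t_drop = rng.expovariate(dropout_hazard[arm])
        if t_event <= t_drop:
            event_times.append(start + t_event)

    # Subjects already enrolled and still in the pipeline.
    for arm, n in pipeline.items():
        for _ in range(n):
            follow(arm, current_time)

    # New arrivals up to the target sample size.
    arms = tuple(pipeline)
    arrivals, t = [], current_time
    for _ in range(target_n - accrued):
        t += rng.expovariate(accrual_rate)
        arrivals.append(t)
        follow(rng.choice(arms), t)

    event_times.sort()
    if len(event_times) < events_needed:
        return None
    study_end = event_times[events_needed - 1]
    # Subjects who would arrive after the study has ended are never enrolled.
    enrolled = [a for a in arrivals if a <= study_end]
    accrual_end = enrolled[-1] if enrolled else current_time
    return accrual_end, study_end, accrued + len(enrolled)


rng = random.Random(12345)
target_n = int(1.5 * 402)  # 603, the default Target Sample Size in the text
runs = [
    simulate_ongoing(
        current_time=20.724,
        pipeline={"control": 96, "treatment": 116},
        accrued=402, target_n=target_n,
        events_needed=402 - 187,            # target minus observed events
        accrual_rate=40.0,                  # hypothetical
        event_hazard={"control": 0.10847, "treatment": 0.08062},
        dropout_hazard={"control": 0.001, "treatment": 0.001},  # hypothetical
        rng=rng)
    for _ in range(1000)
]
completed = [r for r in runs if r is not None]
print(f"mean accrual duration: {statistics.mean(r[0] for r in completed):.3f}")
print(f"mean study duration:   {statistics.mean(r[1] for r in completed):.3f}")
```

Because the accrual rate and drop-out hazards are placeholders, the printed durations are only indicative and should not be compared digit for digit with the East output.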
Open the Subject Data file, which stores detailed information about one simulation. TreatmentID 0 means the subject is on Control, while 1 indicates Treatment. For all the subjects, the survival times and drop-out times are generated. Survival Time is the duration for which the subject was alive in the study. DropOutTime is the duration of time the subject was present in the study before dropping out. For the existing data, the drop-out times are generated using the specified drop-out rates for control and treatment. For the new arrivals, the accrual times, survival times and drop-out times are all generated.

Open the SiteSummary file. The file contains averages across all simulations, for each site, of quantities such as initiation times, accrual duration, number of subjects enrolled, accrual rate, number of sites opened and so on. Open the SiteData file, which stores detailed information about one simulation.

Click on the PredictSim node. The Enrollments Prediction Plot (invoked using the tool in the Library pane) shows that the median number of enrollments, 603, is reached in a duration of 26.055 units of time.

Invoke the Events Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 30.076 the median number of events in the control and treatment arms combined would be 402, with a 95% confidence interval of 382.75 to 422.

Invoke the Dropouts Prediction Plot using the tool in the Library pane.
In this plot, the simulation results indicate that by time 29.48, the median number of dropouts in the control and treatment arms combined would be 6, with a 95% confidence interval of 4 to 11.

If you select Show Predicted Avg. Dropouts, the predicted dropouts will be added to the plot.

Summary Data

In the earlier section we saw how to generate enrollments and follow the subjects until the required number of events occurs or the target sample size is reached. We estimated the study duration with the help of the Predict feature in East. To use it, we assumed that interim subject-level data were available, with information on individual arrival times, status, etc. Often, however, subject-level data may not be available. What may be available is a summary of the accruals to date. For example, in the case of the Oncox trial considered above, the DMC statistician may know that 402 subjects have been accrued so far, 203 on Control and 199 on Treatment. The numbers of events so far on Control and Treatment are 106 and 81 respectively. One subject has dropped out on the Control arm and two on the Treatment arm. The last subject arrived at time 20.7236. The DMC statistician is interested in knowing the total study duration when all the accrued 402 subjects are followed till the end. East, through its Predict feature, can still produce an estimate of the average study duration by simulating events from a Poisson process based on the specified or default hazard rates. Accruals are simulated until the target sample size is reached or the target number of events is observed. To see this, choose the menu item Analysis>Predict>Events and Enrollment-Unblinded Data. Select the Input Summary Data and Accruals Ongoing. Enter the above-mentioned inputs for the quantities required.

Click Next. The next input dialog appears as shown here:

The default values for Hazard Rate - Control and Hazard Rate - Treatment are shown in the dialog. Note that these are not estimated from any data, as we do not have individual subject data as input. Let us use the same hazard rates as in the previous section: 0.10847 for Control and 0.08062 for Treatment. The default Target No of events is 402. 188 events have already occurred. This means that accrual will continue until we get 402 events in all. After filling in all these values, the input dialog looks as shown below:

Go to the Accruals/DropOuts tab. Suppose that, instead of dropout hazard rates, information is available on the probabilities of dropout. Suppose the probability of dropout is 0.5% for a subject receiving Control and 0.6% for a subject receiving Treatment, and that these apply from the current calendar time, 20.724, onwards. Enter all these inputs. Go to the Simulation Controls tab. Give a fixed seed of 12345. Save the outputs for Summary and Subject data. Click Simulate. East simulates the events according to a Poisson arrival process with inter-arrival times following an exponential distribution. The parameters are derived from the specified hazard rates for Control and Treatment. For details, refer to Appendix M.
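As a rough illustration of how event times follow from hazard rates, the sketch below draws exponential times to event for the subjects still pending in each arm, using the hazard rates and summary counts quoted above. It is a minimal sketch under stated assumptions, not East's algorithm (see Appendix M for that): it ignores dropouts and new accruals, the function name is hypothetical, and it relies on the memoryless property of the exponential distribution, so the remaining time to event for a subject still at risk has the same distribution as a fresh draw.

```python
import random

# Hazard rates quoted in the text for the Oncox example.
HAZARD_CONTROL = 0.10847
HAZARD_TREATMENT = 0.08062


def simulate_event_times(n_subjects, hazard, rng):
    """Draw exponential times to event, measured from the current calendar time."""
    return [rng.expovariate(hazard) for _ in range(n_subjects)]


rng = random.Random(12345)  # fixed seed, as in the walkthrough

# Pending subjects implied by the summary data: accrued - events - dropouts.
pending_control = 203 - 106 - 1    # 96
pending_treatment = 199 - 81 - 2   # 116

control_times = simulate_event_times(pending_control, HAZARD_CONTROL, rng)
treatment_times = simulate_event_times(pending_treatment, HAZARD_TREATMENT, rng)

mean_control = sum(control_times) / pending_control
mean_treatment = sum(treatment_times) / pending_treatment
print(f"mean remaining time to event, control:   {mean_control:.2f}")
print(f"mean remaining time to event, treatment: {mean_treatment:.2f}")
```

With a constant hazard h, the expected time to event is 1/h, so the lower hazard on Treatment translates into longer simulated times to event.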
After a few seconds East will display the message Simulations complete. Waiting for User’s action. Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim3 with sub-nodes for SummaryStat and SubjectData in the Library. To view the detailed summary output of the simulations, double-click the node PredictSim3 in the Library. The following output is displayed.

The table at the left describes the simulation scenario. This summary contains an overview of the actual data we observed. The table Actuals from Interim Trial Data: Sample Size and Events presents the summary data input. The study is complete when 402 events in all have occurred. The table Average Sample Size and Events provides information about the average study duration, average number of events on control and treatment, average dropouts, average follow-up time, etc. From this table, note that the study will take 35.31 units of time on average to complete. The numbers of events on the control and treatment arms would be around 215 and 187. The Overall Output table describes the distribution of Study Duration across 1000 simulations. Note that the column No of Events has all values equal to 402 (except for actuals), meaning that the target sample size of 603 was adequate to give the required number of events. The values in the column Number of Accruals vary, since any simulated study concludes as soon as the target number of events is achieved. The 5th percentile of the Study Duration is 33.872; that is, in 5% of the simulations the 402 events were obtained by that time. The 95th percentile is 36.815, by which time the target would be achieved in almost all cases.
The Enrollments Prediction Plot (invoked using the tool in the Library pane) shows that the median number of accruals, 603, is reached at 41.469 units of time. The Events Prediction Plot (invoked using the tool in the Library pane) shows that the median number of events, 402, is reached at 35.333 units of time. Invoke the Dropout Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 35.333, the median number of dropouts in the control and treatment arms combined would be 16, with a 95% confidence interval of 9 to 24. If you select Show Predicted Avg. Dropouts, the predicted dropouts will be added to the plot.

68.3 Events and Enrollment- Blinded Data

68.3.1 Events and Enrollment-Blinded Data: Accrual Complete

In the case of blinded data, information on the individual responses on the treatment and control arms is not available. Instead, a common hazard rate and a common dropout rate are available. We will explain the Predict feature for blinded data with the help of the RALES trial, which has a time-to-event endpoint.

Subject-level Data

The simulation of the RALES trial in Chapter 66 indicated a required sample size of 1638 subjects with an expected accrual period of around 20 months. The total duration of the study was around 72 months. Assume that an interim look has been taken and subject data are available at this time point. The trial is still accruing subjects and we are interested in forecasting Accrual Duration as well as the Study Duration. The trial has accrued 1205 subjects in all so far. The subject data are available in the file RALES iLook1 SubjectData.csv in the Samples folder of the East installation directory.
The file RALES iLook1 SubjectData.csv contains a list of subjects accrued so far and the following data for each subject:

SiteID: the site at which the subject arrived.
ArrivalTime: the time at which the subject arrived.
TreatmentID: a variable indicating which group the subject was randomized to (‘1’ for treatment, ‘0’ for placebo).
TimeOnStudy: the length of time the subject has been in the study, corresponding to survival time.
CensorIndicator: a variable indicating whether the subject is a completer (‘1’), a dropout (‘-1’) or in the pipeline (‘0’).
CensorInd: a variable indicating whether the subject is a completer (‘1’) or a non-completer (‘0’). A non-completer can be either a dropout or in the pipeline.

A portion of the RALES iLook1 SubjectData.csv file is shown below:

The subject data file can be imported into East using the Import button in the Home ribbon:

Once imported, the file will appear in the Library pane as a node in the active workbook, with extension .cydx. Choose the menu item Analysis>Predict>Events and Enrollment-Blinded Data. In the ensuing dialog box, select the Accrual option Complete. Select the data set RALES iLook1 SubjectData.cydx. Map the variables from the data to the ones shown below:

Click Next. The next input dialog appears as shown here:

The default value for Hazard Rate is estimated from the subject data. Since the accruals are complete, the Target Sample Size is not editable. However, you can edit the Target No. of Events. Let us continue with the default value, which is equal to the total number of subjects accrued so far. Go to the Accrual/Dropouts tab. In the ensuing dialog, you will see almost all values filled in.
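One plausible way to obtain a common hazard rate from blinded subject data is the usual exponential maximum-likelihood estimate: the number of events divided by the total time on study. The manual does not state here which estimator East uses, so the sketch below is an assumption for illustration only. It also assumes that a completer (CensorInd = 1) is a subject who has experienced the event, and the records and function name are hypothetical.

```python
def estimate_common_hazard(records):
    """Estimate a constant hazard as events / total time on study.

    `records` is a list of (time_on_study, censor_ind) pairs, where
    censor_ind is 1 for a completer (assumed: event observed) and 0 otherwise.
    """
    events = sum(1 for _, censor_ind in records if censor_ind == 1)
    exposure = sum(time_on_study for time_on_study, _ in records)
    if exposure <= 0:
        raise ValueError("total time on study must be positive")
    return events / exposure


# Hypothetical records: (TimeOnStudy, CensorInd), pooled across both arms.
records = [(2.0, 1), (4.0, 0), (4.0, 1)]
print(estimate_common_hazard(records))  # 2 events over 10 units of exposure
```

Because treatment assignment is ignored, the estimate is a single pooled rate, in keeping with the blinded setting.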
The Current Calendar Time is the accrual time of the last subject in the data, which is 15.561. You can edit the values of the hazard rate, the target number of events, etc. You can also choose a specific follow-up period by selecting For Fixed Period in the Subjects are followed box. The number of hazard pieces in the dropout information can also be increased to specify different hazard rates for different time periods. Setting the Number of pieces to 0 assumes that there will be no dropouts. For now, let us proceed with all the default values. Go to the Simulation Controls tab. Check the Output Options for saving the outputs.

Click Simulate. East simulates the arrival of subjects according to a Poisson arrival process with inter-arrival times following an exponential distribution. After a few seconds East will display the message ’Simulations complete. Waiting for User’s action’. Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim1 with sub-nodes for SummaryStat and SubjectData in the Library. To view the detailed summary output of the simulations, double-click the node PredictSim1 in the Library. The following output is displayed.

The table at the left describes the simulation scenario. This summary contains an overview of the actual data we observed. Since the accruals were complete, the Target Sample Size and the Target Number of Events are the same and equal to 1205.
The table Actuals from Interim Trial Data: Sample Size and Events presents detailed information from the subject data, such as events, dropouts, average follow-up, etc. Observe that at the current time there are 982 subjects in the pipeline. These are followed until the end of the study. The study is complete when all the subjects in the pipeline have either experienced events or dropped out. The table Average Sample Size and Events provides information about the average study duration, average number of events, average dropouts, average follow-up time, etc. From this table, note that the study will take 273.185 units of time on average to complete. The average follow-up time for an individual is 34.443. The Overall Output table describes the distribution of Study Duration across 1000 simulations. Note that the column No of Accruals has all values equal to 1205, since the accruals were complete and only events are being forecast. Since there are a few dropouts, we expect fewer events on average, around 1113, out of 1205 subjects. At the 5th percentile of the Study Duration, 216.232, there are 1099 events. The 95th percentile is 351.112, so in 95% of the simulations the study was complete by that time. You can change the percentiles input if you want to be more specific. For instance, you can input 100% to get the maximum study duration.

Let us have a look at the individual files stored in the Library. The Events Prediction Plot (invoked using the tool in the Library pane) shows that the median number of events, 1112, is reached at 251.335 units of time. To find the average study duration for the median number of events, select the Input option Events in the plot. Enter 1112 for Events.
The median study duration is 259.115. Since the 95% upper limit does not exist, one cannot forecast the latest time to reach the target.

Invoke the Dropout Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 250.58, the median number of dropouts in the control and treatment arms combined would be 92, with a 95% confidence interval of 76 to 108. If you select Show Predicted Avg. Dropouts, the predicted dropouts will be added to the plot.

Summary Data

In the earlier section we saw how to generate the events and follow all subjects until they either experience events or drop out. We estimated the study duration with the help of the Predict feature in East. To use it, we assumed that interim subject-level data were available, with information on individual arrival times, status, etc. Often, however, subject-level data may not be available. Instead, a summary of the accruals to date may be available. For example, in the case of the RALES trial considered above, the DMC statistician may know that 1205 subjects have been accrued so far, of which 206 have had events. In all, 17 subjects have dropped out. The last subject arrived at time 15.55. The DMC statistician is interested in knowing the total study duration when all the accrued 1205 subjects are followed till the end. East, through its Predict feature, can still produce an estimate of the average study duration by simulating events from a Poisson process based on the specified or default hazard rate. To see this, choose the menu item Analysis>Predict>Events and Enrollment-Blinded Data.
Select the Input Summary Data and Accruals Complete. Enter the above-mentioned inputs for the quantities required.

Click Next. The next input dialog appears as shown here:

The default value for Hazard Rate is shown in the dialog. Note that this is not estimated from any data, as we do not have individual subject data as input. Let us use the same hazard rate as in the previous section, namely 0.02683. Go to the Accruals/Dropouts tab. Suppose that, instead of dropout hazard rates, information is available on the probability of dropout. Suppose the probability of dropout for a subject, whether on Control or Treatment, is 0.5%, applicable from the current calendar time, 15.55, onwards. Enter all these inputs. Go to the Simulation Controls tab. Give a fixed seed of 12345. Save the outputs for Summary and Subject data. Click Simulate. East simulates the events according to a Poisson arrival process with inter-arrival times following an exponential distribution. The parameter is derived from the specified common hazard rate. For details, refer to Appendix M. After a few seconds East will display the message Simulations complete. Waiting for User’s action. Click Close. This will save the Predict Simulation in the Output Preview window first. Once you save it in the Workbook, it will create a node PredictSim2 with sub-nodes for SummaryStat and SubjectData in the Library. To view the detailed summary output of the simulations, double-click the node PredictSim2 in the Library. The following output is displayed.
The table at the left describes the simulation scenario. This summary contains an overview of the actual data we observed. Since the accruals were complete, the Target Sample Size and the Target Number of Events are both 1205. The table Actuals from Interim Trial Data: Sample Size and Events presents detailed information from the subject data, such as number of events, dropouts, average follow-up, etc. Observe that at the current time there are 982 subjects in the pipeline. These are followed until the end of the study. The study is complete when 1205 events in all have occurred. The table Average Sample Size and Events provides information about the average study duration, average number of events, average dropouts, average follow-up time, etc. From this table, note that the study will take 249.489 units of time on average to complete. The Overall Output table describes the distribution of Study Duration across 1000 simulations. Note that the column No of Accruals has all values equal to 1205, since the accruals were complete and only events are being forecast. Although the target number of events was 1205, the No of Events column shows values below 1205, because the accruals were complete and only these subjects were followed until they had events or dropped out. At the 5th percentile of the Study Duration, 199.048, there are 1014 events, and at the 95th percentile, 315.872, there are still only 1051 events. The investigator has to decide whether to wait longer to obtain a few more events.

The Events Prediction Plot (invoked using the tool in the Library pane) shows that the median number of events, 1033, is reached at 224.944 units of time.
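The reason fewer than 1205 events are observed can be seen in a small competing-risks sketch: each pending subject has an exponential time to event and an exponential time to dropout, and whichever comes first determines the outcome. The event hazard 0.02683 and the 982 pending subjects come from the text; the dropout hazard of 0.002 is a hypothetical value chosen only for illustration, since the walkthrough specifies a dropout probability rather than a hazard. The function name is also hypothetical, and this is not East's algorithm.

```python
import random


def simulate_followup(n_pending, event_hazard, dropout_hazard, rng):
    """Follow pending subjects until each one has an event or drops out.

    Returns (events, dropouts, last_time), where last_time is the time,
    measured from the current calendar time, at which the last subject leaves.
    """
    events = dropouts = 0
    last_time = 0.0
    for _ in range(n_pending):
        t_event = rng.expovariate(event_hazard)
        t_drop = rng.expovariate(dropout_hazard)
        if t_event <= t_drop:
            events += 1
        else:
            dropouts += 1
        last_time = max(last_time, min(t_event, t_drop))
    return events, dropouts, last_time


EVENT_HAZARD = 0.02683    # common hazard rate from the text
DROPOUT_HAZARD = 0.002    # hypothetical dropout hazard, for illustration only

rng = random.Random(12345)
events, dropouts, last_time = simulate_followup(982, EVENT_HAZARD, DROPOUT_HAZARD, rng)
print(events, dropouts, round(last_time, 3))
```

Under these assumptions the expected fraction of pending subjects who end with an event is event_hazard / (event_hazard + dropout_hazard), and the study ends only when the slowest of the pending subjects has left, which is why the forecast study duration is much longer than the average follow-up time.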
Invoke the Dropout Prediction Plot using the tool in the Library pane.

In this plot, the simulation results indicate that by time 224.269, the median number of dropouts in the control and treatment arms combined would be 171, with a 95% confidence interval of 150 to 194. If you select Show Predicted Avg. Dropouts, the predicted dropouts will be added to the plot.

68.3.2 Events and Enrollment-Blinded Data: Accrual Ongoing

Subject-level Data

The simulation of the RALES trial in Chapter 66 indicated a required sample size of 1638 subjects with an expected accrual period of around 20 months. The total duration of the study was around 72 months. Assume that an interim look has been taken and subject data are available at this time point. The trial is still accruing subjects and we are interested in forecasting Accrual Duration as well as Study Duration. The trial has accrued 1205 subjects in all so far. The subject data are available in the file RALES iLook1 SubjectData.csv in the Samples folder of the East installation directory. The file RALES iLook1 SubjectData.csv contains a list of subjects accrued so far and the following data for each subject:

SiteID: the site at which the subject arrived.
ArrivalTime: the time at which the subject arrived.
TreatmentID: a variable indicating which group the subject was randomized to (‘1’ for treatment, ‘0’ for placebo).
TimeOnStudy: the length of time the subject has been in the study, corresponding to survival time.
CensorIndicator: a variable indicating whether the subject is a completer (‘1’), a dropout (‘-1’) or in the pipeline (‘0’).
CensorInd: a variable indicating whether the subject is a completer (‘1’) or a non-completer (‘0’). A non-completer can be either a dropout or in the pipeline.
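The relationship between the two censoring variables described above can be written as a small mapping. This is only an illustration of the coding scheme in the list; the function name is hypothetical and the sample values are made up.

```python
def to_censor_ind(censor_indicator):
    """Collapse CensorIndicator (1, -1, 0) into CensorInd (1 or 0).

    1  -> completer                      -> 1
    -1 -> dropout (non-completer)        -> 0
    0  -> in the pipeline (non-completer) -> 0
    """
    if censor_indicator not in (-1, 0, 1):
        raise ValueError(f"unexpected CensorIndicator value: {censor_indicator}")
    return 1 if censor_indicator == 1 else 0


# Hypothetical CensorIndicator values for five subjects.
indicators = [1, 0, -1, 0, 1]
print([to_censor_ind(value) for value in indicators])
```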
A portion of the RALES iLook1 SubjectData file is shown below:

The trial is assumed to enroll subjects from several sites, information about which is provided in the file RALES iLook1 SiteData.csv, also available in the Samples folder. The file contains the following data for each site:

SiteID: the identification number of the site.
SiteReadyTime: the time at which the site was initiated.
SiteAccrRate: the site accrual rate specified in the enrollment plan.
SubjectsAccrued: the number of subjects accrued at the site.
LastSubjectRand: the randomization time of the last subject arriving at the site.
ObsrvdAccrualRate: the observed accrual rate at the site.
PosteriorAccrualRate: the updated site accrual rate.
SIP Start: the start of the initiation period of the site.
SIP End: the end of the initiation period of the site.
Ecap: the enrollment cap, representing the maximum number of subjects that can be enrolled at the site.

Both these files can be imported into East using the Import button in the Home ribbon:

Once imported, the files will appear in the Library pane as nodes in the active workbook, with extension .cydx. Choose the menu item Analysis>Predict>Events and Enrollment-Blinded Data. In the ensuing dialog box, select the Accrual option Ongoing. Select the data set RALES iLook1 SubjectData.cydx. Tick the check box Include Site-specific Information. Map the variables from the data to the ones shown below:

Click Next. The next input dialog appears as shown here:

The default value for Hazard Rate is estimated from the subject data.
Note that this is an estimate of the common hazard rate, ignoring whether the event occurs on the treatment arm or the control arm. Go to the Accrual/Dropouts tab. In the ensuing dialog, you will see almost all values filled in. Change the Accrual Model to Poisson.

The Current Calendar Time is the accrual time of the last subject in the data, which is 15.561. Again, the dropout hazard rate is estimated from the subject data, assuming that the data are blinded on treatments. The Target Sample Size is 1807, which is 1.5 ∗ SampleSize. You are free to change the Target Sample Size, as we are assuming that the study is still accepting enrollment. The default value of the Target Number of Events is 1205, the same as in the data. This value can be edited. You can edit the values of the hazard rate, the target number of events, etc. You can also choose a specific follow-up period by selecting For Fixed Period in the Subjects are followed box. The number of hazard pieces in the dropout information can also be increased to specify different hazard rates for different time periods. Setting the Number of pieces to 0 assumes that there will be no dropouts. For now, let us proceed with all the default values. Go to the Simulation Controls tab. Give a fixed seed of 12345. Check the Output Options for saving the outputs. Click Simulate. East simulates the arrival of subjects according to a Poisson arrival process with inter-arrival times following an exponential distribution. After a few seconds East will display the message Simulations complete. Waiting for User’s action. Click Close. This will save the Predict Simulation in the Output Preview window first.
Once you save it in the Workbook, it will create a node PredictSim1 with sub-nodes for SummaryStat, SubjectData, SiteSummary and SitePara in the Library. To view the detailed summary output of the simulations, double-click the node PredictSim1 in the Library. The following output is displayed.

The table at the left describes the simulation scenario. This summary contains an overview of the actual data we observed. The table Actuals from Interim Trial Data: Sample Size and Events presents detailed information from the subject data, such as events, dropouts, average follow-up, etc. The table Average Sample Size and Events provides information about the average study duration, average number of events considering both arms together, average dropouts, average follow-up time, etc. From this table, note that it will take 56.573 units of time on average to complete the study and obtain all 1205 events. The average follow-up time for an individual is 24.855. The Average Accrual Duration is 23.058. The Overall Output table describes the distribution of Accrual Duration and Study Duration across 1000 simulations. The 5th percentile of the Study Duration is 54.29; that is, in 5% of the simulations the 1205 events were obtained by that time. The 95th percentile is 58.866, so in 95% of the simulations the study was complete by that time. You can change the percentiles input if you want to be more specific. For instance, you can input 100% to get the maximum study duration.

Open the SummaryStat data by double-clicking the sub-node. You will see the following display of data (shown in parts).

Note that the file contains overall information rather than information by treatment arm, since the data are blinded. Observe that the TotAccruals for every simulation is 1807 and the TotEvents is 1205. The TotPending values for the Final look are the subjects who have neither experienced an event nor dropped out by the end of the study. Subjects remain pending because the study is concluded once 1205 events are obtained and does not proceed until all subjects experience the event, as was the case in the previous section. Note that the LookTime corresponding to the Final stage is essentially the study duration observed in that particular simulation.

Open the SubjectData file, which stores detailed information about one simulation. Survival times and dropout times are generated for all subjects. Survival Time is the duration for which the subject was alive in the study. DropoutTime is the duration for which the subject was present in the study before dropping out. For the existing interim data, the dropout times are generated using the specified dropout probability. For the new arrivals, accrual times, survival times and dropout times are all generated.

Open the SiteSummary file. The file contains averages across all simulations, for each site, of quantities such as initiation times, accrual duration, number of subjects enrolled, accrual rate, number of sites opened, etc.

Open the SiteData file, which stores detailed information about one simulation.
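The Poisson arrival process used to generate accrual times for new arrivals can be sketched as follows. The current calendar time (15.561), the number of subjects already accrued (1205) and the target sample size (1807) come from the walkthrough; the pooled accrual rate is an assumption made for this sketch (the observed average rate so far), whereas East works with site-level accrual rates. The function name is hypothetical.

```python
import random


def simulate_accrual_times(current_time, n_enrolled, target, rate, rng):
    """Generate arrival times for the remaining subjects.

    Arrivals follow a Poisson process with the given rate, so the gaps
    between successive arrivals are exponential with mean 1 / rate.
    """
    t = current_time
    arrivals = []
    for _ in range(target - n_enrolled):
        t += rng.expovariate(rate)
        arrivals.append(t)
    return arrivals


CURRENT_TIME = 15.561
N_ENROLLED = 1205
TARGET_SAMPLE_SIZE = 1807
# Assumption: future pooled accrual rate equals the observed average rate.
ACCRUAL_RATE = N_ENROLLED / CURRENT_TIME

rng = random.Random(12345)
arrivals = simulate_accrual_times(
    CURRENT_TIME, N_ENROLLED, TARGET_SAMPLE_SIZE, ACCRUAL_RATE, rng
)
print(f"simulated accrual duration: {arrivals[-1]:.3f}")
```

The last simulated arrival time plays the role of the accrual duration for one simulation; repeating the draw many times gives a distribution like the one summarized in the Overall Output table.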
Click on the PredictSim node. The Enrollments Prediction Plot (invoked using the tool in the Library pane) shows that the median number of enrollments, 1807, is reached at 23.209 units of time.

Invoke the Events Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 56.556, the median number of events in the control and treatment arms combined would be 1205, with a 95% confidence interval of 1169 to 1242.

Invoke the Dropouts Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 29.549, the median number of dropouts in the control and treatment arms combined would be 53, with a 95% confidence interval of 42 to 65. If you select Show Predicted Avg. Dropouts, the predicted dropouts will be added to the plot.

Summary Data

In the earlier section we saw how to generate the events and follow the subjects until the required number of events occurs or the target sample size is reached. We estimated the study duration with the help of the Predict feature in East. To use it, we assumed that interim subject-level data were available, with information on individual arrival times, status, etc. Often, however, subject-level data may not be available. What may be available is a summary of the accruals to date. For example, in the case of the RALES trial considered above, the DMC statistician may know that 1205 subjects have been accrued so far.
The number of events so far is 206, considering Control and Treatment together. In all, 17 subjects have dropped out. The last subject arrived at time 15.55. The DMC statistician is interested in knowing the total study duration when all the accrued 1205 subjects are followed till the end. East, through its Predict feature, can still produce an estimate of the average study duration by simulating events from a Poisson process based on the specified or default hazard rate. To see this, choose the menu item Analysis>Predict>Events and Enrollment-Blinded Data.

Select the Input Summary Data and Accruals Ongoing. Enter the above-mentioned inputs for the quantities required. Click Next. The next input dialog appears as shown here:

The default value for the common Hazard Rate is shown in the dialog. Note that this is not estimated from any data, as we do not have individual subject data as input. Let us use the same hazard rate as in the previous section, namely 0.02683. The default Target No of events is 1205. 206 events have already occurred. This means that accrual will continue until we get 1205 events in all. After filling in all these values, the input dialog looks as shown below:

Go to the Accruals/Dropouts tab. Suppose that, instead of dropout hazard rates, information is available on the probability of dropout. Suppose the probability of dropout for a subject is 0.5%, applicable from the current calendar time, 15.55, onwards. Enter all these inputs. Go to the Simulation Controls tab. Give a fixed seed of 12345.
Save the outputs for Summary and Subject data. Click on Simulate. East simulates the events according to a Poisson arrival process, with inter-arrival times following an exponential distribution. The parameter is derived from the specified hazard rate. For details refer to Appendix M. After a few seconds East will display the message Simulations complete. Waiting for User's action. Click Close. This will first save the Predict Simulation in the Output Preview window. Once you save it in the Workbook, it will create a node PredictSim4 with sub-nodes for SummaryStat and SubjectData in the Library.

To view the detailed summary output of the simulations, double click the node PredictSim4 in the Library. The following output is displayed.

The table at the left describes the Simulation scenario. This summary contains an overview of the actual data we observed. The table Actuals from Interim Trial Data: Sample Size and Events presents the summary data input. The study is complete when 1205 events in all have occurred. The table Average Sample Size and Events provides information about the average study duration, average number of events, average dropouts, average follow-up time, etc. From this table, note that it will take on average 56.187 units of time to complete the study. The Overall Output table describes the distribution of Average Study Duration across 1000 simulations. Note that the column No of Events has the value 1205 in every row (except for actuals), meaning that the target sample size of 1807 was adequate to yield the required number of events.
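The idea of simulating exponential event times and summarizing the resulting study durations can be illustrated with a short Python sketch. This is a deliberately simplified illustration, not East's algorithm (Appendix M describes the actual method): it assumes a fixed group of subjects who are all at risk from time zero, ignores accrual and dropouts, and takes the study duration to be the time of the k-th event. The function names and the small scenario (100 subjects, 10 target events) are hypothetical. Only the hazard rate 0.02683 and the seed 12345 are taken from the example above.

```python
import random
import statistics


def time_of_kth_event(n_at_risk, k, hazard, rng):
    """Time of the k-th event among n_at_risk subjects, each with an
    independent exponential time to event (simplifying assumption)."""
    if not 1 <= k <= n_at_risk:
        raise ValueError("k must be between 1 and n_at_risk")
    event_times = sorted(rng.expovariate(hazard) for _ in range(n_at_risk))
    return event_times[k - 1]


def simulate_durations(n_at_risk, k, hazard, n_sims, seed):
    """Repeat the single-trial simulation n_sims times with a fixed seed."""
    rng = random.Random(seed)
    return [time_of_kth_event(n_at_risk, k, hazard, rng) for _ in range(n_sims)]


# Hypothetical small scenario: 100 subjects at risk, stop at the 10th event.
durations = simulate_durations(n_at_risk=100, k=10, hazard=0.02683,
                               n_sims=2000, seed=12345)

# Summaries analogous to an average duration and its 5th/95th percentiles.
cut_points = statistics.quantiles(durations, n=20)  # 5%, 10%, ..., 95%
print("mean duration:", round(statistics.mean(durations), 3))
print("5th percentile:", round(cut_points[0], 3))
print("95th percentile:", round(cut_points[-1], 3))
```

Fixing the seed makes the run reproducible, which is the same reason a fixed seed is entered in the Simulation Controls tab.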
It is worth noting that, at the 5th percentile of the Average Study Duration, the 1205 events are reached fairly early, by time 54.231. The 95th percentile is 58.321, by which time the target would be achieved in almost all cases.

Click on the PredictSim4 node. The Enrollments Prediction Plot (invoked using the tool in the Library pane) shows that the median number of enrollments reaches 1807 in a duration of 31.104 units of time.

Invoke the Events Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 56.173, the median number of events in the control and treatment arms together would be 1205, with a 95% confidence interval of 1166 to 1240.

Invoke the Dropouts Prediction Plot using the tool in the Library pane. In this plot, the simulation results indicate that by time 29.321, the median number of dropouts in the control and treatment arms together would be 22, with a 95% confidence interval of 18 to 27. If you select Show Predicted Avg. Dropouts, the predicted dropouts will be added to the plot.

69 Interfacing with East PROCs

69.1 What is East PROCs

East PROCs is a special version of East 6.3 developed for SAS® users. East PROCs contains an external SAS procedure for Interim Monitoring of a design created in East 6.3. While it has all the capabilities of East Interim Monitoring, it requires the SAS® system on your machine. Through its various options, Proc EASTMONITOR from East PROCs can monitor the following group sequential designs in East.

1. Continuous Endpoints
   One Sample: Single mean
   One Sample: Mean of paired differences
   Two Samples: Difference of means from independent populations
2. Discrete Endpoints
   One Sample: Single binomial proportion
   One Sample: McNemar's for matched pairs of binomial responses
   Two Samples: Difference of binomial proportions from independent populations
   Two Samples: Ratio of binomial proportions from independent populations
   Two Samples: Odds ratio of proportions from independent populations
   Two Samples: Common odds ratio for stratified 2x2 tables
3. Survival Endpoints
   Two Samples: Logrank test given accrual duration and accrual rates
   Two Samples: Logrank test given accrual duration and study duration
4. General
   Information based
   Sample Size based

The trials can be Superiority or Noninferiority, with either or both of efficacy and futility boundaries. For details of the combinations of efficacy and futility boundaries, boundary families, etc. allowed per test, the user is referred to the East 6 user manual.

Apart from Interim Monitoring of the above-mentioned group sequential designs, Proc EastMonitor can also monitor Adaptive Trials created in East 6 based on the following tests:

   Continuous: Difference of means from two independent populations
   Binary: Difference of proportions from two independent populations
   Binary: Ratio of proportions from two independent populations
   Survival: Logrank test given accrual duration and accrual rates
   Survival: Logrank test given accrual duration and study duration

The syntax for each of the above-mentioned types of interim monitoring is described in the East PROCs user manual.

69.2 Why Proc EastMonitor

Clinical trial data are generally analyzed using SAS. Proc EastMonitor has been developed to enable the interim analysis of clinical trial data in SAS when East has been used to design the study. In other words, you don't need East to be available for interim monitoring. What you need is a design created in East. This design and the interim look data are the inputs to Proc EastMonitor.
Proc EastMonitor then performs the interim analysis in exactly the same way as the East Interim Monitoring module would have done. This is possible because Proc EastMonitor calls the East interim monitoring programs internally. The resulting output is available in SAS data sets as well as in the list files, which include the decisions of the interim analysis on whether or not to continue the trial. The generated output data sets can be passed to SAS graphical and reporting tools to create reports as required.

East, a pioneering software package for the design of phase 3 clinical trials, encompasses numerous combinations of efficacy and futility boundaries and other features such as accrual, dropout, etc. The boundaries available in East run the gamut between extreme conservatism and extreme liberality for early stopping. East can also handle designs with missing efficacy or futility boundaries at some looks. All these designs can be monitored using Proc EastMonitor.

Besides the non-adaptive designs, East can formulate adaptive designs following Cui, Hung and Wang (1999). This adaptive design allows modification of sample size and effect size at an interim look. The adaptive designs are therefore also amenable to interim monitoring in SAS through Proc EastMonitor. As a result, with Proc EastMonitor as an add-on to SAS, the whole interim monitoring capability of East becomes available in SAS and will continue to be so for further new designs in East.

69.3 Continuous Endpoint: Orlistat Trial

Consider the Orlistat trial described in Section 10.1.1, where we would like to test the null hypothesis that treatment does not lead to weight loss, H0 : δ = 0, against the alternative hypothesis that the treatment does result in a loss of weight, H1 : δ = 3. Suppose we have designed this trial in East 6 and the following is the detailed output of this design.
Let us monitor this trial using PROC EASTMONITOR. Save the design details in CSV format: right click on the design node in the Library and select Export to CSV. This CSV file will serve as an input to PROC EASTMONITOR.

Launch East PROCs and import the above CSV file into SAS. The SAS code to import the file may look as shown below:

    PROC IMPORT OUT= INPUT.orlistat_des
        DATAFILE= "D:\Work\EAST6.3\ProcIM\Orlistat\Orlistat.csv"
        DBMS=CSV REPLACE;
        GETNAMES=YES;
        DATAROW=2;
    RUN;

Now suppose you have the data to be used for interim monitoring in a SAS file. The following code reads the design information from the dataset Orlistat_des and the IM data from the dataset Orlistat_im, monitors the trial, computes the look-by-look output quantities and saves the output in the form of SAS datasets.

    libname input "D:\Work\EAST6.3\ProcIM\Orlistat";run;
    libname out "D:\Work\EAST6.3\ProcIM\Orlistat\out";run;
    options nodate nonumber;
    PROC EASTMONITOR DESIGN=input.Orlistat_Des DATA=input.Orlistat_im;
        CONDPOWER OUT=out.cp_Orlistat ;
        PHP OUT=out.php_Orlistat ;
        ERRSPEND OUT=out.errspd_Orlistat ;
        CI OUT=out.ci_Orlistat;
        BOUNDARY OUT=out.bdd_Orlistat ;
        OUTPUT OUT=out.IM_INFO_Orlistat ;
    run;

The output from PROC EASTMONITOR can be seen in the Output window of SAS. It is divided into two parts: Design Output and IM Output. The Design Output part contains all the information exported from East 6. The IM Output is the output we are actually interested in.

    The SAS System
    Output from East (r) PROCs (v1.0) under _SAS9_2 or latter
    Copyright (c) 1987-2014 Cytel Inc., Cambridge, MA, USA.

    INTERIM MONITORING: DIFFERENCE OF MEANS

    Design Input Parameters
    -------------------------------------------------------------------
    Design ID                : DOM_Sup
    Design DataSet           :
    IM DataSet               :
    -------------------------------------------------------------------
    Test Parameters
    Design Type              : Superiority
    No. of Looks             : 3
    Test Type                : 1-Sided
    Specified Alpha          : 0.0500
    Power                    : 0.9001
    -------------------------------------------------------------------
    Model Parameters
    Input Method             : Individual Means
    Diff. in Mean            : 3.0000
    Mean Control             : 6.0000
    Mean Treatment           : 9.0000
    Std. Deviation           : 8.0000
    Test Statistic           : Z
    Allocation Ratio(nt/nc)  : 3.0000
    -------------------------------------------------------------------
    Boundary Parameters
    Efficacy Boundary        : LD(OF)

    Detailed Design: Two-Sample Test- Parallel Design - Difference of Means

    Sample Size Information
    Sample Size (n)   Control Arm   Treatment Arm   Total
    Maximum:          83            248             331
    Expected H1:      64.1428       192.4477        256.5906
    Expected H0:      82.5210       246.5940        329.1150

    Maximum Information for this design is 0.9697

    Stopping Boundaries: Look by Look
                                                                        Boundary Crossing Probability
                                                                        (Incremental)
    Look   Info Fract   Sample Size   Cumulative    Boundaries          Under H0    Under H1
    No.    (n/n_max)    (n)           Alpha Spent   Efficacy (Z)        Efficacy    Efficacy
    1      0.3323       110           0.0007        3.2055              0.0007      0.0665
    2      0.6677       220           0.0165        2.1387              0.0158      0.5429
    3      1.0000       330           0.0500        1.6950              0.0335      0.2907

    Interim Monitoring Output

    Look   Information   Cumulative    Effect   Standard   Test        Efficacy
    No.    Fraction      Sample Size   Size     Error      Statistic
    1      0.3323        110           3.0000   1.7615     1.7031      3.2055
    2      0.6677        221           2.0000   1.0000     2.0000      2.1387
    3*&    1.0000        331           3.0000   1.0000     3.0000      1.6950

    Look   Cumulative    Repeated 95.00% CI      Repeated
    No.    Sample Size   Lower       Upper       p-value
    1      110           -2.6466     Infinity    0.2462
    2      221           -0.1387     Infinity    0.0636
    3*&    331            1.3050     Infinity    0.0014

    Look   Cumulative    INLP   CP       Predictive
    No.    Sample Size                   Power
    1      110           326    0.9438   0.8229
    2      221           331    0.9041   0.8570
    3*&    331           NA     NA       NA

    *: At Look 3 the value of Test Statistic is >= the critical point for efficacy, H0 is rejected.
    &: At Look 3 with the current cumulative sample size, the desired power is achieved or exceeded.
       In order to preserve the operating characteristics of the study, East has forced this to be the last look.

    Final Inference
    Final Outputs at Look            : 3
    Adj. p-value                     : 0.0167
    Adj. Pt. Est. for Effect Size    : 2.5047
    Adj. 90.00% CI for Effect Size
      Upper confidence bound         : 4.3144
      Lower confidence bound         : 0.5825
    Post-Hoc Power                   : 0.9001

Notice that PROC EASTMONITOR prints the decision that was taken at the end of the trial. At the end, it prints the final inference as well.

One can also view this output back in East 6. To do that, export the output from SAS to a CSV file. The code to export the output dataset may look as shown below:

    PROC EXPORT DATA= OUT.Im_info_orlistat
        OUTFILE= "D:\Work\EAST6.3\ProcIM\Orlistat\out\imout_orlistat.csv"
        DBMS=CSV REPLACE;
        PUTNAMES=YES;
    RUN;

Activate East 6 and go back to the design node in the Library. Insert a new IM dashboard using the Interim Monitoring icon. Right click on the Interim Monitoring node and select Import PROC EM Output. Import the CSV file imout_orlistat.csv.
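The look-by-look decision reported in the Interim Monitoring Output earlier in this section can be checked by hand. The short Python sketch below recomputes the test statistic at each look as the effect size divided by its standard error and compares it with the efficacy boundary, which is the rule stated in the output footnote. The numbers are copied from the listing. The class and function names are hypothetical, and the sketch is only an arithmetic check, not a substitute for PROC EASTMONITOR.

```python
from dataclasses import dataclass


@dataclass
class Look:
    number: int
    effect_size: float        # "Effect Size" column of the IM output
    std_error: float          # "Standard Error" column
    efficacy_boundary: float  # "Efficacy" column (Z scale)


def z_statistic(look):
    """Test statistic on the Z scale: estimate divided by its standard error."""
    return look.effect_size / look.std_error


def crosses_efficacy(look):
    """H0 is rejected at a look when the statistic is >= the efficacy boundary."""
    return z_statistic(look) >= look.efficacy_boundary


# Values copied from the Interim Monitoring Output listing.
looks = [
    Look(1, 3.0, 1.7615, 3.2055),
    Look(2, 2.0, 1.0000, 2.1387),
    Look(3, 3.0, 1.0000, 1.6950),
]

for look in looks:
    decision = "reject H0" if crosses_efficacy(look) else "continue"
    print(look.number, round(z_statistic(look), 4), decision)
```

The boundary is crossed only at the third look, in agreement with the footnote marked * in the listing.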
The Interim Monitoring dashboard gets updated with the output from PROC EASTMONITOR.

Volume 9 Analysis

70 Introduction to Volume 9
71 Tutorial: Analysis
72 Analysis-Descriptive Statistics
73 Analysis-Analytics
74 Analysis-Plots
75 Analysis-Normal Superiority One-Sample
76 Analysis-Normal Noninferiority Paired-Sample
77 Analysis-Normal Equivalence Paired-Sample
78 Analysis-Normal Superiority Two-Sample
79 Analysis-Normal Noninferiority Two-Sample
80 Analysis-Normal Equivalence Two-Sample
81 Analysis-Nonparametric Two-Sample
82 Analysis-ANOVA
83 Analysis-Regression Procedures
84 Analysis-Multiple Comparison Procedures for Continuous Data
85 Analysis-Multiple Endpoints for Continuous Data
86 Analysis-Binomial Superiority One-Sample
87 Analysis-Binomial Superiority Two-Sample
88 Analysis-Binomial Noninferiority Two-Sample
89 Analysis-Binomial Equivalence Two-Samples
90 Analysis-Discrete: Many Proportions
91 Analysis-Binary Regression Analysis
92 Analysis-Multiple Comparison Procedures for Binary Data
93 Analysis-Comparison of Multiple Comparison Procedures for Continuous Data-Analysis
94 Analysis-Multiple Endpoints for Binary Data
95 Analysis-Agreement
96 Analysis-Survival Data
97 Analysis-Multiple Comparison Procedures for Survival Data

70 Introduction to Volume 9

This volume describes the procedures for analyzing data for continuous, binary, discrete and survival endpoints. Data arising from clinical trials with one arm, two arms or multiple arms can be analyzed with the help of the Analysis module of East.
The procedures include Basic Statistics and Plots used for exploratory analysis of data and, at a higher level, Logistic and Probit regression as well as tests for the analysis of crossover data. Exact Inference tests for two by two categorical data and multiple comparison tests for continuous and discrete data also belong to the Analysis menu. For a few tests, a link to SAS® is provided, which enables the user to perform the analysis in SAS and display the output in East.

Chapter 4 introduces the data editor features such as creating new data, manipulating existing data, sorting, filtering, transforming variables, generating random numbers from distributions, etc. East caters to Case Data and Crossover Data.

Chapter 71 explains the workflow used in Analysis for analyzing any data. This chapter describes how you can use the data editor capabilities effectively and perform the statistical test you want.

Chapter 72 deals with preliminary exploration of data using elementary tools such as computation of summary measures, classification and cross tabulation of the data. Descriptive Statistics helps statisticians choose statistical analysis techniques that lead to meaningful inference.

Chapter 73 describes some of the commonly used univariate procedures: t-test (paired and independent), one-way and two-way (without interaction) analysis of variance (ANOVA) and multiple linear regression. The topics of correlations and Multivariate Analysis of Variance (MANOVA) are also included in this chapter.

Chapter 74 deals with data exploration plots for case data and crossover data.

Chapter 75 demonstrates how East can be used to perform inference on data collected from a single-sample superiority study with a continuous endpoint. The data may consist of a random sample of observations from a single treatment or of paired observations from two treatments.

Chapter 76 explores how we can use East to perform inference on continuous data collected from a paired-sample noninferiority study.
Chapter 77 demonstrates how inference on continuous data collected from a paired-sample equivalence study can be performed.

Chapter 78 deals with analysis of continuous data coming from two independent samples and crossover superiority studies.

Chapter 79 deals with analysis of continuous data coming from two independent samples and crossover noninferiority studies.

Chapter 80 explains how we can use East to perform analysis of continuous data that come from two independent samples and crossover equivalence studies.

Chapter 81 describes analysis using the Wilcoxon-Mann-Whitney nonparametric test for parallel as well as crossover designs. Analysis of data from both superiority and noninferiority studies is possible.

Chapter 82 focuses on Analysis of Variance (ANOVA). The technique is useful in clinical trial data analysis whenever there are multiple responses or multiple doses of an experimental drug being compared with placebo. The chapter deals with One way, Two way and One way repeated measures ANOVA with SAS connection.

Chapter 83 demonstrates how to run regression analysis in East. East can perform multiple linear regression and repeated measures regression, and can fit a linear mixed effects (LME) model to data obtained from a 2x2 crossover design. A link to SAS is also available for the repeated measures regression and the linear mixed effects model.

Chapter 84 deals with multiple comparison procedures in which multiple treatments are compared against a placebo or active control. The response is of continuous type. The procedures included are parametric and p-value based. For multiple comparison procedures in East we can provide either the dataset containing the observations under each arm or the raw p-values to obtain the adjusted p-values.
Chapter 86 demonstrates how to perform inference on data collected from a single-sample superiority study when the observations on a binary variable have an unknown probability of success. You need either to test a null hypothesis about the probability, or to compute an exact confidence interval for the probability of success. The chapter also discusses the analysis of paired data on a binary random variable, including the Exact test for paired samples.

Chapter 87 explores how to analyze data from two independent binomial samples generated while conducting a superiority trial. This comparison is based on the difference of response probabilities, the ratio of proportions or the odds ratio of the two populations. Exact inference for the difference of proportions and the ratio of proportions is described.

Chapter 88 deals with noninferiority trials involving data from two independent binomial samples. This comparison is based on the difference of proportions, the ratio of proportions or the odds ratio of the two populations. Exact inference is supported for the difference of proportions and the ratio of proportions, and is described in this chapter.

Chapter 89 explains how we can use East to perform analysis of data that come from equivalence studies with two independent binomial samples. Both asymptotic and Exact options are described.

Chapter 90 deals with discrete data, where the data either come from many binomial populations or are responses from a multinomial distribution. In the case of multiple binomial populations, the interest lies in testing whether the success probability differs across several binomial populations and, in particular, whether it increases or decreases with reference to an index variable. For data coming from multinomial distributions, one is interested in testing whether the cell probabilities follow some theoretical law. East can be used to analyze both these types of data.
Chi-square tests, the Wilcoxon rank sum test for ordered categorical data and tests for trend in R ordered populations are some of the tests described in this chapter.

Chapter 91 focuses on how to run binary regression analysis. East provides logistic, probit, and complementary log-log regression models for data with a binary response variable. Along with regular maximum likelihood inference for the logistic model, East provides Firth bias-correction for asymptotic estimates in unstratified logistic regression. Profile likelihood based confidence intervals for estimates are available for unstratified data.

Chapter 92 explains how to analyze data arising from multiple comparison studies where more than one treatment is compared against a placebo or active control. The procedures included are parametric and p-value based. For multiple comparison procedures in East we can provide either the dataset containing the observations under each arm or the raw p-values to obtain the adjusted p-values.

Chapter 93 compares different multiple testing procedures for a continuous endpoint through an illustrative example.

Chapter 95 discusses Cohen's Kappa and the Weighted Kappa measures. These two measures are used to assess the level of agreement between two observers classifying a sample of objects on the same categorical scale.

Chapter 96 deals with comparison of two survival curves using the Logrank Test in superiority and noninferiority studies. The chapter also demonstrates how one can obtain a plot of the multi-arm Kaplan-Meier Estimator in East.

Chapter 97 explains how to analyze data arising from multiple comparison studies with a survival endpoint where more than one treatment is compared against a placebo or active control.

The following section discusses the Global Options in East 6.
Most of them are not applicable to the Analysis menu, but some options, such as the Data Path or Display Precision settings for analysis, can be set.

70.1 Settings

Click the Settings icon in the Home menu to adjust default values in East 6.

The options provided in the Display Precision tab are used to set the decimal places of numerical quantities. The settings indicated here will be applicable to all tests in East 6 under the Design and Analysis menus.

All these numerical quantities are grouped in different categories depending upon their usage. For example, all the average and expected sample sizes computed at the simulation or design stage are grouped together under the category "Expected Sample Size". So to view any of these quantities with greater or lesser precision, select the corresponding category and change the decimal places to any value from 0 to 9.

The General tab has the provision of adjusting the paths for storing workbooks, files, and temporary files. These paths will be retained for the current and future sessions, even after East is closed. This is the place where we need to specify the installation directory of the R software in order to use the R Integration feature in East 6.

The Design Defaults is where the user can change the settings for trial design:

Under the Common tab, default values can be set for input design parameters. You can set up the default choices for the design type, computation type and test type, and the default values for type-I error, power, sample size and allocation ratio. When a new design is invoked, the input window will show these default choices.

Time Limit for Exact Computation
This time limit is applicable only to exact designs and charts. Exact methods are computationally intensive and can easily consume several hours of computation time if the likely sample sizes are very large.
You can set the maximum time available for any exact test in terms of minutes. If the time limit is reached, the test is terminated and no exact results are provided. The minimum and default value is 5 minutes.

Type I Error for MCP
If the user has selected a 2-sided test as the default in global settings, then any MCP will use half of the alpha from settings as its default, since MCP is always a 1-sided test.

Sample Size Rounding
Notice that by default, East displays the integer sample size (events) by rounding up the actual number computed by the East algorithm. In this case, the look-by-look sample size is rounded off to the nearest integer. One can also see the original floating point sample size by selecting the option "Do not round sample size/events".

Under the Group Sequential tab, defaults are set for boundary information. When a new design is invoked, input fields will contain these specified defaults. We can also set the option to view the Boundary Crossing Probabilities in the detailed output. It can be either Incremental or Cumulative.

Simulation Defaults is where we can change the settings for simulation:

If the checkbox for "Save summary statistics for every simulation" is checked, then East simulations will by default save the per-simulation summary data for all the simulations in the form of case data.

If the checkbox for "Suppress All Intermediate Output" is checked, the intermediate simulation output window will always be suppressed and you will be directed to the Output Preview area.

The Chart Settings allows defaults to be set for the following quantities on East 6 charts:

We suggest that you do not alter the defaults until you are quite familiar with the software.

71 Tutorial: Analysis

The Analysis menu of East 6.3 contains various procedures for analyzing data.
The procedures include Basic Statistics and Plots used for exploratory analysis of data and, at a higher level, Logistic and Probit regression as well as tests for the analysis of crossover data. Exact Inference tests for two by two categorical data and multiple comparison tests for continuous and discrete data belong to the Analysis menu. For a few tests, we provide a link to SAS® for invoking a SAS® procedure or the user's own SAS program, which will do the analysis in SAS® and display the results on the East screen.

All the procedures in the Analysis menu are broadly grouped under the following categories.

   Basic Statistics
   Continuous
   Discrete
   Events
   Basic Plots
   Crossover Plots

Each of these categories is further divided into several sub menus consisting of the procedures related to that particular category. For example, if you traverse

   Analysis > (Discrete) Many Samples

you will see the following list of available procedures.

Note that the procedures are grouped under Single Arm Design, Parallel Design, etc. In this tutorial, we will take you on a tour of the Analysis procedures available in East.

71.1 DataTypes

East can analyze data in Case Data or Crossover Data formats. Except for the procedures specifically marked for crossover analysis, all other procedures can be carried out on Case Data. Case data can be viewed and modified using the case data editor. The case data editor displays case data in the form of a sheet where Rows represent records and Columns represent variables. The variables can be of binary, string, categorical or continuous types. You can create a new Case Data sheet by clicking New Data on the File menu. For more details about the Case Data Editor and the Crossover Data Editor, refer to Chapter 4. For illustrative purposes we have included data files in the Samples folder available in the Installation Directory of East.
A typical Case Data file in the case data editor looks as shown below:

This is a view of the Body weight.cyd data opened in East. These data contain 22 records on four variables: Dose, Animal, Week and Weight. While Dose is a string variable, all the others are of numeric type.

A typical Crossover Data file in the crossover data editor looks as follows:

The above is a view of the Euphylong.cyd file, which contains 2x2 crossover data on two drugs T and R administered in two sequences G1=T,R and G2=R,T. The variables P1 Resp and P2 Resp represent the responses of patients in the first and second periods respectively.

71.2 Using Case Data Editor Features

The Case Data editor offers a wide range of capabilities. You can sort, filter, transform variables, etc., and perform the analysis on the modified data. For instance, suppose you open the data set Leukemia.cyd from the Samples folder. The data consist of three variables Drug, Time and Status, a part of which is shown below.

Suppose you want to consider data only for Status =1. This is possible by filtering the data using the filter command as described below.

Click on the filter icon on the Data Editor menu. In the ensuing dialog box, press If Condition and then the Build expression button. You will see the following dialog box.

Use the selections available to build the expression as shown below.

Press OK. This will select the records with Status =1 as active. All other, inactive records in the data will be displayed in blue color. A part of the data is shown in the following figure.

Let us perform the Difference of Means Test on the filtered data. Select the Difference of Means Test as:

   Analysis > (Continuous) Two Samples > (Parallel Design) Difference of Means

In the ensuing dialog box, select Drug as Population Id, Placebo as Control and Time as Response variable. There is no need to select any variable as Frequency Variable. The Input dialog box will look as follows:

Click OK. You will see the following output.

The output is displayed in three parts. The first part specifies the Null as well as the Alternative hypotheses to be tested. East considers both one-sided and two-sided alternative hypotheses and computes the corresponding p-values. The second part lists Input details such as the data file name, the Population Id and the test to be performed. The last part contains the number of records in the data, the number of records rejected, a summary of the data and the Inference.

From the above output, it is clear that there are 42 records in all, out of which 12 are rejected by filtering, and the t test is applied to the remaining 30 records. Both the one-sided and two-sided p-values indicate that the null hypothesis H0 cannot be rejected at the 5% level of significance.

71.3 Sub-Group Analysis using By variable

In almost every data analysis feature, East provides a facility for doing sub-group analysis on a maximum of two variables. For instance, consider running the 'One way analysis of variance (ANOVA)' procedure on the Myeloma.cyd data set available in the Samples folder of the installation directory of the product.
The authors Krall, Uthoff and Harley (1975) provide data from a survival study that include the survival times, in months, of 65 multiple myeloma patients with data on 15 concomitant variables. In this example, we would like to perform one way ANOVA on the variable ‘survmth’ in subgroups of Males and Females separately. Suppose we want to see if average survival is different across age groups. We would first categorize the variable age into a factor. This can be done using the RCODE function available in Transform Variable in the Case Data Editor.

Open Myeloma.cyd from the Samples folder of East. In order to know the minimum and maximum value of age, choose from the menu:

Analysis > (Basic Statistics) Descriptive Statistics > Summary Statistics

Select the variable age and Minimum and Maximum as shown in the following dialog box.

Click OK. You will see the following output.

Accordingly let us form the age groups as follows:

1. AgeCode=1 for Age from 21 to 40
2. AgeCode=2 for Age from 41 to 60
3. AgeCode=3 for Age from 61 to 80
4. AgeCode=4 for Age not less than 81

This grouping can be done using the RCODE function, as demonstrated here on the Myeloma dataset. When opened, the dataset looks like this:

In the next column after sercalcium (which is the last variable in the dataset), construct a new variable by clicking the Transform icon on the Data Editor menu. In the ensuing dialog box, type the Transform command as shown below.

Click OK. This will generate a new variable AgeCode with values from 1 to 4.
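The exact RCODE command appears only in the dialog screenshot, so it is not reproduced here. As a rough illustration of the grouping rule itself, the following Python sketch maps an age to the four AgeCode groups listed above. The function name age_code and the sample ages are hypothetical and are not taken from East or from Myeloma.cyd.

```python
def age_code(age):
    """Map an age in years to the AgeCode groups described in the text."""
    if age >= 81:
        return 4  # not less than 81
    if age >= 61:
        return 3  # 61 to 80
    if age >= 41:
        return 2  # 41 to 60
    if age >= 21:
        return 1  # 21 to 40
    return None   # below the lowest group; the text defines no code for this case


# Made-up ages, used only to exercise each group once.
ages = [38, 45, 67, 82]
codes = [age_code(a) for a in ages]
```

Checking the boundaries (40 versus 41, 60 versus 61, 80 versus 81) against the list above is a quick way to confirm that a recode expression does what was intended.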
To run the One way ANOVA procedure, follow the steps:

Analysis > (Continuous) Many Samples > One way ANOVA

You will see the following input dialog box. In it select AgeCode as Factor and survmth as Response.

Click the Advanced tab. Select gender in the By variable 1 drop down box.

Click OK. The output obtained is as shown below.

Notice that East has performed the one way ANOVA procedure on the two subgroups formed by gender=1 and gender=2 respectively. There is no significant effect of Age on survival. This is true for both male and female patients.

71.4 Workflow for Analysis

In this section we will walk you through the steps that are generally followed while performing any analysis in East.

71.4.1 Getting Data into East

Data may be entered into East as case data or as crossover data, read in as a previously saved East file (.cydx) through the Open command in the File menu, or read in from another software package through the Import command. In this tutorial, you will read in a previously saved data file using the Open command. For illustrative purposes, let us consider performing the Difference of Means Test on the data Myeloma.cyd available in the Samples folder.

Open the data set Myeloma.cyd from the Samples folder. If there are several workbooks in the Library, East will ask for the workbook in which you would like to store the data, as shown in the following dialog:

Suppose you choose Wbk1. A node named Myeloma.cydx will be created in Wbk1 as shown below:

You may rename the data node by right clicking on it.
71.4.2 Choose the Test

Choose the test from the appropriate submenu of Analysis. In this case, select:

Analysis > (Continuous) Two Samples > (Parallel Designs) Difference of Means

In the ensuing dialog, select the variables as shown below.

Click OK to execute the test. You will see the following output.

71.4.3 Output

The output is divided into three sections. The first one is the Hypothesis section, where the null and alternative hypotheses for 2-sided and 1-sided tests are stated. The next section is the Input Parameters section. This section tells us the name of the data file and the response variable used in the analysis, the type of test performed, the confidence level set for the analysis and other parameter(s) used in the analysis. It is important to review this section to make sure that all the input was specified correctly. The last section is Output. The first part of the output is the Summary of the Observed Data. It contains descriptive statistics such as minimum, maximum, mean, median and standard deviation of the response variable within the two treatment groups. The remaining part of the output contains the inference for the t test. The standardized effect size is -0.31 with a t statistic value of -1.1, which with 63 d.f. is non-significant. Accordingly, the test fails to reject the null hypothesis at the 5% level of significance. This is substantiated by observing that the 95% confidence interval includes the value 0.

You will see three icons at the top of the Analysis output. One icon prints the output, another saves the output as HTML, and the remaining icon lets you readily change the display settings, in particular the number of decimal places on the output. If you click this icon the following dialog comes up.
Change the display precision for the ‘Others’ category to 6 decimals as shown in the above dialog box and click OK. You will see the following output in which, other than Beta and p-values, all values are displayed up to 6 decimals.

71.4.4 Links to SAS

The Analysis module of East facilitates analysis using SAS procedures on the data in two ways.

1. Invoking SAS through the SAS link ‘Run Using SAS’ provided on the Advanced tab for the tests Linear Mixed Effects Model: Difference of Means and Ratio of Means. These tests are part of the Regression menu from Analysis: Continuous. By doing this, East will invoke the Mixed procedure of SAS. You can also choose not to use SAS. If you use SAS, you will have the option of including covariates in your model. Without SAS, your model will not include covariates.
2. Using the SAS command option available in the 2x2 crossover tests in the Regression menu of Analysis: Continuous. With this option, you can use your own data set and SAS commands. This will utilize the data in East and run the PROC specified in your SAS code. The output will be displayed in East’s main window. The flexibility offered allows you to write any SAS code. The only exception is that your code should not contain SAS graphics.

For more details of these aspects you are referred to the respective chapters.

72 Analysis-Descriptive Statistics

Descriptive Statistics, under the Basic Statistics menu, deals with preliminary exploration of data using elementary tools such as computation of summary measures, classification, and cross tabulation of the data. Descriptive Statistics helps statisticians to choose statistical analysis techniques to arrive at meaningful inference.
In this Chapter, Section (72.1) describes the descriptive statistical measures available in East. Section (72.2) describes the procedure for obtaining a frequency distribution for one or more variables in a data set. Section (72.3) details the procedure for obtaining a cross-tabulation of any two variables in a case data file. All these procedures are available only for case data.

Note: All measures are computed after dropping observations with missing values.

72.1 Summary Statistics

East provides results for a set of 16 predefined univariate summary measures for numeric variables in a data set. These measures help you to select the type of analysis to carry out later. The following Descriptive Statistics or Univariate Summary Measures are available.

Central Tendency: Mean (Std. Error of Mean), Median, Mode, Geometric Mean, Harmonic Mean
Dispersion: Standard Deviation, Variance, Coefficient of variation, Maximum, Minimum, Range
Distribution: Skewness, Kurtosis
Summary: Count, Sum

72.1.1 Example: Summary Statistics

Dataset: Myeloma.cydx

Data Description: The authors Krall, Uthoff and Harley (1975) have provided data from a survival study. It includes the survival times in months of 65 multiple myeloma patients with data on 15 concomitant variables.

Purpose of the Analysis: To compute summary measures for the variables survmth, haemoglobin and bjprotein, grouped by survival status (status) and gender (gender).

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Descriptive Statistics > Summary Statistics
3. In the Main tab, select the variables of interest. In this example select the following variables: survmth, haemoglobin and bjprotein. Click the Select All button to get the results for all the summary measures.
4.
Thereafter, under the Advanced tab choose the variables as shown below:
5. Click OK. You will see the Analysis results as shown below.

72.2 Example: Frequency Distribution

The procedure Frequency Distribution displays a separate frequency distribution table for each of the variables specified in a list. The default display includes the values of the variable in sorted order in the first column and the frequencies in the second column. Additional display can be obtained by choosing one or more of the options Percentage, Cumulative <=, Cumulative >= or Compute Percentiles.

Dataset: Myeloma.cydx as described in Section 72.1.1.

Purpose of the Analysis: To obtain a Frequency Distribution table for the variables haemoglobin, age, fractures, and bjprotein.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Descriptive Statistics > Frequency Distribution
3. In the ensuing dialog box (under the Main tab), select the variables of interest in the Selected Variables box. For this example, select the variables: haemoglobin, age, fractures, and bjprotein. In the Frequency Output box select all three checkboxes and select the Compute Percentile check box.
4. Thereafter, under the Advanced tab choose the By Variables as shown below:
5. Click OK. A partial output of the Analysis results is as shown below.

72.3 Example: Tabulate

The Tabulate procedure allows cross-tabulation of any two specified variables of a data set.
It also allows specification of a Frequency variable and a By Variable for subgroup analysis. The optional output from this procedure includes row, column, and overall percentages, expected values and chi-square statistics with p-values.

Dataset: Job-case.cydx

Data Description: This example refers to the data obtained in a general social survey conducted by the National Opinion Research Center (1991) among black American women and men. The data is in case data form. For exploratory analysis and presentation, it is useful to summarize these data into a tabular form. The data consist of the following annual income levels:

<$5,000
5,000-15,000
15,000-25,000
>25,000

and job satisfaction levels:

Very Dissatisfied
A little satisfied
Moderately satisfied
Very satisfied

for 64 women and 40 men.

Purpose of the Analysis: To cross tabulate the data for the variables Incomegrp and Jobsatis grouped by gender.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Descriptive Statistics > Tabulate
3. In the ensuing dialog box, choose the variables as shown below.
4. Thereafter, under the Advanced tab choose the variables as shown below.
5. Click OK. You will see the Analysis results as shown below.

73 Analysis-Analytics

This chapter describes some of the commonly used univariate procedures: t-test (paired and independent), one-way and two-way (without interaction) analysis of variance (ANOVA) and multiple linear regression. The topic of correlations is also included in this chapter.
References for the procedures covered in this chapter are provided in the table shown below:

Test                                      References
t-tests, ANOVA                            Snedecor & Cochran (1989)
Pearson’s Product-Moment Correlation      Kreyszig (1970)
Spearman’s Rank-Order Correlation,
Kendall’s Tau                             Siegel & Castellan (1988)
Regression procedures                     Maindonald J (1984)
Collinearity diagnostics                  Belsley, Kuh, & Welsch (1980)
Residuals and Influence                   Cook & Weisberg (1982)

Note: Any observation with a missing value for a variable that is included in the model is excluded from the analysis.

73.1 Example: t-test

This section describes t-test procedures for analyzing data of independent and paired samples.

73.1.1 Independent t-test

Dataset: Myeloma.cydx as described in Section 72.1.1.

Purpose of the Analysis: To compare mean Uria level between two groups indicated by the variable status (0-alive, 1-dead).

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Analytics > (t-test) Independent t-test
3. In the ensuing dialog box, choose the variables as shown below.
4. Click OK. The results will appear as shown below.

You can try running the t-test with unequal variance by selecting the Unequal Variance option on the Main tab.

73.1.2 Paired t-test

Dataset: Azt1.cyd

Data Description: The data from Makuch and Parks (1988) document the response of serum antigen level to AZT in 20 AIDS patients. Two sets of antigen levels are provided for each patient:

Pre-treatment
Post-treatment

Purpose of the Analysis: To compare the mean antigen level among patients after administering the treatment with the mean antigen level before administering the treatment.

Analysis Steps:
1. Open the dataset from Samples folder.
2.
Choose the menu item:
   Analysis > (Basic Statistics) Analytics > (t-test) Paired t-test
3. In the ensuing dialog box, choose the variables as shown below.
4. Click OK. You will see the analysis results as shown below.

73.2 Analysis of Variance

The ANOVA procedure available under the Analytics menu can perform simple one-way and two-way analysis of variance, described in this section.

73.2.1 One-way Analysis of Variance

Dataset: Leukocyte.cyd

Data Description: These data come from a study done by Kontula et al. (1980, 1982) in which the Glucocorticoid Receptor (GR) Sites per Leukocyte Cell in normal subjects (Group 1) were compared to those in patients with hairy-cell leukemia (Group 2), chronic lymphatic leukemia (Group 3), chronic myelocytic leukemia (Group 4) or acute leukemia (Group 5). One of the aims of the study was to find whether there were any significant differences in the mean number of GR sites per leukocyte cell between these five groups.

Purpose of the Analysis: To test whether the mean GR sites per Leukocyte Cell is the same across all groups.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Analytics > ANOVA
3. In the ensuing dialog box, choose the variables as shown below.
4. Click OK. You will see the analysis results as shown below.

73.2.2 Two-way Analysis of Variance

Dataset: swine.cyd

Data Description: The data for this example is a subset of the data that comes from a study reported by Snedecor and Cochran (1989).
These data relate to dressing percentages of 20 swine that have been classified by breed (5 categories) and sex (2 categories), with 2 swine under each combination of breed and sex categories.

Purpose of the Analysis: The aim of this study is to test for the effect of breed and sex on the study measure taken on the animals.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Analytics > ANOVA
3. In the ensuing dialog box, choose the variables as shown below.
4. Click OK. You will see the Analysis results as shown below.

73.3 Correlations

The Correlations procedure under Analytics can be used to compute the following correlation measures for pairs of variables in a data set:

Pearson’s Correlation
Spearman’s Rho
Kendall’s Tau

All these measures of correlation range between -1 and +1 with:

0 signifying no association
−1 signifying perfect negative association
+1 signifying perfect positive association.

73.3.1 When to Use Each Measure

All the measures of correlation or association in this section capture in a single number the relationship between two ordered data series. But one measure might be more appropriate than the others under different assumptions about the data. Here are some guidelines on when to use each measure.

Pearson: Use the Pearson product-moment correlation coefficient when you can assume that two correlated data series follow a bivariate normal distribution.
Spearman: Use the Spearman rank-order correlation coefficient when you cannot make a normality assumption about the two data series.

Kendall’s Tau: Use Kendall’s Tau to capture the association between two data series that are ordered implicitly but not numerically.

73.3.2 Example: Correlations

Dataset: Myeloma.cyd as described in Section 72.1.1.

Purpose of the Analysis: To compute correlations among all pairs of variables from the variables age, bjprotein, haemoglobin, and survmth.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Analytics > Correlation
3. In the ensuing dialog box, select age, bjprotein, haemoglobin, and survmth as the Selected Variables and select all three checkboxes inside the Correlation box.
4. Click OK. You will see the Analysis results as shown below.

73.4 Multiple Linear Regression

This section describes the method of fitting a multiple linear regression model for a selected data set. The regression procedures are performed using a variance-covariance updating procedure described in Maindonald, J (1984). The least squares solution is facilitated by using Cholesky decomposition.

73.4.1 Available procedures

The procedure available in this section fits a linear model of the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where $Y$ is the dependent variable (response), $X_1, \ldots, X_k$ are the independent variables (predictors) and $\varepsilon$ is a random error with a normal distribution having mean 0 and variance $\sigma^2$.
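To make the role of the Cholesky decomposition concrete, here is a minimal Python sketch that estimates the coefficients of the model above by solving the normal equations X'X b = X'y with a Cholesky factorization. This is a textbook illustration under stated assumptions, not East's implementation (which, as noted, uses a variance-covariance updating procedure). The function names and the small data set are made up for the example.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def least_squares(rows, y):
    """Estimate (b0, b1, ..., bk) by solving the normal equations X'X b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    low = cholesky(xtx)
    # Forward substitution: solve L z = X'y
    z = [0.0] * p
    for i in range(p):
        z[i] = (xty[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    # Back substitution: solve L^T b = z
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (z[i] - sum(low[k][i] * b[k] for k in range(i + 1, p))) / low[i][i]
    return b


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the estimates should recover the coefficients 1, 2 and 3.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
beta = least_squares(rows, y)
```

The factorization only succeeds when X'X is positive definite, which is one reason constant or collinear predictors have to be dropped before fitting.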
The multiple linear regression algorithm computes the estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ of the regression coefficients $\beta_0, \beta_1, \ldots, \beta_k$ so as to minimize the sum of squares of residuals. The regression procedure:

Calculates the estimates of the regression coefficients, their standard errors, p-values, $R^2$, and the contribution of each variable to reducing the total sum of squares.
Performs the Wald test on groups of specified variables.
Allows control of the multicollinearity criterion (default 0.05) and the number of components for collinearity diagnostics to be displayed (default 8).
Computes the fitted values, the ANOVA table and the covariance matrix of the coefficient estimates.
Computes various types of residuals: unstandardized, standardized, studentized and deleted.
Computes influence statistics: Cook’s distance, DFFITS, covariance ratios and hat matrix diagonals.

73.4.2 Example: Multiple Linear Regression

Dataset: Werner.cydx

Data Description: In this example, consider the data from a blood chemistry study described by Werner, et al (1985). Eight variables were recorded for n=188 women. The data include information on age, weight, birth pill (1=user, 2=nonuser), cholesterol, albumin and calcium. One of the aims of this study is to find the relationship between the variable Cholesterol and the other variables.

Purpose of the Analysis: To fit a multiple linear regression model to the data with Cholesterol as the dependent variable and Age, Height, Weight, Birthpill, Albumin, Calcium, and Uric Acid as independent variables. Also to obtain collinearity diagnostics and perform the Wald test for Albumin, Calcium, and Uric Acid.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Analytics > Multiple Linear Regression
3.
In the Main tab, select Cholesterol as the Dependent Variable and select the checkboxes against the remaining 7 variables, Age, Height, Weight, Birthpill, Albumin, Calcium, and Uric Acid, as independent variables. Click the Wald Test and Collinearity Diagnostics checkboxes.
4. In the Advanced tab, enter 7 as the Number of Collinearity Components. In the Wald Test box select the variables albumin, calcium, and uric acid for carrying out the Wald Test. In the Output box select all the checkboxes except Hat Matrix Diagonals.
5. Click OK. You will see the Analysis results as shown below.

The ‘Terms dropped due to’ table refers to some essential pre-processing of the data. If a particular independent variable assumes the same value throughout the data set, it is not really a ‘variable’ and has to be dropped. Its presence creates ‘singularity’ in the so-called X matrix. In the present data set we do not have this problem and hence the entry is ‘None’. Multicollinearity is another similar feature of the data which makes the problem unstable. Again, in the present dataset, no such difficulty is encountered and hence the entry is ‘None’.

The Summary Statistics table gives an overview of the results. If the residual degrees of freedom are not adequate, we have too many independent variables. In the present case, the residual df value is 173. The data size is large relative to the number of independent variables. Multiple R-squared indicates the fraction of the total variation explained by the selected set of independent variables. In this case, the value is 0.2523.
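The degrees of freedom quoted for this example follow from simple bookkeeping: with 181 usable observations and 7 independent variables, the total df is 181 − 1 = 180, the regression df is 7, and the residual df is 180 − 7 = 173. The Python sketch below reproduces that arithmetic and shows how R-squared relates to the sums of squares. The function name is hypothetical, and the sums of squares are placeholder values chosen only so that their ratio equals the reported 0.2523; they are not East output.

```python
def regression_summary(n_obs, n_predictors, ss_regression, ss_total):
    """Degrees of freedom, R-squared and overall F for a model with an intercept."""
    df_total = n_obs - 1                 # one df is spent on the intercept
    df_regression = n_predictors
    df_error = df_total - df_regression
    r_squared = ss_regression / ss_total
    # Overall F statistic written in terms of R-squared (standard identity).
    f_stat = (r_squared / df_regression) / ((1.0 - r_squared) / df_error)
    return {
        "df_total": df_total,
        "df_regression": df_regression,
        "df_error": df_error,
        "r_squared": r_squared,
        "f_stat": f_stat,
    }


# Placeholder sums of squares (assumed, not taken from the East output).
summary = regression_summary(n_obs=181, n_predictors=7,
                             ss_regression=25.23, ss_total=100.0)
```

A residual df that is small relative to the number of predictors would signal an over-parameterized model; here 173 residual df for 7 predictors is comfortable.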
If the data under study have high multicollinearity, estimates of regression coefficients become volatile and less dependable. To check this, examine the ‘condition numbers’, which indicate the extent of spread in the eigenvalues of X’X. A very large number is a warning. Montgomery et al recommend use of 100 as indicative of moderate concern while a value of 1000 is an alarm trigger (Montgomery, Peck, and Vining, 2003, page 339). For this model, the number is 114.32. Thus, it may be pertinent to take a corrective step such as centering the data.

The next table is ANOVA. Our data has 181 observations and the total degrees of freedom are 180 (one degree being spent on fitting an intercept in the model). There are 7 independent variables giving rise to 7 df for regression, and the remaining 173 df are assigned to error. The very low p-value shows that the model fitted is ‘significant’. We have to reject the null hypothesis that all regression coefficients are zero.

Lastly, we can test if a subset of the regression coefficients is zero. A test is carried out to check if the coefficients of three independent variables (albumin, calcium and uric acid) are zero. Here also, the p-value is very small and the hypothesis stands discredited.

73.5 Multivariate Analysis of Variance

This section describes various methods for analyzing a data set in which each observation consists of multiple measurements on the same experimental unit. As an example, if our study concerns the size of babies, we may measure length, chest girth, head girth and weight. In that case, there will be four measurements on each baby. We can of course study every measurement separately. However, the fact that they are correlated makes it necessary that they are studied together. All ideas applicable to analysis of univariate data are relevant here too. However, some aspects absent in univariate data arise in multivariate data.
Procedures available in East for multivariate analysis include Multiple Linear Regression and Multivariate Analysis of Variance (MANOVA). References for the procedures covered in this section are provided in Johnson & Wichern (1998).

The Multivariate Analysis of Variance (MANOVA) procedure is a generalization of the univariate Analysis of Variance (ANOVA) procedure. When we have samples of observations from different multivariate normal populations having a common variance-covariance matrix, we can use the MANOVA procedure to check for the equality of mean vectors.

73.5.1 Available procedures

The available procedures under MANOVA are:

One-way MANOVA
Profile analysis

73.5.2 Example: Multivariate Analysis

Dataset: Root.cydx

Data Description: The following example is taken from Rencher (1995). In a classical experiment carried out from 1918 to 1934, apple trees of different rootstocks were compared (Andrews and Herzberg, 1985). The data for eight trees from each of six rootstocks are available. The variables in the data are:

y1 = trunk girth at 4 years (mm x 100)
y2 = extension growth at 4 years (m)
y3 = trunk girth at 15 years (mm x 100)
y4 = weight of tree above ground at 15 years (lb x 1000)

The table of mean vectors of the six rootstocks is shown below:

Rootstock   Y1       Y2       Y3       Y4
1           1.1375   2.9771   3.7388   0.8711
2           1.1575   3.1091   4.515    1.2805
3           1.1075   2.8152   4.455    1.3914
4           1.0975   2.8798   3.9063   1.039
5           1.08     2.5572   4.3125   1.181
6           1.0362   2.2146   3.5962   0.735

Purpose of the Analysis: To perform One Way Multivariate Analysis of Variance (MANOVA) on the data with ROOTBOX as the group variable and Y1, Y2, Y3, Y4 as the response vector.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Basic Statistics) Analytics > Multivariate Analysis of Variance
3. Select ROOTBOX in the Group Variable. Select all the checkboxes in the Dependent Variable box.
4. Click OK. You will see the Analysis results as shown below.

The p-values are very small and hence we reject the null hypothesis of equality of means as well as the hypothesis of parallel profiles.

74 Analysis-Plots

The plotting capabilities available in the Analysis menu of East are of two types:

Basic Plots
Crossover Plots

These are essentially data exploration charts for the two types of data, case data and crossover data respectively. This chapter discusses in detail the various types of basic plots and crossover plots.

The following types of plots provide data exploration capabilities in East:

Area
Box
Bubble
Cumulative (Left or Right)
Density
Histogram
Simple Scatter
Stem and Leaf
Step Function
Bar (Simple Bar, Stacked Bar, Horizontal Bar, or Stacked Horizontal Bar)
Pie
P-P Normal
Q-Q Normal

74.1 Data Exploration Plots

The plots are further classified into Categorical, Continuous and Frequency Distribution. To generate a data exploration plot, open a data file and then choose from the menu:

Analysis > Basic Plots

Then you can select: Categorical, Continuous, Frequency Distribution

74.2 Categorical

74.2.1 Bar Chart

Bar chart provides a choice of the following graphical displays of the frequencies of the categories of a variable:

Simple Bar
Stacked Bar
Horizontal Bar
Horizontal Stacked Bar

The display is in the form of vertical or horizontal bars whose height or length is proportional to the frequency of the categories shown on the X-axis.

Simple Bar

Dataset: Job-Case.cydx as described in Section 72.3.
Purpose of Plot : For exploratory analysis and presentation, it is useful to summarize these data into a tabular form. The purpose is to generate and display a Simple Bar chart for the variable Jobsatis. Steps: 1. Open the dataset from Samples folder. 2. Choose the menu item: Analysis > (Basic Plots) Categorical > Bar Chart > Simple Bar 3. In the ensuing dialog box, select Jobsatis as the Variable to Plot and Persons as the Frequency Variable (Optional). 74.2 Categorial – 74.2.1 Bar Chart 1855 <<< Contents 74 * Index >>> Analysis-Plots 4. Click OK. The following Simple Bar chart is displayed in the main window. Stacked Bar Dataset: Job-Case.cydx as described in Section 72.3 Purpose of Plot : To generate and display a Stacked Bar chart for the variable Incomegrp stacked by Jobsatis based on the selected dataset. Steps: 1. Open the dataset from Samples folder. 2. Choose the menu item: Analysis > (Basic Plots) Categorical > Bar Chart > Stacked Bar 3. In the ensuing dialog box, select Incomegrp as the Category Variable, Jobsatis 1856 74.2 Categorial – 74.2.1 Bar Chart <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 as the Stacked By Variable and Persons as the Frequency Variable (Optional) 4. Click OK. The following Stacked Bar chart is displayed in the main window. Horizontal Bar Dataset: Job-Case.cydx as described in Section 72.3. Purpose of Plot : To generate a Horizontal Bar chart for the variable Jobsatis. 74.2 Categorial – 74.2.1 Bar Chart 1857 <<< Contents 74 * Index >>> Analysis-Plots Steps: 1. Open the dataset from Samples folder. 2. Choose the menu item: Analysis > (Basic Plots) Categorical > Bar Chart > Horizontal Bar 3. In the ensuing dialog box, select Jobsatis as the X axis variable and Persons as the Frequency Variable (Optional). 4. Click OK. The following Horizontal Bar chart is displayed in the main window. 1858 74.2 Categorial – 74.2.1 Bar Chart <<< Contents * Index >>> R East 6.4 c Cytel Inc. 

Horizontal Stacked Bar
Dataset: Job-Case.cydx as described in Section 72.3.
Purpose of Plot: To generate and display a Horizontal Stacked Bar chart for the variable Incomegrp stacked by Jobsatis based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Categorical > Bar Chart > Horizontal Stacked Bar
3. In the ensuing dialog box, select Incomegrp as the Category Variable, Jobsatis as the Stacked By Variable and Persons as the Frequency Variable (Optional).
4. Click OK. The following Horizontal Stacked Bar chart is displayed in the main window.

74.2.2 Pie Chart

Pie provides a circle graph divided into slices, each displaying the frequency of a category of a variable. The size of each slice is proportional to the relative frequency of the values.
Dataset: Socio.cydx
Data Description: This dataset contains measurements on 11 variables. There are 40 subjects in the study. The first 6 variables are concerned with the performance of the subject in the past, while the last 5 variables reflect current performance and future plans.
Purpose of Plot: To generate and display a Pie chart for the variable CourseEva.
Steps:
1. Open the dataset from Samples folder of the East Installation directory.
2. Choose the menu item:
Analysis > (Basic Plots) Categorical > Pie Chart
3. In the ensuing dialog box, select CourseEva as the Variable To Plot and leave the Summary Variable (optional) blank.
4. Click OK. The following Pie chart is displayed in the main window.

74.3 Continuous

74.3.1 Area

Area provides a graphical display of the trend of values of Y variable(s) over categories of an X variable.
The display is in the form of shaded area(s) under the curve(s).
Dataset: Myeloma.cydx as described in Section 72.1.1.
Purpose of Plot: To generate and display an Area chart for the variable haemoglobin over id.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Continuous > Area
3. In the ensuing dialog box, select id as the Variable To Plot and haemoglobin as the Frequency Variable (Optional).
4. Click OK. The following Area chart is displayed in the main window.

74.3.2 Box

Box provides a data display that shows the 25th and 75th percentiles of the data (using the outline of the box), the median value (the large dashed line in the box), the mean value (smaller dashed line), and the largest and smallest data points (endpoints of the vertical line going through the box).
Dataset: Myeloma.cydx as described in Section 72.1.1.
Purpose of Plot: To generate and display a Box chart for the variable haemoglobin across different values of status based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Continuous > Box
3. In the ensuing dialog box, select status as the Category Axis (optional) and haemoglobin as the Variable Axis.
4. Click OK. The following Box chart is displayed in the main window.

74.3.3 Bubble

Bubble provides an X versus Y data display that shows the number of points at a particular (x, y) value with proportionally sized bubbles, to allow the user to gauge the relative amounts of information at discrete points.
Dataset: Myeloma.cydx as described in Section 72.1.1.
Purpose of Plot: To generate and display a Bubble chart for status over gender based on the selected dataset.
Steps:
1.
Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Continuous > Bubble
3. In the ensuing dialog box, select gender as the Variable on X-Axis and status as the Variable on Y-Axis. Leave the Frequency Variable (optional) blank.
4. Click OK. The following Bubble chart is displayed in the main window.

74.3.4 Simple Scatter

Simple Scatter provides an X versus Y scatter plot.
Dataset: Myeloma.cydx as described in Section 72.1.1.
Purpose of Plot: To generate and display a Simple Scatter chart for lymphocytes versus age based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Continuous > Simple Scatter
3. In the ensuing dialog box, select age as the Variable on X-Axis and lymphocytes as the Variable on Y-Axis.
4. Click OK. The following Simple Scatter chart is displayed in the main window.

74.3.5 Normality

PP Normal: PP Normal provides a probability-probability (P-P) plot to see if the selected variable follows a normal distribution. The X-axis displays the observed cumulative probability and the Y-axis displays the expected cumulative probability. The plot should be approximately linear if the normal distribution is the correct model.
Dataset: Socio.cydx as described in Section 74.2.2.
Purpose of Plot: To generate and display a PP Normal chart for the variable FinalExam based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Continuous > Normality > PP Normal
3. In the ensuing dialog box, select FinalExam as the Variable To Plot.
4. Click OK. The following PP Normal chart is displayed in the main window.

QQ Normal: QQ Normal provides a quantile-quantile (Q-Q) plot to see if the selected variable follows a normal distribution. The X-axis displays the observed normal value and the Y-axis displays the expected normal value. The plot should be approximately linear if the normal distribution is the correct model.
Dataset: Socio.cydx as described in Section 74.2.2.
Purpose of Plot: To generate and display a QQ Normal chart for the variable FinalExam based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Continuous > Normality > QQ Normal
3. In the ensuing dialog box, select FinalExam as the Variable To Plot.
4. Click OK. The following QQ Normal chart is displayed in the main window.

74.4 Frequency Distribution

74.4.1 Cumulative Plot

Left cumulative
A left cumulative frequency plot is a way to display cumulative information graphically. It shows the number of observations that are less than or equal to particular values.
Dataset: Vari.cydx
Data Description: A randomized clinical trial of Interferon and placebo was conducted on 44 children infected with childhood chicken pox (varicella) (Arvin, et al., 1982). One of the end points of the study was to determine whether Interferon is more effective than placebo in preventing adverse effects. The dataset has three variables: Group, Category and Freq. The Group variable
contains values 1 and 2 specifying the two groups Interferon and Placebo, respectively. The Category variable has four categories, representing the adverse effect, ranging from 'none' to 'death in less than a week', with values from 1 to 4 in increasing order. The number of children falling in each category, by treatment, is available in the variable Freq.
Purpose of Plot: To generate and display a Left Cumulative chart for Freq over the Category of adverse effects.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Frequency Distribution > Cumulative > Left Cumulative
3. In the ensuing dialog box, select Category as the Variable on X-Axis and Freq as the Variable on Y-Axis.
4. Click OK. The following Left Cumulative chart is displayed.

Right cumulative
Dataset: Vari.cydx as described in Section 74.4.1.
Purpose of Plot: To generate and display a Right Cumulative chart for Freq over the Category of adverse effects.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Frequency Distribution > Cumulative > Right Cumulative
3. In the ensuing dialog box, select Category as the Variable on X-Axis and Freq as the Variable on Y-Axis.
4. Click OK. The following Right Cumulative chart is displayed in the main window.

74.4.2 Histogram

Histogram provides a graphical display of the frequencies of the consecutive values of a variable. The display is in the form of contiguous bars, the height of the bars being proportional to the frequency of the values shown on the X-axis.
Dataset: Myeloma.cydx as described in Section 72.1.1.
Purpose of Plot: To generate and display a Histogram chart for age based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Frequency Distribution > Histogram
3. In the ensuing dialog box, select age as the Variable To Plot. Leave the Frequency Variable blank.
4. Click OK. The following Histogram chart is displayed in the main window.

74.4.3 Stem and Leaf

Stem and Leaf provides a diagrammatic display of the data built from the data values themselves.
Dataset: Myeloma.cydx as described in Section 72.1.1.
Purpose of Plot: To generate and display a Stem and Leaf chart for haemoglobin.
Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Basic Plots) Frequency Distribution > Stem and Leaf
3. In the ensuing dialog box, select haemoglobin as the Variable To Plot and 1 as Number of Stem Splits. Enter 0 for Stem Split Size. Leave the Frequency Variable (optional) field blank.
4. Click OK. The following Stem and Leaf chart is displayed in the main window.
5. The plot is essentially a histogram turned on its side. It resembles the right half of a leaf, with the stem on the left.

74.4.4 Step Function

Step Function provides a data display for a variable that changes its value at discrete intervals.
Dataset: Survival.cydx
Purpose of Plot: To generate and display a Step Function chart for the variable SurvPer based on the selected dataset.
Steps:
1. Open the dataset from Samples folder.
2.
Choose the menu item:
Analysis > (Basic Plots) Frequency Distribution > Step Function
3. In the ensuing dialog box, select TimeMth as the Variable on X-Axis and SurvPer as the Variable on Y-Axis.
4. Click OK. The following Step Function chart is displayed in the main window.

74.5 Crossover Plots

In this section, we consider data obtained from a 2x2 crossover trial. We also assume that the response measured in each period of the trial has been recorded on a continuous scale. The data may either be in the form of regular case data in East or Crossover Patients Continuous Data generated using the crossover data editor.
Plotting crossover data helps in understanding the data and in getting an idea of the difference in treatment or period effects. Three important plots used specifically for crossover data are described here:
Period-2 Vs. Period-1 Plot
Subject Profile Plot
Treatment-by-Periods Plot
We will first address drawing these plots using case data related to a 2x2 crossover trial. To generate a crossover plot, open a case data file and then choose from the menu:
Analysis > (Crossover Plots) Subject Plots
Then you can choose any of the three plots: Period-2 Vs. Period-1 Plot, Subject Profile Plot, Convex Hull.

74.5.1 Period-2 Vs. Period-1 Plot

Period-2 Vs. Period-1 Plot provides a scatter plot with one point for each patient, where the response in period 1 is on the X axis and the response in period 2 is on the Y axis.
Dataset: PEFR.cyd
Data Description: Data from a single-centre, randomized, placebo-controlled, double-blind study carried out to evaluate the efficacy and safety of an inhaled drug (A), compared with a placebo (B), on mean morning peak expiratory flow rate (PEFR) in patients with chronic obstructive pulmonary disease. In all, 56 patients were involved in the study: 27 in the AB group, who received treatment A in the first period and B in the second, and 29 in the BA group, who received treatment B in the first period and A in the second. The data are taken from Jones and Kenward (1989).
Steps:
Open the data file PEFR.cyd.
Choose the menu item:
Analysis > (Crossover Plots) Subject Plots > Period 2 Vs. Period 1 Plot
In the ensuing dialog box, select Group ID as the Group ID, Period ID as Period ID, Patient ID as Subject ID and PEFR as Response. The dialog box will now look as shown below.
Click on OK. The following graph is produced.
The filled points, called 'Centroids', represent the means of the data. The line Y=X is a line with slope 1 and intercept 0. Note that there is a tendency for the plotted points to be below the line in Group AB and above it in Group BA. Thus observations on treatment A tend to be greater than those on the placebo B. This is observed in both groups. The points for each group are quite spread out, indicating high between-patient variability. We can also see that one of the patients has very low mean PEFR values. You can move the cursor to the lowest point in Group AB and read the values, which are (67.778, 70.278). The fact that the points from the two groups are almost symmetrically placed in relation to the diagonal is evidence for the absence of a period effect.
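The reading of this plot can be mimicked outside East with a few lines of code. The following is a minimal sketch, not East's implementation: it computes each group's centroid and the share of patients falling below the line Y=X. The function names and the toy responses are hypothetical and are not the PEFR data.

```python
from statistics import mean


def centroid(points):
    """Return (mean period 1 response, mean period 2 response) for (x, y) pairs."""
    xs, ys = zip(*points)
    return mean(xs), mean(ys)


def share_below_diagonal(points):
    """Fraction of patients whose period 2 response is below period 1 (y < x)."""
    below = sum(1 for x, y in points if y < x)
    return below / len(points)


# Hypothetical (period 1, period 2) responses per patient; NOT the PEFR data.
group_ab = [(310.0, 270.0), (250.0, 240.0), (200.0, 215.0), (180.0, 150.0)]
group_ba = [(220.0, 260.0), (150.0, 190.0), (300.0, 290.0), (240.0, 280.0)]

for name, pts in (("AB", group_ab), ("BA", group_ba)):
    cx, cy = centroid(pts)
    print(f"Group {name}: centroid = ({cx:.2f}, {cy:.2f}), "
          f"share below Y=X = {share_below_diagonal(pts):.2f}")
```

In this toy example most AB points lie below the diagonal and most BA points lie above it, which is the same qualitative pattern described for the PEFR plot.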
To look for evidence of a direct treatment effect, we will draw a combined plot for both groups. To do this, choose Period-2 Vs. Period-1 Plot from the Crossover Plots menu and select the variables. This time select the check box for Combined Groups. The dialog box will now look as shown below.
Click on OK. The following graph is produced.
Again, the filled points represent the centroids of the respective groups. The fact that the centroids are placed on either side of the line with some vertical separation is evidence of a direct treatment effect.

74.5.2 Subject Profile Plot

The objective of a crossover trial is to focus attention on within-patient treatment differences. A good plot for displaying these differences is the subject profile plot. In this plot, the change in each patient's response over the two treatment periods is plotted for each group. To draw the Subject Profile Plot, choose the menu item:
Analysis > (Crossover Plots) Subject Plots > Subject Profile
In the ensuing dialog box, select Group ID as the Group ID, Period ID as Period ID, Patient ID as Subject ID and PEFR as Response. The dialog box will now look as shown below.
Click on OK. The following graph is produced.
The high between-patient variability is also noticeable in the Subject Profile plot. The within-patient changes are generally negative in Group AB and positive in Group BA, with some exceptions. For Group AB, the slopes of the lines are negative, implying higher values for Period 1, where treatment A is applied. For Group BA, the slopes of the lines are positive, showing higher values for Period 2, where A is applied.
Thus the general trend implies a higher value of mean PEFR for treatment A than for placebo B. Most of the changes are small in magnitude, barring some large ones.

74.5.3 Treatment-by-Periods Plot

Both the Period-2 Vs. Period-1 and Subject Profile Plots display values of Response for individual patients. To get an overall idea of the performance of both treatments in the two periods, a graph such as the Treatment-by-Periods Plot is used. To draw this plot for the PEFR data:
1. Choose from the menu:
Analysis > (Crossover Plots) Summary Plots > Treatment by Periods
2. In the ensuing dialog box, select Group ID as the Group ID, Period ID as Period ID, Patient ID as Subject ID and PEFR as Response. In the text boxes for Treatment 1 and Treatment 2, type A and B respectively, which are the treatments used in the study. The dialog box will now look as shown below.
3. Click on OK. The following graph is produced.
The points plotted are the means of the Response variable PEFR for each Treatment and Period combination. As shown in the legend by the side of the plot, the lines join the means for the respective treatments for Period 1 and Period 2. If the cursor is moved to any of the points, it shows the label Group ID as well as the value of the mean response for the corresponding treatment-by-period combination. Since no line is completely above the other, neither treatment gives a higher mean response in both periods, and the observed difference in means is smaller in magnitude in the second period than in the first. In Period 1, the difference is 29.847 whereas in Period 2 it is -9.041. To test whether this difference is statistically significant or not, the user is referred to the Period Effect test from the Crossover menu.
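The quantities shown in a treatment-by-periods plot are simple cell means. As a rough illustration only (a sketch under assumed names, not East's code), the following computes the mean response for each treatment and period combination of a 2x2 crossover and the A minus B difference in each period. The `SEQUENCE` mapping, the function name and the toy records are hypothetical and are not the PEFR data.

```python
from collections import defaultdict
from statistics import mean

# In a 2x2 crossover, the group (sequence) fixes the treatment given in each period.
SEQUENCE = {"AB": {1: "A", 2: "B"}, "BA": {1: "B", 2: "A"}}


def treatment_by_period_means(records):
    """Map (treatment, period) -> mean response from (group, period, response) rows."""
    cells = defaultdict(list)
    for group, period, response in records:
        cells[(SEQUENCE[group][period], period)].append(response)
    return {cell: mean(values) for cell, values in cells.items()}


# Hypothetical case data: (group, period, response); NOT the PEFR data.
records = [
    ("AB", 1, 260.0), ("AB", 2, 230.0),
    ("AB", 1, 240.0), ("AB", 2, 250.0),
    ("BA", 1, 220.0), ("BA", 2, 235.0),
    ("BA", 1, 230.0), ("BA", 2, 225.0),
]

means = treatment_by_period_means(records)
for period in (1, 2):
    diff = means[("A", period)] - means[("B", period)]
    print(f"Period {period}: mean A - mean B = {diff:.3f}")
```

When the difference changes sign between periods, as in this toy example, the two treatment lines cross and neither line lies completely above the other.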
All three of the above plots can be drawn on a log scale by selecting the check box for 'Use log(Response)'. The response values are then transformed to log(Response), where the natural logarithm of Response is plotted.
4. For instance, suppose we want to draw the Period-2 Vs. Period-1 Plot on a log scale for the PEFR data. Choose from the menu:
Analysis > (Crossover Plots) Subject Plots > Period 2 Vs. Period 1 Plot
5. In the ensuing dialog box, select Group ID as the Group ID, Period ID as Period ID, Patient ID as Subject ID and PEFR as Response. Select the check boxes for 'Use log(Response)' and 'Combine Groups'. Click on OK. The following graph is produced.

74.5.4 Crossover Plots using Crossover Patients Continuous Data

All the above graphs can be drawn from crossover patients continuous data created by the crossover data editor. To see this:
1. Open the data file XoverPatientContinuousData.cyd, available in the Samples directory of the Crossover installation directory.
2. Now choose the menu item:
Analysis > (Crossover Plots) Subject Plots > Period 2 Vs. Period 1 Plot
You will be presented with the following dialog box.
3. Check Combine Groups as shown in the dialog box and click OK. The following graph is produced.
Note that when you draw these plots using data from the crossover data editor, there is no need to select the variables, as they are selected internally and the requested plot is drawn.
For example, if you choose the Subject Profile Plot on the same data, you will see the following dialog box:
If you check Use Log Transform of Responses, the plot will be drawn on the log scale, otherwise on the original scale of the response variable.
Similarly, the dialog box you get when you draw the Treatment-by-Periods Plot will be as follows:
You may specify treatment labels of your choice, such as Drug and Placebo, as shown in the above dialog box. The plot will then show these labels in the legend as shown below:
If the treatments are not specified in the text boxes provided in the dialog box, the plot will show the default labels Treatment 1 and Treatment 2 in the legend.

75 Analysis-Normal Superiority One-Sample

This chapter demonstrates how East can be used to perform inferences on data collected from a single-sample superiority study. The data may consist of a random sample of observations from a single treatment or of paired observations from two treatments. In chapter 7, the design, simulation and interim monitoring of these types of trials are discussed with reference to a Single Mean Test, a Test for the Difference of Paired Means and a Test for the Ratio of Paired Means. East supports the analysis of all of these tests as well as the Wilcoxon Signed Rank Test. They are accessible from the Analysis menu and allow you to assess whether the data support the null or the alternative hypothesis of the study.
Analysis of a Single Mean Superiority Test is discussed in section 75.1, while the two paired tests are discussed in sections 75.2 and 75.3, respectively. Finally, the analysis of the non-parametric Wilcoxon Test is discussed in section 75.4.

75.1 Example: Single Mean

Consider the problem of comparing the mean of the distribution of observations from a single random sample to a specified constant. For example, when developing a new drug for treatment of a disease, there should be evidence of efficacy. In this example, the effect of a drug on children with mental retardation and ADHD is demonstrated. For the single-sample problem, it may be desirable to compare the unknown mean response µt to a fixed value µ0. The null hypothesis H0: µt = µ0 is tested against either the two-sided alternative hypothesis H1: µt ≠ µ0 or a one-sided alternative hypothesis H1: µt < µ0 or H1: µt > µ0.
Dataset: Methylphenidate.cydx
Data Description: A trial was conducted to study the effect of Methylphenidate on cognitive functioning in children with mental retardation and ADHD. For the study details, refer to Pearson et al. (2003). For the twenty-four children studied, the mean number of correct responses was observed for those receiving treatment (0.60 mg/kg of Methylphenidate) as well as those on placebo. The first column of the dataset, D0, displays the number of correct responses after placebo; the second column, D60, shows the number of correct responses after treatment (0.60 mg/kg of Methylphenidate); and the third column, diff, is the difference of the two measures.
Purpose of the Analysis: To test whether the mean number of correct responses of children receiving treatment (0.60 mg/kg of Methylphenidate) is at least 45.
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Continuous) One Sample > (Single Arm Design) Single Mean
3.
In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK. You will see the Analysis results as shown below.
For this analysis, East displays the p-value associated with a left-tailed test because the observed sample mean is smaller than µ0. The two-sided 95% confidence interval is (39.506, 49.911). The lower limit is smaller than µ0 = 45; therefore H0: µt ≤ 45 cannot be rejected in favor of H1: µt > 45 at the one-sided 0.025 level of significance. The computed p-values also support this conclusion.

75.2 Example: Mean of Paired Differences

The paired t-test is often used to compare the means of two normal distributions. Here each observation from a random sample in one distribution is matched with a unique observation from the other distribution. A common application of this is when treatments are compared by using subjects who are matched using demographic and baseline characteristics. Another application is when two separate observations are made from the same subject under different experimental conditions, which will be the focus of the next example.
Dataset: Methylphenidate.cydx as described in Section 75.1.
Purpose of the Analysis: To test the efficacy of Methylphenidate on cognitive functioning in children with mental retardation and ADHD. Let µ0 and µt denote the mean number of correct responses under placebo and treatment, respectively, and δ = µt − µ0. A positive value of δ suggests efficacy of the treatment. Test the null hypothesis H0: δ ≤ 0 against the alternative H1: δ > 0.
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Continuous) One Sample > (Paired Design) Mean of Paired Differences
3.
In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK. You will see the Analysis results as shown below.
The two-sided 95% confidence interval is (1.775, 8.141), which does not include 0. The one-sided p-value is 0.002, which supports rejecting H0. Therefore, for this example, it is reasonable to conclude that the use of Methylphenidate significantly increases the mean number of correct responses as compared to placebo.

75.3 Example: Ratio of Paired Differences

The ratio of paired differences test is used to compare the means of two log-normal distributions when each observation in the random sample from one distribution is matched with a unique observation from the other distribution. As with the previous example illustrating the mean of paired differences, a common application is when two observations are made from the same subject under different experimental conditions, which is the setting of the example below. Another is when treatments are compared using subjects who are matched by demographic and baseline characteristics. East is used to perform a log transformation on the original data, and a ratio of paired differences test on the log-transformed data.
Dataset: Methylphenidate.cydx as described in Section 75.1.
Purpose of the Analysis: To test the efficacy of Methylphenidate on cognitive functioning in children with mental retardation and ADHD. Define ρ = µt/µc. A value of ρ > 1 suggests efficacy of the treatment. Test the null hypothesis H0: ρ = 1 against the alternative H1: ρ > 1.
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Continuous) One Sample > (Paired Design) Mean of Paired Ratios
3.
In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK. You will see the Analysis results as shown below.
The observed value of the test statistic is t = 2.991, and it has 24 − 1 = 23 degrees of freedom. The two-sided 95% confidence interval for ln(ρ) is (0.036, 0.199), which does not include 0, nor does the confidence interval for ρ, (1.037, 1.221), contain 1. Therefore, H0: ρ = 1 should be rejected in favor of H1: ρ ≠ 1, and the associated p-value of 0.007 supports this conclusion. The p-value for the one-sided test of H0: ρ ≤ 1 versus H1: ρ > 1 is 0.003. Again, for this example, it is reasonable to conclude that the use of Methylphenidate significantly increases the mean number of correct responses as compared to placebo.
5. Alternatively, a new log-transformed variable can be created directly in the dataset. Double click on Methylphenidate.cyd in the Library to display the data in the main window. Under the Data Editor tab, click the icon in the Variable ribbon.
6. Enter logD0 in the Target Variable field. Type LN(D0) into the empty field on the right side of the equation, or select D0 from the Variables list and LN(var) from the Functions list:
7. Clicking OK will add a new column labeled logD0 to the dataset. This contains log-transformed values of the entries in the D0 column. In a similar manner, create a new variable logD60 by transforming D60 and perform a paired t-test using logD0 and logD60.
Notice that the observed value of the test statistic and the p-values are identical to those from the test for the ratio of means for paired data. In East, this test is equivalent to the paired t-test for log-transformed data.

75.4 Example: Wilcoxon Signed Rank Test

The non-parametric Wilcoxon signed rank test is a test on the median of the difference of two paired random variables. It is a nonparametric counterpart of the paired t-test, and is preferred when the distribution of the data deviates from normal.
Dataset: Methylphenidate.cydx as described in Section 75.1.
Purpose of the Analysis: To test the null hypothesis H0: λ ≤ 0 against the alternative H1: λ > 0, where λ is the median value of the paired difference. A positive value for λ suggests efficacy of the treatment.
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
Analysis > (Continuous) One Sample > (Paired Design) Wilcoxon Signed Rank
3. In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK. You will see the Analysis results as shown below.
The Estimate of Median Difference is 5 and the observed Standardized Statistic is 2.902, with an associated two-sided p-value of 0.004 and one-sided p-value of 0.002. The two-sided 95% confidence interval for λ is (1.5, 8) and does not include 0. Therefore, H0: λ ≤ 0 should be rejected in favor of H1: λ > 0. The non-parametric Wilcoxon signed rank test also supports the conclusion that the use of Methylphenidate significantly increases the number of correct responses as compared to placebo.
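To make the Standardized Statistic reported above more concrete, here is a minimal sketch of one common way to standardize the Wilcoxon signed rank statistic with a normal approximation. It is an illustration under stated assumptions, not East's algorithm: zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied to the variance. East may handle these details differently, so its output need not match this sketch exactly. The function name and the example differences are hypothetical.

```python
import math


def signed_rank_z(differences):
    """Return (W+, z) for paired differences using the normal approximation.

    W+ is the sum of ranks of the positive differences. Zero differences are
    dropped and tied absolute values get average ranks. The variance uses the
    untied formula n(n+1)(2n+1)/24 (an assumption of this sketch).
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute values starting at position i.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    return w_plus, (w_plus - mean_w) / math.sqrt(var_w)


# Hypothetical paired differences (treatment minus placebo); NOT the study data.
w_plus, z = signed_rank_z([1, -2, 3, 4, 5])
print(f"W+ = {w_plus}, z = {z:.3f}")
```

A large positive z corresponds to mostly positive paired differences, which is the direction that favors H1: λ > 0 in the example above.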
76 Analysis-Normal Noninferiority Paired-Sample

In this chapter, we explore how to use East to perform inference on data collected from a paired-sample noninferiority study. Two common applications of paired sample designs are:
1. Comparison of two treatments using subjects who are matched by demographic and baseline characteristics, and
2. Comparison of two observations made on the same subject under different experimental conditions.

The design and simulation of such noninferiority trials are discussed in chapter 8. Analysis based on the Paired Difference of Means is presented in section 76.1, and the Ratio of Paired Means is discussed in section 76.2.

76.1 Example: Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with σD denoting the standard deviation of the paired differences. Let δ0 be the noninferiority margin. For δ0 < 0, the null hypothesis H0 : µt − µc ≤ δ0 is tested against the one-sided alternative hypothesis H1 : µt − µc > δ0. For δ0 > 0, the null hypothesis H0 : µt − µc ≥ δ0 is tested against the one-sided alternative hypothesis H1 : µt − µc < δ0.

Dataset: Olestra.cyd

Data Description: The dataset Olestra.cyd, available in the Samples folder, contains paired observations from 28 subjects on two variables, X and Y. Let µx and µy denote the population means of X and Y, respectively, and let δ = µy − µx.

Purpose of the Analysis: To test the null hypothesis H0 : δ ≤ δ0 against the alternative hypothesis H1 : δ > δ0. For this example, we consider a non-inferiority margin of δ0 = −0.5.

Analysis Steps:
1. Open the dataset from Samples folder.
2.
Choose the menu item: Analysis > (Continuous) One Sample > (Paired Design) Mean of Paired Differences
3. In the ensuing dialog box, under the Main tab, choose the variables as shown below.
4. In the Advanced tab, enter 0.975 for Confidence Level.
5. Click OK to analyze the data. Upon completion of the analysis, a new node with label Analysis: Continuous Response: Difference of Means for Paired Data1 will be added in the Library and the output will be displayed in the main window.

The observed value of the test statistic is 2.822, with 28 − 1 = 27 degrees of freedom. The lower limit of the one-sided 97.5% confidence interval for δ = µy − µx is −0.284. Since this is greater than the non-inferiority margin of −0.5, we can reject H0 : δ ≤ δ0 in favor of H1 : δ > δ0 at the one-sided 2.5% level of significance. The p-value associated with this rejection is 0.004.

76.2 Example: Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and let σt² and σc² denote the respective variances. Let ρ0 be the noninferiority margin. With ρ0 < 1, the null hypothesis H0 : µt/µc ≤ ρ0 is tested against the one-sided alternative hypothesis H1 : µt/µc > ρ0. With ρ0 > 1, the null hypothesis H0 : µt/µc ≥ ρ0 is tested against the one-sided alternative hypothesis H1 : µt/µc < ρ0. Let ρ = µt/µc.

Dataset: Olestra.cyd as described in Section 76.1.

Purpose of the Analysis: To test the null hypothesis H0 : ρ ≤ ρ0 against the alternative hypothesis H1 : ρ > ρ0.
For this illustrative example, we consider a non-inferiority margin (ρ0) of 0.8.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) One Sample > (Paired Design) Mean of Paired Ratios
3. In the ensuing dialog box, under the Main tab, choose the variables as shown below.
4. In the Advanced tab, enter 0.975 for Confidence Level.
5. Click OK to analyze the data. Upon completion of the analysis, the following output will be displayed in the main window.

The observed value of the test statistic is 6.526, with 28 − 1 = 27 degrees of freedom. The lower limit of the one-sided 97.5% confidence interval for ρ = µy/µx is 0.957. This is greater than the non-inferiority margin ρ0 = 0.8. Therefore, we can reject H0 : ρ ≤ 0.8 in favor of H1 : ρ > 0.8. The p-value associated with this rejection is very close to 0.

77 Analysis-Normal Equivalence Paired-Sample

This chapter demonstrates how East can be used to perform inference on data collected from a paired-sample equivalence study. In some applications (e.g., a bioanalytical cross-validation study), an independent-sample experimental design may confound statistical tests, because a large pooled variance may actually be due to intersample variability, especially for incurred biological samples obtained from clinical or animal studies (Feng et al., 2006). This problem can be overcome by applying a paired sample analysis. Two common applications of paired sample designs are as follows:
1. Comparison of two treatments using subjects who are matched by demographic and baseline characteristics.
2. Comparison of two observations made on the same subject under different experimental conditions.
Chapter 9 deals with the design and simulation of these types of equivalence trials. The endpoint for a paired equivalence design can be the difference of means or the ratio of means. Analysis based on the Paired Difference of Means is presented in section 77.1, and the Ratio of Paired Means is discussed in section 77.2.

77.1 Example: Mean of Paired Differences

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, respectively, and with σD denoting the standard deviation of the paired differences. Let δL and δU be the lower and upper equivalence limits, respectively. We wish to test the hypothesis
H0 : µt − µc ≤ δL or µt − µc ≥ δU
against
H1 : δL < µt − µc < δU

Dataset: FengData.cydx

Data Description: Feng et al. (2006) reported data on 12 quality control (QC) samples. Each sample was analyzed first by Lab1 and then by Lab2. The values in the columns Lab1 and Lab2 are the concentrations (in pg mL⁻¹) measured by Lab1 and Lab2.

Purpose of the Analysis: To ensure that comparable results can be achieved by the two laboratories, Lab1 and Lab2; in other words, to establish statistical equivalence between the two laboratories. In this example, we consider Lab1 as the standard laboratory (C) and Lab2 as the one that needs to be validated (T). Denote the mean concentrations from Lab1 and Lab2 by µc and µt. Considering equivalence limits of (−10, 10), we can state our hypotheses as
H0 : µt − µc ≤ −10 or µt − µc ≥ 10
against
H1 : −10 < µt − µc < 10
We want to reject H0 with a type I error rate not exceeding 0.025.

Analysis Steps:
1. Open the dataset from Samples folder.
2.
Choose the menu item: Analysis > (Continuous) One Sample > (Paired Design) Mean of Paired Differences
3. In the ensuing dialog box (under the Main tab), choose the variables as shown below.
4. In the Advanced tab, enter 0.975 for Confidence Level.
5. Click OK to analyze the data. The following output will be displayed in the main window.

The observed values of the two test statistics are 2.39 and −6.084, and both have 12 − 1 = 11 degrees of freedom. The 2-sided 95% confidence interval for δ = µt − µc is (−9.553, 0.836). This confidence interval lies within the equivalence interval of (−10, 10); therefore, we can reject H0 : µt − µc ≤ −10 or µt − µc ≥ 10 in favor of H1 : −10 < µt − µc < 10 at the 2.5% level of significance.

77.2 Example: Ratio of Paired Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of an outcome variable, X, with means µt and µc, and let σt² and σc² denote the respective variances. Here, the null hypothesis H0 : µt/µc ≤ ρL or µt/µc ≥ ρU is tested against the alternative hypothesis H1 : ρL < µt/µc < ρU. Let ρ = µt/µc denote the ratio of the two means. Then the null hypothesis can be expressed as H0 : ρ ≤ ρL or ρ ≥ ρU, and the alternative can be expressed as H1 : ρL < ρ < ρU. In practice, ρL and ρU are often chosen such that ρL = 1/ρU. The two one-sided tests (TOST) procedure of Schuirmann (1987) is commonly used for this analysis, and is employed in this section for a paired-sample study. We can perform the test for the difference, as discussed in section 77.1, on the log-transformed data.

77.2.1 Example: Ratio of Paired Means

Dataset: FengData.cyd as described in section 77.1.
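The two one-sided tests logic can be illustrated with the figures reported in section 77.1. The sketch below is an arithmetic check, not East's code: it backs out the mean difference and its standard error from the reported 95% confidence interval (−9.553, 0.836), and then recomputes the two test statistics against the limits −10 and 10. The function name `tost_from_interval` is hypothetical, and the critical value 2.201 (the 97.5th percentile of Student's t with 11 degrees of freedom) is an assumption supplied by hand rather than computed.

```python
def tost_from_interval(ci_low, ci_high, t_crit, delta_l, delta_u):
    """Recover the two one-sided t statistics from a symmetric t-based interval."""
    d_bar = (ci_low + ci_high) / 2           # interval midpoint = mean difference
    se = (ci_high - ci_low) / (2 * t_crit)   # half-width = t_crit * se
    t_lower = (d_bar - delta_l) / se         # tests H0: delta <= delta_L
    t_upper = (d_bar - delta_u) / se         # tests H0: delta >= delta_U
    # Equivalence is declared only if BOTH one-sided tests reject.
    equivalent = t_lower > t_crit and t_upper < -t_crit
    return t_lower, t_upper, equivalent

T_CRIT_11DF = 2.201  # assumed 97.5th percentile of t with 11 degrees of freedom

t_lower, t_upper, equivalent = tost_from_interval(
    ci_low=-9.553, ci_high=0.836, t_crit=T_CRIT_11DF, delta_l=-10.0, delta_u=10.0
)
print(f"t_lower = {t_lower:.3f}, t_upper = {t_upper:.3f}, equivalent = {equivalent}")
```

Rejecting both one-sided hypotheses at the 2.5% level is the same decision as checking that the two-sided 95% interval lies inside the equivalence limits, which is why the chapter states its conclusions in terms of the interval.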
Purpose of the Analysis: To test
H0 : µt/µc ≤ 0.85 or µt/µc ≥ 1.15
against
H1 : 0.85 < µt/µc < 1.15
We want to reject H0 with a type I error rate not exceeding 0.025.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) One Sample > (Paired Design) Mean of Paired Ratios
3. In the ensuing dialog box (under the Main tab), choose the variables as shown below.
4. In the Advanced tab, enter 0.975 for Confidence Level.
5. Click OK to analyze the data. The following output will be displayed in the main window.

The observed values of the two test statistics are 4.53 and −7.298, and both have 12 − 1 = 11 degrees of freedom. The 2-sided 95% confidence interval for ρ = µt/µc is (0.902, 1.01). This confidence interval lies within the equivalence interval of (0.85, 1.15); therefore, we can reject H0 : µt/µc ≤ 0.85 or µt/µc ≥ 1.15 in favor of H1 : 0.85 < µt/µc < 1.15 at the 2.5% level of significance.

78 Analysis-Normal Superiority Two-Sample

To demonstrate the superiority of a new treatment over the control, it is often necessary to randomize subjects to the control and treatment arms, and contrast the group-dependent means of the outcome variables. The design, simulation and interim monitoring of such trials are discussed in detail in chapter 10. In this chapter, we explore how to use East to analyze data from two independent samples and from crossover superiority studies.
78.1 Example: Difference of Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a normally distributed outcome variable, X, with means µt and µc, respectively, and with a common variance σ². Define the treatment difference to be δ = µt − µc. The null hypothesis H0 : δ = 0 is tested against the two-sided alternative hypothesis H1 : δ ≠ 0, or against a one-sided alternative hypothesis H1 : δ < 0 or H1 : δ > 0.

Dataset: Myeloma.cyd as described in section 72.1.1.

Purpose of the Analysis: To compare the mean haemoglobin level between two groups indicated by the variable status (0-alive, 1-dead).

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Difference of Means
3. In the ensuing dialog box, choose the variables as shown below:
4. Click OK to analyze the data. The following output will be displayed in the main window.

The observed value of the test statistic is 1.559, with 48 + 17 − 2 = 63 degrees of freedom. The p-value for the two-sided test is 0.124. This is the p-value associated with rejecting H0 : δ = 0 in favor of the alternative hypothesis H1 : δ ≠ 0. The p-value for the right-tailed test is 0.062. This p-value is associated with rejecting H0 : δ ≤ 0 in favor of the alternative hypothesis H1 : δ > 0. East displays the p-value for the right-tailed test on this occasion because δ̂ > 0. The two-sided 95% confidence interval is (−0.313, 2.54). Since the 2-sided confidence interval includes 0 (equivalently, the two-sided p-value of 0.124 exceeds 0.05), we cannot reject H0 : δ = 0 at the 5% level of significance.
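The equal-variance test statistic and its degrees of freedom can be sketched as follows. This is a textbook pooled-variance calculation under the common-variance assumption stated above. It is not taken from East's source, the function name `pooled_t` is hypothetical, and the data are made up rather than drawn from Myeloma.cyd.

```python
import math
from statistics import mean, variance

def pooled_t(treatment, control):
    """Pooled-variance two-sample t statistic for delta = mu_t - mu_c."""
    nt, nc = len(treatment), len(control)
    delta_hat = mean(treatment) - mean(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((nt - 1) * variance(treatment)
                  + (nc - 1) * variance(control)) / (nt + nc - 2)
    se = math.sqrt(pooled_var * (1 / nt + 1 / nc))
    return delta_hat, se, delta_hat / se, nt + nc - 2

# Hypothetical measurements (NOT the Myeloma data).
treatment = [10.1, 12.3, 11.8, 13.0]
control = [9.0, 10.5, 9.8]

delta_hat, se, t_stat, df = pooled_t(treatment, control)
print(f"delta_hat = {delta_hat:.3f}, se = {se:.3f}, t = {t_stat:.3f}, df = {df}")
```

With group sizes 48 and 17, the same formula gives the 48 + 17 − 2 = 63 degrees of freedom quoted in the example.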
78.2 Example: Ratio of Means

The statistical analysis of the ratio of means of two independent log-normal distributions is often of interest in biomedical research. The ratio of means should be preferred as the endpoint when the underlying distribution is skewed, so that a lognormal distribution is a better fit than a normal one. Sometimes the goal of the experiment is also better represented by the ratio of means than by their difference. Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the coefficient of variation CV = σ/µ is the same for T and C. Define the treatment ratio to be ρ = µt/µc. The null hypothesis H0 : ρ = 1 is tested against the two-sided alternative hypothesis H1 : ρ ≠ 1, or against a one-sided alternative hypothesis H1 : ρ < 1 or H1 : ρ > 1.

Dataset: Myeloma.cyd as described in section 72.1.1.

Purpose of the Analysis: To compare the mean haemoglobin level between two groups indicated by the variable status (0-alive, 1-dead). Here, we are interested in testing the null hypothesis H0 : ρ = 1 against the alternative hypothesis H1 : ρ > 1. Since the ratio hypothesis can be translated into a difference hypothesis using the log transformation, East performs the test for the difference on log-transformed data, as discussed in section 78.1, to draw inference on ρ.

Analysis Steps:
1. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Ratio of Means
2. In the ensuing dialog box (under the Main tab), select the variables as shown below:
3. Click OK to analyze the data. The following output will be displayed in the main window.
The observed value of the test statistic is 1.453, with 48 + 17 − 2 = 63 degrees of freedom. The two-sided 95% confidence interval for ln ρ is (−0.043, 0.27) and for ρ is (0.958, 1.311). The former confidence interval includes 0 and the latter includes 1. Therefore, we cannot reject H0 : ρ = 1 in favor of H1 : ρ ≠ 1. The p-value for testing H0 : ρ ≤ 1 against H1 : ρ > 1 is 0.076. Therefore, we cannot reject H0 : ρ ≤ 1 in favor of H1 : ρ > 1 at the 5% level of significance either.

78.3 Example: Difference of Means in Crossover design

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. Subjects in sequence group 1 receive the test drug (T) formulation in the first period, have their outcome variable, X, recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2 and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group    Period 1   Washout   Period 2
1 (TC)   Test       -         Control
2 (CT)   Control    -         Test

The resulting data are commonly analyzed using a statistical linear model. The response yijk in period j on subject k in sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects λ1 and λ2. The fixed effects model can be displayed as:

Group    Period 1            Washout   Period 2
1 (TC)   µ + τt + π1 + γ1    -         µ + τc + π2 + λ1
2 (CT)   µ + τc + π1 + γ2    -         µ + τt + π2 + λ2

For a superiority trial, East can test the following null hypotheses:
Test1: H0 : τt − τc = 0. For treatment effect
Test2: H0 : π1 − π2 = 0. For period effect
Test3: H0 : λ1 − λ2 = 0.
For carryover effect

Dataset: CrossoverCaseData.cyd

Data Description: Jones and Kenward (2003) presented data from a 2 × 2 crossover trial where the primary objective was to evaluate the efficacy and safety of an inhaled drug given to patients with chronic obstructive pulmonary disease. Eligible patients were randomized to either treatment sequence AB or BA (A = Drug; B = Placebo). There was a gap of 4 weeks between the two periods. The main comparison of efficacy was based on the mean morning peak expiratory flow rate (PEFR). The data of this trial are available in CrossoverCaseData.cyd. This dataset contains 112 observations and 7 variables. The columns GroupID, PeriodID and subjectID contain the information about group sequence, period and subject id, respectively. The column Response contains the measurements on the PEFR.

Purpose of the Analysis: To test if there is a significant carryover effect from period 1 to period 2.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Difference of Means
3. The ensuing dialog box has two tabs: Main and Advanced. In the Main tab, select the variables as shown below:
4. Click OK to start the analysis. Upon completion of the analysis, the following output will be displayed in the main window.

The observed value of the test statistic is 0.948, with 27 + 29 − 2 = 54 degrees of freedom (the 112 observations come from 56 subjects). The p-value for the two-sided test is 0.347. Therefore, the carryover effect is not significant at the 5% level of significance.
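One common textbook way to test carryover and treatment effects in a 2 × 2 crossover is to compare subject totals and within-subject period differences between the two sequence groups. The sketch below illustrates that approach with hypothetical data. It is an assumption that this matches East's calculation, since the output shown in this chapter does not document the internals. The function names `two_sample_t` and `crossover_tests` are invented for the illustration.

```python
import math
from statistics import mean, variance

def two_sample_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2

def crossover_tests(seq_tc, seq_ct):
    """Each argument is a list of (period 1, period 2) responses, one pair per subject."""
    # Carryover: compare subject totals between the two sequence groups.
    t_carry, df = two_sample_t([p1 + p2 for p1, p2 in seq_tc],
                               [p1 + p2 for p1, p2 in seq_ct])
    # Treatment: compare within-subject period differences between the groups.
    d_tc = [p1 - p2 for p1, p2 in seq_tc]
    d_ct = [p1 - p2 for p1, p2 in seq_ct]
    t_treat, _ = two_sample_t(d_tc, d_ct)
    effect = (mean(d_tc) - mean(d_ct)) / 2  # estimate of tau_t - tau_c
    return t_carry, t_treat, effect, df

# Hypothetical responses (NOT the PEFR data): three subjects per sequence group.
group_tc = [(10, 8), (12, 9), (11, 9)]
group_ct = [(8, 11), (9, 11), (7, 10)]

t_carry, t_treat, effect, df = crossover_tests(group_tc, group_ct)
print(f"carryover t = {t_carry:.3f}, treatment t = {t_treat:.3f}, "
      f"effect = {effect:.2f}, df = {df}")
```

Both statistics have n1 + n2 − 2 degrees of freedom, where n1 and n2 are the numbers of subjects in the two sequence groups.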
Let us further examine if there is a significant Treatment effect.

Dataset: CrossoverCaseData.cyd as described in Section 78.3.

Purpose of the Analysis: To test if there is a significant treatment effect.

Analysis Steps:
1. Open the dataset.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Difference of Means
3. In the ensuing dialog box, complete the Main tab as before except for Effect Type. Select Treatment as Effect Type, as we are interested in testing the treatment effect.
4. Click OK to start the analysis. Upon completion of the analysis, a new node with label Analysis: Continuous Response: Difference of Means test for Crossover Data2 will be added to the Library and the output will be displayed in the main window. Scroll down to the end of the output. The output for the statistical test of the treatment effect is displayed in the last table.

The observed value of the test statistic is 3.046, with 27 + 29 − 2 = 54 degrees of freedom. The p-value for the two-sided test is 0.004. Therefore, the treatment effect is significant at the 5% level of significance.

78.4 Example: Ratio of Means in Crossover design

In this section, we show how to use East to test for the ratio of means from a superiority 2 × 2 crossover trial. We have already discussed the 2 × 2 crossover design in section 78.3. However, unlike section 78.3, here we are interested in the ratio of means. Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively. East can test the following null hypotheses:
Test1: H0 : µt/µc = 1. For treatment effect
Test2: H0 : π1/π2 = 1. For period effect
Test3: H0 : λ1/λ2 = 1.
For carryover effect

Since the ratio hypothesis can be translated into a difference hypothesis using the log transformation, East performs the test for the difference on log-transformed data, as discussed in section 78.3.

Dataset: CrossoverCaseData.cyd as described in section 78.3.

Purpose of the Analysis: To test the null hypothesis H0 : ρ = 1 against the alternative hypothesis H1 : ρ ≠ 1.

Analysis Steps:
1. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Ratio of Means
2. In the ensuing dialog box, select the variables as shown below:
3. Click OK to start the analysis. Upon completion of the analysis, a new node with label Analysis: Continuous Response: Ratio of Means for Crossover Data1 will be added to the Library and the output will be displayed in the main window. Scroll down to the end of the output. The output for the statistical test of the treatment effect is displayed in the last table.

East performs the analysis on the log-transformed data. The observed value of the test statistic based on log-transformed data is 2.904, with 27 + 29 − 2 = 54 degrees of freedom. The p-value for the two-sided test is 0.005. Therefore, the treatment effect is significant at the 5% level of significance.

Now we will perform the test for the difference of means for crossover data using log-transformed data. CrossoverCaseData.cyd has a column labeled LnResp, which contains the log-transformed values of the entries in the Response column. The result of the test of the treatment effect with LnResp as the response variable (using difference of means for crossover data) is as follows:

Compare the value of the observed test statistic and the p-values with those from the test for the ratio of crossover means. They are identical.
This is because the test for the ratio of crossover means in East is equivalent to the test for the difference of crossover means on log-transformed data.

79 Analysis-Normal Noninferiority Two-Sample

In a noninferiority trial, the goal is to establish that an experimental treatment is no worse than the standard treatment, rather than attempting to establish that it is superior. A therapy that is demonstrated to be non-inferior to the current standard therapy for a particular indication might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. The design, simulation and interim monitoring of such trials are discussed in detail in chapter 12. In this chapter, we explore how to use East to analyze data from two independent samples and from crossover noninferiority studies.

79.1 Example: Difference of Means

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a normally distributed outcome variable, X, with means µt and µc, respectively, and with a common variance σ². Define the treatment difference to be δ = µt − µc, and let δ0 be the noninferiority margin. When δ0 < 0, East tests the null hypothesis H0 : δ ≤ δ0 against the alternative hypothesis H1 : δ > δ0. When δ0 > 0, the null hypothesis H0 : δ ≥ δ0 is tested against the alternative hypothesis H1 : δ < δ0.

Let X̄t and X̄c be the mean responses of the experimental and control groups, respectively, based on nt observations from T and nc observations from C. Then the estimate of δ is δ̂ = X̄t − X̄c. The test statistic is defined as

$$Z = \frac{\hat{\delta} - \delta_0}{\mathrm{se}(\hat{\delta})} \qquad (79.1)$$

where se(δ̂) is the standard error of δ̂ based on nt + nc observations. Z is treated as following either a t distribution with nt + nc − 2 degrees of freedom or a standard normal distribution.

Dataset: Werner.cyd as described in section 73.4.2.
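Equation 79.1 can be checked by hand from the summary statistics that East reports for this example: 94 subjects per group, with mean (standard deviation) cholesterol of 240.59 (58.924) for birthpill users and 232.97 (43.492) for non-users, and a margin of 25. The sketch below assumes the equal-variance form of the standard error. The function name `noninferiority_t` is hypothetical, and small differences from East's output arise because the inputs are rounded.

```python
import math

def noninferiority_t(mean_t, sd_t, n_t, mean_c, sd_c, n_c, margin):
    """Statistic of eq. 79.1 from summary statistics, assuming a common variance."""
    delta_hat = mean_t - mean_c
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    sigma_hat = math.sqrt(pooled_var)
    se = sigma_hat * math.sqrt(1 / n_t + 1 / n_c)
    z = (delta_hat - margin) / se                    # eq. 79.1
    effect_size = (delta_hat - margin) / sigma_hat   # standardized distance from margin
    return delta_hat, se, z, effect_size, n_t + n_c - 2

delta_hat, se, z, effect_size, df = noninferiority_t(
    mean_t=240.59, sd_t=58.924, n_t=94,   # birthpill users (T)
    mean_c=232.97, sd_c=43.492, n_c=94,   # non-users (C)
    margin=25.0,
)
print(f"delta_hat = {delta_hat:.2f}, se = {se:.3f}, Z = {z:.3f}, "
      f"effect size = {effect_size:.3f}, df = {df}")
```

A negative Z far enough below zero supports H1 : δ < 25, which is the direction tested in this example because the margin is positive.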
Purpose of the Analysis: To compare the mean cholesterol levels between birthpill users and nonusers. Let µt and µc be the mean cholesterol levels in the birthpill user group and the non-user group, respectively, and let δ = µt − µc. We want to test the null hypothesis H0 : δ ≥ 25 against the alternative hypothesis H1 : δ < 25. For this analysis, we consider a one-sided type I error rate of 0.025.

Analysis Steps:
1. Open the dataset from Samples folder.
2. If multiple workbooks are currently open, this will bring up the Keep in dialog box. You can select one of the existing workbooks or create a new workbook. Suppose you want to create a new workbook labeled “Birthpill Noninferiority”. To do this, select the radio button New Workbook and type Birthpill Noninferiority in the field next to it.
3. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Difference of Means
4. In the ensuing dialog box (under the Main tab), select Noninferiority as Trial Type, Equal as Variance Type and t-test as Test Type. Select Birthpill as the Population Id variable. When you select a variable for the Population Id field, a new box will appear below, where you specify the levels of the Population Id variable for the control and treatment groups. Choose 0 for Control. East will then treat the subjects with BIRTHPILL=0 as the control group and the remaining subjects as the treatment group. Select CHOLESTEROL as the Response Variable and enter 25 for Noninferiority Margin. Leave the Frequency Variable field blank.
5. In the Advanced tab, enter 0.975 for Confidence Level.
6. Click OK to analyze the data. The following output will be displayed in the main window.

There are 94 observations in each group.
The mean (standard deviation) cholesterol levels are 232.97 (43.492) and 240.59 (58.924) in the birthpill non-user and user groups, respectively. The estimated treatment difference is δ̂ = 7.617 with se(δ̂) = 7.554. The effect size is −0.336. This can be verified by plugging δ̂ = 7.617, δ0 = 25 and $\hat{\sigma} = 7.554 / \sqrt{1/94 + 1/94} = 51.788$ into the effect size formula

$$\frac{\hat{\delta} - \delta_0}{\hat{\sigma}} = \frac{7.617 - 25}{51.788} = -0.336$$

The observed value of the test statistic is −2.301 (see eq. 79.1), with 94 + 94 − 2 = 186 degrees of freedom. The p-value for the one-sided test is 0.011. This is the p-value associated with rejecting H0 : δ ≥ 25 in favor of the alternative hypothesis H1 : δ < 25. The one-sided 97.5% confidence interval is (−∞, 22.519). Since the upper limit of the confidence interval is smaller than the noninferiority margin of 25, we can reject H0 : δ ≥ 25 at the one-sided 2.5% level of significance.

79.2 Example: Ratio of Means

The statistical analysis of the ratio of means of two independent log-normal distributions is often of interest in biomedical research. The ratio of means should be preferred as the endpoint when the underlying distribution is skewed, so that a lognormal distribution is a better fit than a normal one. Sometimes the goal of the experiment is also better represented by the ratio of means than by their difference. Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively, and let σt² and σc² denote the corresponding variances of the observations. It is assumed that σt/µt = σc/µc, i.e. the coefficient of variation CV = σ/µ is the same for T and C. Finally, let ρ = µt/µc.

Let ρ0 be the noninferiority margin. For ρ0 < 1, East tests the null hypothesis H0 : ρ ≤ ρ0 against the alternative hypothesis H1 : ρ > ρ0. When ρ0 > 1, the null hypothesis H0 : ρ ≥ ρ0 is tested against the alternative hypothesis H1 : ρ < ρ0.
Since the ratio hypothesis can be translated into a difference hypothesis using the log transformation, East performs the test for the difference on log-transformed data, as discussed in section 79.1, to draw inference on ρ.

Dataset: We will again use the Werner.cyd dataset as described in section 73.4.2.

Purpose of the Analysis: Let µt and µc be the mean cholesterol levels in the birthpill user and nonuser groups, respectively, and let ρ = µt/µc. Here, we are interested in testing the null hypothesis H0 : ρ ≥ 1.10 against the alternative hypothesis H1 : ρ < 1.10. For this analysis, we consider a one-sided type I error rate of 0.025.

Analysis Steps:
1. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Ratio of Means
2. If the dataset is not displayed in your main window, this will bring up the Select Dataset dialog box with the list of available workbooks and the datasets available under each workbook. If the dataset is already displayed in your main window, East will skip this step and the dataset in the main window will be used in the analysis. If East brings up the Select Dataset dialog box, choose the Werner.cyd dataset under the workbook Birthpill Noninferiority and click OK.
3. In the ensuing dialog box (under the Main tab), select the variables as shown below:
4. Under the Advanced tab, enter 0.975 for Confidence Level.
5. Click OK to analyze the data. The following output will be displayed in the main window.

In the Output section, the first part provides descriptive statistics for the two groups. The second table, labeled Test of Hypothesis for:ln(CHOLESTEROL), provides details about the test result.
Note the label “ln(CHOLESTEROL)”; this emphasizes that the analysis is performed on log-transformed data. In this table, the Difference of Means is 0.021. This is the estimated treatment difference in terms of the log-transformed CHOLESTEROL data. The estimated effect size is −0.336. The observed value of the test statistic is −2.3, with 94 + 94 − 2 = 186 degrees of freedom. The one-sided 97.5% confidence interval for ln ρ is (−∞, 0.085) and for ρ is (0, 1.088). The upper limit of the one-sided 97.5% confidence interval for ρ is smaller than the noninferiority margin ρ0 = 1.10. Therefore, we reject H0 : ρ ≥ 1.10 in favor of H1 : ρ < 1.10 at the one-sided 0.025 level of significance. The p-value associated with this rejection is 0.011.

79.3 Example: Difference of Means in Crossover design

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. Subjects in sequence group 1 receive the test drug (T) formulation in the first period, have their outcome variable, X, recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2 and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group    Period 1   Washout   Period 2
1 (TC)   Test       -         Control
2 (CT)   Control    -         Test

The resulting data are commonly analyzed using a statistical linear model. The response yijk in period j on subject k in sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects λ1 and λ2.
The fixed effects model can be displayed as:

Group     Period 1              Washout    Period 2
1 (TC)    µ + τt + π1 + γ1      -          µ + τc + π2 + λ1
2 (CT)    µ + τc + π1 + γ2      -          µ + τt + π2 + λ2

Let µt = µ + τt and µc = µ + τc. For a noninferiority crossover trial, East tests only for the treatment effect. With δ0 as the noninferiority margin, East tests H0: µt − µc ≤ δ0 when δ0 < 0 and H0: µt − µc ≥ δ0 when δ0 > 0.

East uses the following test statistic to test the above null hypothesis:

$$T_L = \frac{(\bar{y}_{11} - \bar{y}_{12} - \bar{y}_{21} + \bar{y}_{22})/2 - \delta_0}{\sqrt{\dfrac{\hat{\sigma}^2}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where ȳij is the mean of the observations from group i and period j, and σ̂² is the estimate of the error variance (obtained as the mean-squared error from an ANOVA that includes period, treatment and sequence as sources of variation in the model). The statistic T_L follows Student's t distribution with (n1 + n2 − 2) degrees of freedom.

Dataset: pkfood.cyd

Data Description: Here we will use pharmacokinetic data from a 2 × 2 crossover trial available in pkfood.cyd. The dataset consists of observations from 20 subjects on AUC, Cmax and Tmax evaluated under two regimens, A and B. For this example, we will consider regimen B as the reference, regimen A as the test drug, and AUC as the response variable.

Purpose of the Analysis: Let µc and µt denote the mean AUC in regimen B and regimen A, respectively, and δ = µt − µc. We are interested in testing H0: µt − µc ≤ δ0 against H1: µt − µc > δ0. Here we set the noninferiority margin δ0 to −5000. For this analysis, a one-sided type I error of 0.025 is considered.

Analysis Steps:
1. Choose the menu item: Home > Open > Data to open the dataset from the Samples folder.
2. If multiple workbooks are currently open, this will bring up the Keep in dialog box. You can select one of the existing workbooks or create a new workbook. Suppose you want to create a new workbook labeled “Crossover noninferiority”.
In order to do this, select the radio button New Workbook and type in Crossover noninferiority in the field next to it. Click OK. This will open the pkfood.cyd dataset in the main window under the Data Editor.
3. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Difference of Means
4. In the ensuing dialog box (under the Main tab), select or enter the variables as shown below.
5. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank and enter 0.975 for Confidence Level.
6. Click OK to analyze the data. The following output will be displayed in the main window.

In the Output section, the first part provides descriptive statistics for the two groups. The second table provides the treatment summary. The third table, labeled Test of Hypothesis, provides results for the statistical test of treatment effect. The estimated effect size is 0.95. The observed value of the test statistic is 4.248 and it has 10 + 10 − 2 = 18 degrees of freedom. The p-value for the one-sided test is 0. This is the p-value associated with rejecting H0: δ ≤ −5000 in favor of the alternative hypothesis H1: δ > −5000. The one-sided 97.5% confidence interval is (−3304.769, ∞). Since the lower limit of the confidence interval is greater than the noninferiority margin of -5000, we can reject H0: δ ≤ −5000 at the one-sided 2.5% level of significance.
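The test statistic defined in this section can be sketched in a few lines of Python. This is only an illustration of the formula, not East's implementation; the function name, argument names, and the input values are hypothetical and are not taken from pkfood.cyd.

```python
import math

def crossover_t_statistic(ybar11, ybar12, ybar21, ybar22,
                          sigma2_hat, n1, n2, delta0):
    """Return (T_L, degrees of freedom) for a 2x2 crossover noninferiority test.

    ybar_ij is the mean response in sequence group i and period j;
    sigma2_hat is the ANOVA mean-squared error (assumed to be supplied).
    """
    # Estimated treatment difference: average of the two T - C contrasts.
    delta_hat = (ybar11 - ybar12 - ybar21 + ybar22) / 2.0
    # Standard error of the estimated difference.
    se = math.sqrt(sigma2_hat / 2.0 * (1.0 / n1 + 1.0 / n2))
    return (delta_hat - delta0) / se, n1 + n2 - 2

# Hypothetical summary values, chosen only to exercise the formula.
t_stat, df = crossover_t_statistic(10.0, 8.0, 7.0, 11.0,
                                   sigma2_hat=4.0, n1=10, n2=10, delta0=-1.0)
print(round(t_stat, 3), df)
```

The resulting statistic would then be compared with the Student's t distribution on n1 + n2 − 2 degrees of freedom, as described above.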
79.4 Example: Ratio of Means in Crossover design

In this section, we show how to use East to test for a ratio of means in a noninferiority 2 × 2 crossover trial. The 2 × 2 crossover design was discussed in section 79.3. Unlike section 79.3, however, here we are interested in the ratio of means. Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C), respectively. For a noninferiority trial, East tests only for the treatment effect. With ρ0 as the noninferiority margin, East tests H0: µt/µc ≤ ρ0 when ρ0 < 1 and H0: µt/µc ≥ ρ0 when ρ0 > 1. Since the ratio hypothesis can be translated into a difference hypothesis using a log transformation, East performs the test for difference on log-transformed data, as discussed in section 79.3.

Dataset: We will again use the pkfood.cyd dataset described in section 79.3.

Purpose of the Analysis: We are interested in testing the null hypothesis H0: ρ ≤ 0.8 against the alternative hypothesis H1: ρ > 0.8. For this analysis, we consider a one-sided type I error of 0.025.

Analysis Steps:
1. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Ratio of Means
2. If the dataset is not displayed in your main window, this will bring up the Select Dataset dialog box with the list of available workbooks and the datasets available under each workbook. If the dataset is already displayed in your main window, East will skip this step and use the dataset in the main window for the analysis. If East brings up the Select Dataset dialog box, choose the pkfood.cyd dataset under workbook Crossover noninferiority and click OK.
3. In the ensuing dialog box (under the Main tab), select the variables as shown below:
4. In the Advanced tab, specify the Confidence Level as 0.975. Click OK and the output will be displayed in the main window.
Scroll down to the end of the output. Output for the statistical test of treatment effect is displayed in the last table. East performs the analysis based on the log-transformed data. The observed value of the test statistic based on log-transformed data is 0.561 and it has 10 + 10 − 2 = 18 degrees of freedom. The p-value associated with rejecting H0: ρ ≤ 0.8 is 0.291. The one-sided 97.5% confidence interval for ρ is (0.708, ∞). Since the lower limit of the confidence interval is smaller than the noninferiority margin of 0.8, we cannot reject H0: ρ ≤ 0.8 at the one-sided 2.5% level of significance.

80 Analysis-Normal Equivalence Two-Sample

In many cases, the goal of a clinical trial is neither superiority nor non-inferiority, but equivalence. Chapter 13 deals with the design and simulation of these types of trials. This chapter explains how to use East to analyze data from two independent samples and from crossover equivalence studies.

80.1 Example: Difference of Means

Dataset: Iris.cyd

Data Description: The Iris flower dataset (Fisher, 1936) consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured for each sample: the length and the width of the sepals and petals, in centimeters. In this example we will compare the mean sepal widths of I. virginica and I. versicolor.

Purpose of the Analysis: Let µt and µc be the mean sepal widths in I. virginica and I. versicolor, respectively, and δ = µt − µc. We want to test the null hypothesis H0: δ ≤ −5 or δ ≥ 5 against the alternative hypothesis H1: −5 < δ < 5. We want to reject H0 with probability of type I error not exceeding 0.025.
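The two one-sided test statistics behind this equivalence hypothesis can be sketched from summary statistics alone. The Python function below is a minimal illustration under the equal-variance assumption used in this example; it is not East's code, and the function and argument names are hypothetical. The inputs are the group means, standard deviations and sample sizes that East reports in the output for this example.

```python
import math

def tost_statistics(mean_t, sd_t, n_t, mean_c, sd_c, n_c, lower, upper):
    """Two one-sided t statistics for equivalence of two independent means."""
    # Pooled variance, as in the equal-variance two-sample t-test.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    se = math.sqrt(pooled_var * (1.0 / n_t + 1.0 / n_c))
    diff = mean_t - mean_c
    t_lower = (diff - lower) / se   # tests H01: delta <= lower limit
    t_upper = (diff - upper) / se   # tests H02: delta >= upper limit
    return diff, se, t_lower, t_upper, n_t + n_c - 2

# Summary statistics as reported in the East output for this example.
diff, se, t_lower, t_upper, df = tost_statistics(
    29.74, 3.225, 50, 27.64, 3.141, 50, lower=-5.0, upper=5.0)
print(round(diff, 2), round(se, 3), round(t_lower, 3), round(t_upper, 3), df)
```

Equivalence is declared only if both one-sided tests reject, that is, if t_lower is large and positive and t_upper is large and negative relative to the t distribution with the returned degrees of freedom.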
Analysis Steps:
1. Open the dataset from the Samples folder.
2. If multiple workbooks are currently open, this will bring up the Keep in dialog box. You can select one of the existing workbooks or create a new workbook. Suppose you want to create a new workbook labeled “Iris Equivalence”. In order to do this, select the radio button New Workbook and type in Iris Equivalence in the field next to it.
3. Click OK. The Iris.cyd dataset opens in the main window under the Data Editor menu. The dataset has observations from 150 subjects from the 3 species. The columns Species na and Sepal widt contain the name of the species and the width of the sepals. We are considering only I. virginica and I. versicolor in this example. Therefore, we need to keep the data from these two species only and remove the remaining observations.
4. Under the Data Editor menu, click the icon in the Data ribbon that opens the Filter case(s) dialog box.
5. Click the If condition button and enter Species No > 1.
6. You can also use Build Expression to formulate a conditional expression for the IF ( ) field instead of writing the expression directly. Click OK. The observations pertaining to species Setosa are highlighted. Select these highlighted observations and click the corresponding icon under the Data Editor menu to remove them. The dataset will now have only 100 observations, pertaining to I. virginica and I. versicolor.
7. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Difference of Means
8. In the Main tab, select Equivalence as Trial Type, Equal as Variance Type and t-test as Test Type. For the Population Id field you have to choose a dichotomous variable.
The variable selected in this field is the population identifier. Select Species na as the Population Id variable. When you select a variable for the Population Id field, a new box appears below in which you specify the levels of the Population Id variable for the control and treatment groups. Choose Versicolor for Control. East will treat Versicolor as the control and Verginica as the treatment. Select Sepal widt as Response Variable and enter −5 and 5 for Lower Equiv. Limit and Upper Equiv. Limit. The Frequency Variable allows the user to specify a variable that represents a frequency, or weighted value. For the current example, leave the Frequency Variable field blank.
9. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Enter 0.975 for Confidence Level.
10. Click OK to start the analysis. Upon completion of the analysis, a new node with the label Analysis: Continuous Response: Difference of Means for Independent Data will be added to the Library and the output is displayed in the main window.

The result of the analysis is divided into three sections. The Hypothesis section states the null and alternative hypotheses for the 2-sided and 1-sided tests. The Input Parameters section displays the name of the data file, the response variable, the type of test performed, the type I error set for the analysis and other parameters used in the analysis. It is important to review this section to ensure that correct and complete input parameters are specified. The last section is Output. The first part of the Output section gives descriptive statistics for the response variable. There are 50 observations in each group. The mean sepal widths (standard deviation) are 27.64 (3.141) and 29.74 (3.225) in the I. versicolor and I. virginica groups. The estimated treatment difference is δ̂ = 2.1 with se(δ̂) = 0.637. There are two effect sizes: 2.23 (under H01) and −0.911 (under H02).
These values can be verified by plugging δ̂ = 2.1, δL = −5, δU = 5 and σ̂ = 0.637/√(1/50 + 1/50) = 3.185 into the following formulas for the effect size under H01 and H02:

$$\frac{\hat{\delta} - \delta_L}{\hat{\sigma}} \quad \text{and} \quad \frac{\hat{\delta} - \delta_U}{\hat{\sigma}}$$

The observed values of the two test statistics are 11.152 and -4.555, and both have 50 + 50 − 2 = 98 degrees of freedom. The two-sided 95% confidence interval of δ = µt − µc is (0.837, 3.363). This confidence interval is within the equivalence interval of (-5, 5); therefore, we can reject H0: µt − µc ≤ −5 or µt − µc ≥ 5 in favor of H1: −5 < µt − µc < 5 at the 5% level of significance.

80.2 Example: Log-ratio of Means

Dataset: We will again use the dataset Iris.cyd here.

Data Description: A description of this dataset is given in subsection 80.1.

Purpose of the Analysis: Let µt and µc be the mean sepal widths of I. virginica and I. versicolor, and ρ = µt/µc. We are interested in testing the null hypothesis H0: ρ ≤ 0.8 or ρ ≥ 1.25 against the alternative hypothesis H1: 0.8 < ρ < 1.25. We want to reject H0 with type I error not exceeding 0.025.

Analysis Steps:
1. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Ratio of Means
2. If the dataset is not displayed in your main window, this will bring up the Select Dataset dialog box with the list of available workbooks and the datasets available under each workbook. If the dataset is already displayed in your main window, East will skip this step and use the dataset in the main window for the analysis. If East brings up the Select Dataset dialog box, choose the Iris.cyd dataset under workbook Iris Equivalence and click OK.
3. In the ensuing dialog box (under the Main tab), select the variables as shown below:
4.
In the Advanced tab, leave By Variable 1 and By Variable 2 blank and enter 0.975 for Confidence Level.
5. Click OK to start the analysis. Upon completion of the analysis, the following output is displayed in the main window.

The first two sections display information about the hypothesis tested and the inputs specified. In the Output section, the first part provides descriptive statistics for the two groups. The second table, labeled Test of Hypothesis for:ln(Sepal widt), provides details about the test result. Note the word “ln(Sepal widt)”; this emphasizes that the analysis is performed on log-transformed data. In this table, the Difference of Means is 0.074. This is the estimated treatment difference in terms of log-transformed data on Sepal widt. In this example, the two effect sizes are 2.636 and -1.322. The observed values of the two test statistics are 13.181 and -6.611, and both have 98 degrees of freedom. The two-sided 95% confidence interval of ρ = µt/µc is (1.03, 1.126). This confidence interval is within the equivalence interval of (0.80, 1.25); therefore, we can reject H0: µt/µc ≤ 0.80 or µt/µc ≥ 1.25 in favor of H1: 0.80 < µt/µc < 1.25 at the 5% level of significance.

80.3 Example: Difference of Means in Crossover Designs

Crossover trials are widely used in clinical and medical research and in other diverse areas such as veterinary science, psychology, sports science, dairy science, and agriculture. A crossover design is often preferred over a parallel design because each subject receives all the treatments and thus acts as their own control.
In this section, we show how East supports the design and simulation of such experiments with endpoint as difference of means.

Dataset: pkfood.cyd

Data Description: Here we will use pharmacokinetic data from a 2 × 2 crossover trial available in pkfood.cyd. The dataset consists of observations from 20 subjects on AUC, Cmax and Tmax evaluated under two regimens, A and B. For this example, we will consider regimen B as the reference, regimen A as the test drug, and AUC as the response variable.

Purpose of the Analysis: Let µc and µt denote the mean AUC in regimen B and regimen A, and δ = µt − µc. Here we set the bioequivalence limits (δL, δU) to (−5000, 5000). We are interested in testing H0: δ ≤ −5000 or δ ≥ 5000 against H1: −5000 < δ < 5000. For this analysis, a type I error probability of 0.05 is considered.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. If multiple workbooks are currently open, this will bring up the Keep in dialog box. You can select one of the existing workbooks or create a new workbook. Suppose you want to create a new workbook labeled “Crossover Equivalence”. In order to do this, select the radio button New Workbook and type in Crossover Equivalence in the field next to it. Click OK. This will open the pkfood.cyd dataset in the main window under the Data Editor.
3. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Difference of Means
4. In the ensuing dialog box (under the Main tab), select or enter the variables as shown below.
5. In the Advanced tab, leave By Variable 1 and By Variable 2 blank and enter 0.95 for Confidence Level.
6. Click OK to analyze the data. The following output will be displayed in the main window.
In the Output section, the first part provides descriptive statistics for the two groups. The second table provides the treatment summary. The third table, labeled Test of Hypothesis, provides results for the statistical test of treatment effect. The observed values of the two test statistics are 4.248 and -8.417, and both have 18 degrees of freedom. The 2-sided 90% confidence interval of δ = µt − µc is (-3015, -277). This confidence interval is well within the equivalence interval of (-5000, 5000); therefore, we can reject H0: µt − µc ≤ −5000 or µt − µc ≥ 5000 in favor of H1: −5000 < µt − µc < 5000 at the 5% level of significance.

80.4 Example: Ratio of Means in Crossover Designs

In crossover designs, the equivalence hypothesis is often tested in terms of a ratio of means. This type of trial is very popular for establishing bioequivalence and bioavailability between two formulations in terms of pharmacokinetic parameters (FDA guideline on BA/BE studies for orally administered drug products, 2003). In particular, FDA considers two products bioequivalent if the 90% confidence interval of the ratio of two means lies within (0.8, 1.25). This section shows how East is used to analyze data from such experiments with endpoint as ratio of means. Since the ratio hypothesis is translated into a difference hypothesis using a log transformation, East performs two one-sided tests (TOST) on the log-transformed data, as discussed in section 80.3.

Dataset: We will again use the pkfood.cyd dataset here.

Data Description: A description of this dataset is given in subsection 80.3.

Purpose of the Analysis: Here we are interested in the ratio of means. Let µt and µc denote the means of the observations from the experimental treatment (T) and the control treatment (C).
In an equivalence trial with endpoint as ratio of means, the goal is to establish ρL < ρ < ρU, where ρL and ρU are specified values used to define equivalence. In practice, ρL and ρU are often chosen such that ρL = 1/ρU. The null hypothesis H0: ρ ≤ ρL or ρ ≥ ρU is tested against the two-sided alternative hypothesis H1: ρL < ρ < ρU at level α, using two one-sided tests. Schuirmann (1987) proposed working this problem out on the natural logarithm scale. Thus we are interested in the parameter δ = ln(ρ) = ln(µt) − ln(µc), and the null hypothesis H0: δ ≤ δL or δ ≥ δU is tested against the 2-sided alternative hypothesis H1: δL < δ < δU at level α, using two one-sided t-tests. Here δL = ln(ρL) and δU = ln(ρU).

In this example, we are interested in testing the null hypothesis H0: ρ ≤ 0.8 or ρ ≥ 1.25 against the alternative hypothesis H1: 0.8 < ρ < 1.25. For this analysis, consider a type I error rate of 0.05.

Analysis Steps:
1. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Ratio of Means
2. If the dataset is not displayed in your main window, this will bring up the Select Dataset dialog box with the list of available workbooks and the datasets available under each workbook. If the dataset is already displayed in your main window, East will skip this step and use the dataset in the main window for the analysis. If East brings up the Select Dataset dialog box, choose the pkfood.cyd dataset under workbook Crossover Equivalence and click OK.
3. In the ensuing dialog box (under the Main tab), select or enter the variables as shown below:
4. In the Advanced tab, specify the Confidence Level as 0.95.
5. Click OK to start the analysis. Upon completion of the analysis, a new node with the label Analysis: Continuous Response: Ratio of Means for Crossover Data1 is added to the Library and the output is displayed in the main window.
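The log-scale reformulation described in the Purpose of the Analysis can be illustrated with a short Python sketch. It converts the ratio limits to the log scale and checks whether a confidence interval computed for ln(ρ) lies inside them, which is the interval-inclusion reading of the two one-sided tests used in this chapter. The function name and the confidence limits passed to it are hypothetical; they are not East output.

```python
import math

def ratio_equivalence_from_log_ci(log_ci_lower, log_ci_upper,
                                  rho_lower=0.8, rho_upper=1.25):
    """Check equivalence of a ratio of means from a CI computed on the log scale."""
    # Equivalence limits on the log scale: delta_L = ln(rho_L), delta_U = ln(rho_U).
    delta_lower, delta_upper = math.log(rho_lower), math.log(rho_upper)
    # Back-transform the interval for ln(rho) to an interval for rho.
    ci_ratio = (math.exp(log_ci_lower), math.exp(log_ci_upper))
    # Equivalence is concluded only if the whole interval lies inside the limits.
    equivalent = delta_lower < log_ci_lower and log_ci_upper < delta_upper
    return (delta_lower, delta_upper), ci_ratio, equivalent

# Hypothetical confidence limits for ln(rho), for illustration only.
limits, ci_ratio, equivalent = ratio_equivalence_from_log_ci(-0.10, 0.15)
print(limits, ci_ratio, equivalent)
```

With ρL = 0.8 and ρU = 1.25 the log-scale limits are symmetric about zero, which is one practical reason for choosing ρL = 1/ρU.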
Scroll down to the end of the output. Output for the statistical test of treatment effect is displayed in the last two tables. East performs the analysis based on the log-transformed data. The observed values of the test statistics based on log-transformed data are 0.561 and -5.086, and each has 10 + 10 − 2 = 18 degrees of freedom. The 2-sided 90% confidence interval of ρ = µt/µc is (0.729, 0.959). This confidence interval is not within the equivalence interval of (0.80, 1.25); therefore, we cannot reject H0: µt/µc ≤ 0.80 or µt/µc ≥ 1.25 in favor of H1: 0.80 < µt/µc < 1.25 at the 5% level of significance.

81 Analysis-Nonparametric Two-Sample

The Wilcoxon-Mann-Whitney nonparametric test is commonly used for the comparison of two distributions when the observations cannot be assumed to come from normal distributions. It is used when the distributions differ only in a location parameter and is especially useful when the distributions are not symmetric. East supports analysis using the Wilcoxon-Mann-Whitney nonparametric test for parallel as well as crossover designs. The former is discussed in Sections 81.1, 81.2 and 81.3, and the latter in Sections 81.4, 81.5 and 81.6.

81.1 Test for Superiority

Let X1, . . . , Xnt be the nt observations from the treatment (T) with distribution function Ft, and Y1, . . . , Ync be the nc observations from the control (C) with distribution function Fc. Ft and Fc are assumed to be continuous with corresponding densities ft and fc, respectively. The primary objective of the Wilcoxon-Mann-Whitney test is to investigate whether there is a shift of location, which indicates the presence of a treatment effect. Let θ represent the treatment effect. That is, Ft(z) = Fc(z + θ).
In a superiority trial, we test the null hypothesis H0: θ = 0 against the two-sided alternative H1: θ ≠ 0 or a one-sided alternative hypothesis H1: θ < 0 or H1: θ > 0. The test statistic is the sum of the ranks for the treatment in the pooled sample minus nt(nt + 1)/2 or, equivalently, the number of pairs (Xi, Yj) such that Xi < Yj. Usually, the test statistic is denoted by W. Asymptotically, it is distributed with the following mean and variance:

$$E(W) = \frac{n_t (n_t + n_c + 1)}{2}, \qquad \mathrm{var}(W) = \frac{n_t n_c (n_t + n_c + 1)}{12}$$

The standardized test statistic, Z, is obtained as

$$Z = \frac{W - E(W)}{\sqrt{\mathrm{var}(W)}}$$

The p-value is calculated assuming Z is distributed as a standard normal variate.

81.1.1 Example

Dataset: Myeloma.cyd as described in Section 72.1.1.

Purpose of the Analysis: The purpose here is to compare the values of the variable hemoglobin level between two groups indicated by the variable status (0-alive, 1-dead). Let θt and θc denote the median hemoglobin in the alive and dead groups, respectively, and let θ = θt − θc be the median difference between the two groups. We are interested in testing the null hypothesis H0: θ = 0 with type I error not exceeding the 5% level of significance.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box, choose the variables as shown below:
4. Click OK to start the analysis. The output will now be displayed in the main window.

The last section is the Output. The first part of the output gives descriptive statistics for the response variable. There are 65 observations. The mean (standard deviation) hemoglobin levels are 9.91 (2.564) and 11.024 (2.425) in the control and treatment groups, respectively.
The estimated median difference between the two groups is 1.2. The observed test statistic is W = 672. The value of the standardized statistic is 1.658, obtained according to Eq. 81.1. The 2-sided p-value for the comparison of the two groups is 0.097. We conclude that, based on the Wilcoxon-Mann-Whitney test, the medians in the two groups are not significantly different at the 5% significance level.

81.2 Test for Noninferiority

As before, let X1, . . . , Xnt be the nt observations from the treatment (T) with distribution function Ft, and Y1, . . . , Ync be the nc observations from the control (C) with distribution function Fc. Ft and Fc are assumed to be continuous with corresponding densities ft and fc, respectively. Let θ be the shift of location such that Ft(z) = Fc(z + θ). In a non-inferiority trial, we test the null hypothesis H0: θ ≤ δ0 against the alternative hypothesis H1: θ > δ0 if δ0 < 0, or H0: θ ≥ δ0 against the alternative hypothesis H1: θ < δ0 if δ0 > 0. East first subtracts δ0 from X1, . . . , Xnt, and then the test statistic, standardized test statistic and p-value are calculated as in a superiority trial.

81.2.1 Example

Dataset: Werner.cyd as described in Section 73.4.2

Purpose of the Analysis: The purpose here is to compare the median cholesterol level in the birthpill user group (T) with that in the non-user group (C), with a non-inferiority margin (δ0) of 25 and a one-sided type I error of 0.025. Let θt and θc be the median cholesterol levels in the birthpill user and non-user groups, respectively. Since δ0 = 25 > 0, we are testing H0: θ ≥ δ0 against the alternative hypothesis H1: θ < δ0, where θ = θt − θc.

Analysis Steps:
1. Open the dataset from the Samples folder.
2.
Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box, choose the variables as shown below:
4. Now click on the Advanced tab and enter 0.975 for Confidence Level.
5. Click OK to start the analysis. The output will now be displayed in the main window.

The last section is the Output. The first part of the output gives descriptive statistics for the response variable. There are 25 observations. The estimate of θ = θt − θc (i.e., the median difference) is 5. The observed values of the test statistic (W) and the standardized test statistic (Z) are 7781.5 and -2.953, respectively. The p-value for this non-inferiority test is 0.002. Therefore, we conclude that the birthpill user group is non-inferior to the non-user group in terms of cholesterol level with a non-inferiority margin of 25.

81.3 Test for Equivalence

As before, let X1, . . . , Xnt be the nt observations from the treatment (T) with distribution function Ft, and Y1, . . . , Ync be the nc observations from the control (C) with distribution function Fc. Ft and Fc are assumed to be continuous with corresponding densities ft and fc, respectively. Let θ be the shift of location such that Ft(z) = Fc(z + θ). The null hypothesis H0: θ ≤ δL or θ ≥ δU is tested against the two-sided alternative hypothesis H1: δL < θ < δU at level α, using the following two one-sided tests (TOST).

Test1: H0L: θ ≤ δL against H1L: θ > δL at level α
Test2: H0U: θ ≥ δU against H1U: θ < δU at level α

East subtracts δL and δU from X1, . . . , Xnt for Test1 and Test2, respectively.
Then the test statistic, standardized test statistic and p-value are calculated separately for each test, as in a superiority trial. To declare equivalence, both H0L and H0U need to be rejected.

81.3.1 Example

Dataset: Iris.cyd as described in Section 80.1

Purpose of the Analysis: The purpose here is to compare the median sepal widths of I. virginica and I. versicolor with equivalence limits (δL, δU) of (-5, 5). Let θt and θc denote the median sepal widths in I. virginica and I. versicolor, respectively, and θ = θt − θc. We want to test the null hypothesis H0: θ ≤ −5 or θ ≥ 5 against the alternative hypothesis H1: −5 < θ < 5. We want to reject H0 with type I error rate not exceeding 0.05.

Analysis Steps:
1. Open Iris.cyd from the Samples folder and keep only the observations pertaining to I. virginica and I. versicolor, as described in subsection 80.1.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Parallel Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box, choose the variables as shown below:
4. Now click on the Advanced tab. Enter 0.975 for Confidence Level.
5. Click OK to start the analysis. The output will now be displayed in the main window.

The last section is the Output. The first part of the output gives descriptive statistics for the response variable. There are 50 observations in each group. The median sepal widths are 28 and 30 in the I. versicolor and I. virginica groups, respectively. The estimate of θ = θt − θc is 2. The observed values of the test statistic and standardized test statistic are 3656.5 and 7.822, respectively, for H0L, and 1905 and -4.294, respectively, for H0U. The p-values associated with H0L and H0U are very close to 0. Therefore, we can reject both H0L and H0U individually.
Thus, we reject H0: θ ≤ −5 or θ ≥ 5 with a very small p-value.

81.4 Test for Superiority in Crossover Trial

In a 2 × 2 crossover design each subject is randomized to one of two sequence groups. Subjects in sequence group 1 receive the test drug formulation (T) in the first period, have their outcome variable X recorded, wait out a washout period to ensure that the drug is cleared from their system, then receive the control drug formulation (C) in period 2, and finally have X measured again. In sequence group 2, the order in which T and C are assigned is reversed. The table below summarizes this type of trial design.

Group     Period 1    Washout    Period 2
1 (TC)    Test        -          Control
2 (CT)    Control     -          Test

The resulting data are commonly analyzed using a statistical linear model. The response yijk in period j on subject k in sequence group i, where i = 1, 2, j = 1, 2, and k = 1, . . . , ni, is modeled as a linear function of an overall mean response µ, formulation effects τt and τc, period effects π1 and π2, and sequence effects λ1 and λ2. The fixed effects model can be displayed as:

Group     Period 1              Washout    Period 2
1 (TC)    µ + τt + π1 + γ1      -          µ + τc + π2 + λ1
2 (CT)    µ + τc + π1 + γ2      -          µ + τt + π2 + λ2

For a superiority trial, East can test the following null hypotheses:

Test1: H0: τt − τc = 0 for treatment effect
Test2: H0: π1 − π2 = 0 for period effect
Test3: H0: λ1 − λ2 = 0 for carryover effect

To test the above hypotheses, East uses the Hodges-Lehmann (HL) implementation of the Wilcoxon-Mann-Whitney test.
For example, for the test of treatment effect, the HL estimate of τt − τc is obtained as

$$\frac{1}{2}\,\mathrm{Median}\left(Y_{11k_1} - Y_{12k_1},\; Y_{22k_2} - Y_{21k_2} : k_1 = 1, \dots, n_1;\ k_2 = 1, \dots, n_2\right)$$

81.4.1 Example

Dataset: CrossOverCaseData.cyd as described in Section 78.3

Purpose of the Analysis: The purpose here is to compare the median morning peak expiratory flow rate (PEFR) between placebo and test drug. Let θ be the median difference between the Drug and Placebo groups.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box choose the variables as shown below:
4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank and enter 0.95 for Confidence Level.
5. Click OK to start the analysis. The output will be displayed in the main window.

In the Output section, the first part provides descriptive statistics for the two groups. The second table provides the treatment summary. The third table, labeled Test of Hypothesis, provides results for the statistical test of carryover effect. The observed values of the test statistic and the standardized test statistic are 835 and 1.074, respectively. The p-value for the two-sided test is 0.283. Therefore, the carryover effect is not significant in this case and can be ignored.

Test for Treatment effect

Dataset: CrossOverCaseData.cyd as described in Section 78.3

Purpose of the Analysis: The purpose here is to compare the median morning peak expiratory flow rate (PEFR) between placebo and test drug.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box choose the variables as shown below:
4. Click OK to start the analysis. Upon completion of the analysis, the output will be displayed in the main window.

Scroll down to the end of the output. Output for the statistical test of treatment effect is displayed in the last two tables. The observed values of the test statistic and the standardized test statistic are 953 and 3.009, respectively. The p-value for the two-sided test is 0.003. Therefore, the treatment effect is significant in this case. In other words, the test drug significantly increases the median PEFR level over the placebo.

81.5 Test for Non-inferiority in Crossover Trial

Let θ = τt − τc. In a non-inferiority trial, we test the null hypothesis H0: θ ≤ δ0 against the alternative hypothesis H1: θ > δ0 if δ0 < 0, or H0: θ ≥ δ0 against the alternative hypothesis H1: θ < δ0 if δ0 > 0. East first subtracts δ0 from all the observations pertaining to the test drug (T). Then the HL estimator is calculated as discussed in Section 81.4.

81.5.1 Example

Dataset: pkfood.cyd as described in Section 79.3.

Purpose of the Analysis: Here the purpose is to compare the median AUC in regimen A with that in regimen B, considering the latter as reference and the former as test drug, with a non-inferiority margin (δ0) of -5000 and a one-sided type I error of 0.025. We will use θt and θc to denote the median AUC in regimen A and regimen B, respectively. Since δ0 = −5000 < 0, we are testing H0: θ ≤ δ0 against the alternative hypothesis H1: θ > δ0, where θ = θt − θc.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box choose the variables as shown below:
4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank and enter 0.975 for Confidence Level.
5. Click OK to start the analysis. Upon completion of the analysis, a new node with label Analysis: Continuous Response: Difference of Means test for Crossover Data1 is added in the Library and the output will be displayed in the main window.

In the Output section, the first part provides descriptive statistics for the two groups. The second table provides the treatment summary. The table labeled Test of Hypothesis provides results for the statistical test of treatment effect. The estimated median difference is -1427.25. The observed values of the test statistic and the standardized test statistic are 146 and 3.099, respectively. The p-value for the one-sided test is 0.001. This is the p-value associated with rejecting H0: θ ≤ −5000 in favor of the alternative hypothesis H1: θ > −5000. The one-sided 97.5% confidence interval is (−2432, ∞). Since the lower limit of the confidence interval is greater than the non-inferiority margin of -5000, we can reject H0: θ ≤ −5000 at the one-sided 2.5% level of significance.

81.6 Test for Equivalence in Crossover Trial

Let θ = τt − τc. The null hypothesis H0: θ ≤ δL or θ ≥ δU is tested against the two-sided alternative hypothesis H1: δL < θ < δU at level α, using the following two one-sided tests (TOST).
Test1: H0L: θ ≤ δL against H1L: θ > δL at level α
Test2: H0U: θ ≥ δU against H1U: θ < δU at level α

East subtracts δL and δU from all the observations pertaining to the test drug (T) for Test1 and Test2, respectively. Then the HL estimator is calculated as discussed in Section 81.4. To declare equivalence, both H0L and H0U need to be rejected.

81.6.1 Example

Dataset: pkfood.cyd as described in Section 79.3.

Purpose of the Analysis: Here the purpose is to compare the median AUC in regimen A with that in regimen B, considering the latter as reference and the former as test drug, with bioequivalence limits (δL, δU) as (-5000, 5000) and type I error rate not exceeding 0.05. Let θt and θc be the median AUC in regimen A and regimen B, respectively, and θ = θt − θc. We want to test the null hypothesis H0: θ ≤ −5000 or θ ≥ 5000 against the alternative hypothesis H1: −5000 < θ < 5000.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous) Two Samples > (Crossover Design) Wilcoxon-Mann-Whitney
3. In the ensuing dialog box choose the variables as shown below:
4. Click OK to start the analysis. Upon completion of the analysis, a new node with label Analysis: Continuous Response: Difference of Means test for Crossover Data1 is added in the Library and the output will be displayed in the main window.

In the Output section, the first part provides descriptive statistics for the two groups. The second table provides the treatment summary. The third table, labeled Test of Hypothesis, provides results for the statistical test of treatment effect. The estimated median difference is -1427.25.
The observed values of the test statistic and the standardized test statistic are 146 and 3.099, respectively, for H0L and 55 and -3.78, respectively, for H0U.

82 Analysis-ANOVA

Sometimes the goal of a clinical trial is to compare more than two treatment arms. For example, in a phase II dose-finding study multiple doses of an experimental drug may be compared with placebo or some other control. The most popular method applied to this kind of data is Analysis of Variance (ANOVA). The design of such studies with a continuous endpoint is discussed in chapter 14. In this chapter, we focus on how to analyze data collected from such studies using ANOVA in East. As an alternative to ANOVA, you can also analyze this kind of data using multiple comparison procedures, which are discussed in chapter 84.

82.1 Example: One Way ANOVA

In a one-way Analysis of Variance (ANOVA) test, we wish to test the equality of means across R independent groups. Let Xij indicate the response from the jth unit of the ith group; i = 1, ..., R, j = 1, ..., ni. Further assume Xij ∼ N(µi, σ²); i = 1, ..., R. In one-way ANOVA, the goal is to test the null hypothesis H0: µ1 = µ2 = ... = µR against the alternative hypothesis H1: µi ≠ µi′ for at least one pair (i, i′), where i, i′ = 1, 2, ..., R.

Dataset: leucolyte.cyd.

Data Description
Kontula K et al (1980, 1982) conducted a study to compare the number of glucocorticoid receptor (GR) sites per leukocyte cell in 5 groups of patients:
1. Group 1: normal subjects
2. Group 2: patients with hairy-cell leukemia
3. Group 3: patients with chronic lymphatic leukemia
4. Group 4: patients with chronic myelocytic leukemia
5. Group 5: patients with acute leukemia

Purpose of the Analysis: The goal is to compare the mean GR sites per leukocyte cell among the 5 groups of patients. Let µi denote the mean number of GR sites per leukocyte cell in the ith group of subjects/patients; i = 1, ..., R. We wish to test the null hypothesis H0: µ1 = µ2 = µ3 = µ4 = µ5 at the 5% level of significance.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous): Many Samples > (Factorial Design) One-Way ANOVA
3. In the Main tab, select Group as Factor and GR as Response. Leave the check box for Contrast unchecked.
4. In the Advanced tab, you can select up to 2 grouping variables. If only one grouping variable is selected, then a different analysis will be displayed for each level of the selected grouping variable. If two grouping variables are selected, then East will display a different analysis for each combination of levels of the two grouping variables. In this analysis, leave the fields By Variable 1 and By Variable 2 blank.
5. Click OK to start the analysis. After completion of the analysis, the output is displayed in the main window.

The last section is the Output. From the ANOVA Table, the significance level for the Group effect is 0.007. Therefore, the conclusion is to reject H0: µ1 = µ2 = µ3 = µ4 = µ5 at the 5% level of significance.

82.2 Example: One Way Contrast

Often one may be interested in testing the significance of a linear combination of group means instead of just finding the difference in group means. This can be done through the use of a contrast. A contrast of the population means is a linear combination of the µi's. For given scalars {ci : i = 1, ..., R}, C = Σ ci µi denotes a linear contrast of the population means if Σ ci = 0.
For a single contrast test of many means in a one-way ANOVA, the null hypothesis that we wish to test is:

H0: Σ ci µi = 0

versus a 2-sided alternative H1: Σ ci µi ≠ 0, or a 1-sided alternative H1: Σ ci µi < 0 or H1: Σ ci µi > 0.

Dataset: leucolyte.cyd as described in section 82.1

Purpose of the Analysis: Let µi denote the mean number of GR sites per leukocyte cell in the ith group of subjects/patients; i = 1, ..., R. We are interested in comparing the mean number of GR sites in normal subjects (Group 1) with the average of the mean numbers of GR sites in all the remaining groups. That is, we are interested in comparing µ1 with (µ2 + µ3 + µ4 + µ5)/4. To do this comparison, test the following null hypothesis:

H0: (1/4)µ2 + (1/4)µ3 + (1/4)µ4 + (1/4)µ5 − µ1 = 0

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous): Many Samples > (Factorial Design) One-Way ANOVA
3. In the Main tab, select Group as Factor and GR as Response. Select the check box for Contrast. A table is displayed below it. Enter −1, 0.25, 0.25, 0.25 and 0.25 in the Coefficient column for the 5 categories.
4. Click OK to start the analysis. After completion of the analysis, the output is displayed in the main window.

The result of the analysis is divided into three sections. The test for the contrast is displayed in the third section, labeled Output. The 2-sided p-value for testing H0: (1/4)µ2 + (1/4)µ3 + (1/4)µ4 + (1/4)µ5 − µ1 = 0 is 0.029. Therefore, we can conclude that the mean number of GR sites in normal subjects (Group 1) is significantly different from the average of the mean numbers of GR sites in all the remaining groups, with an observed significance level of 0.029.
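For readers who want to see the arithmetic behind a single contrast test, the following Python sketch computes the contrast estimate, its standard error and the t statistic from raw group data. It assumes the usual one-way ANOVA setup with a pooled error variance; the function name `contrast_test` and the small dataset are hypothetical and are not taken from leucolyte.cyd, and the p-value lookup from the t distribution is left out.

```python
import math

def contrast_test(groups, coeffs):
    """Estimate, standard error, t statistic and error d.f. for H0: sum(c_i * mu_i) = 0."""
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    df_error = n_total - len(groups)
    # Pooled within-group (error) mean square from the one-way ANOVA table.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = sse / df_error
    estimate = sum(c * m for c, m in zip(coeffs, means))
    se = math.sqrt(mse * sum(c * c / len(g) for c, g in zip(coeffs, groups)))
    return estimate, se, estimate / se, df_error

# Compare group 1 with the average of the other two groups (made-up data).
estimate, se, t_stat, df = contrast_test(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]],
    [-1.0, 0.5, 0.5],
)
```

The t statistic is referred to a t distribution with the error degrees of freedom; with five groups the coefficients −1, 0.25, 0.25, 0.25, 0.25 from step 3 would be passed in the same way.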
82.3 Example: One Way Repeated Measures (Constant Correlation) ANOVA

As with the one-way ANOVA discussed in section 82.1, the repeated measures ANOVA also tests for equality of population means. However, in a repeated measures setting, the subjects are measured repeatedly over time. Therefore, the measurements observed within the same subject are correlated. This correlation between observations from the same subject needs to be accounted for in ANOVA. The constant correlation assumption refers to equal correlation between any pair of observations from a subject. Denote this constant correlation by ρ. The Repeated ANOVA module in East allows you to test the effects of subject and time as well as to test for a contrast in subject means.

Dataset: Body wight.cyd

Data Description
Here consider the body weight data of guinea pigs given by Crowder and Hand (1989, p. 27). The data were obtained to investigate the effect of a vitamin E diet supplement on the growth of guinea pigs. For each animal the body weights (in grams) were recorded at the end of weeks 1, 3, 4, 5, 6, and 7. All animals were given a growth-inhibiting substance during week 1 and the vitamin E therapy was started at the beginning of week 5. Three groups of animals, five in each, received zero, low and high doses of vitamin E, respectively.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous): Many Samples > (Factorial Design) One-Way Repeated Measures
3. In the Main tab, select Animal as Subject(Factor), Week as Time(Repeated) and Weight as Response. Leave the check box for Contrast unchecked.
4. In the Advanced tab, you can select up to 2 grouping variables.
If only one grouping variable is selected, then a different analysis will be displayed for each level of the selected grouping variable. If two grouping variables are selected, then East will display a different analysis for each combination of levels of the two grouping variables. Select Dose as By Variable 1.
5. The third tab is SAS Command, where you can enter SAS code for more sophisticated analysis. For this example, do not make any changes in this tab.
6. Click OK to start the analysis. The output is displayed in the main window.

ANOVA for all three dose groups is displayed in the Output section. The output suggests that the effects of animal and week are highly significant in all three dose groups.

82.4 Example: Two Way ANOVA

In a two-way ANOVA, there are two factors to consider, say A and B. Let Xijk indicate the response from the kth replication of the ith level of A and the jth level of B; i = 1, ..., a, j = 1, ..., b, k = 1, ..., n. Further we assume Xijk ∼ N(µij, σ²); i = 1, ..., a, j = 1, ..., b. In two-way ANOVA, the goal is to test the following null hypotheses:

Test for main effect of factor A. H0: The group means for all the levels of factor A are the same.
Test for main effect of factor B. H0: The group means for all the levels of factor B are the same.
Test for interaction effect of A and B. H0: The effect of A remains the same for all levels of B, or the effect of B remains the same for all levels of A.

Dataset: Body wight.cyd as described in Section 82.3

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous): Many Samples > (Factorial Design) Two-Way ANOVA
3. In the Main tab, select Dose as Factor1, Week as Factor2 and Weight as Response.
Leave the Interaction Effect check box checked to test the interaction effect between the two factors.
4. Click OK to start the analysis. The output is displayed in the main window.

The p-values associated with the main effects of Dose and Week are 0.011 and 5.19 × 10^−10. These p-values suggest significant main effects for Dose and Week. The interaction between Dose and Week is not significant (p-value = 0.878).

83 Analysis-Regression Procedures

This chapter demonstrates how to run regression analysis in East. East can perform multiple linear regression and repeated measures regression, and can fit a linear mixed effects (LME) model to data obtained from a 2 × 2 crossover design. The LME model on 2 × 2 crossover data can be fit either to test for difference of means or for ratio of means. These are discussed in sections 83.1, 83.2, 83.3 and 83.4. In addition to fitting the regression coefficients, East can also be used to: perform significance testing of regression coefficients using the Wald test; test for first-order autocorrelation in residuals using the Durbin-Watson test; compute collinearity diagnostics; compute different types of residuals; compute influence statistics; compute predicted values; and perform variable selection.

83.1 Example: Multiple Linear Regression

Dataset: Werner.cydx as described in Section 73.4.2.

Purpose of the Analysis: In this example, the multiple regression technique is used to find the relationship of the variable Cholesterol with the other variables.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Continuous) Regression > (Parallel Design) Multiple Linear Regression
This will display several input fields associated with regression analysis in the main window.
In the Main tab, there are two boxes: Variables and Model. In the Variables box, all the numeric variables in the dataset are displayed. The Toggle Factor On/Off button changes the status of a variable between numeric and factor variable.
3. For example, select the BIRTHPILL variable and click the Toggle Factor On/Off button. This will declare the BIRTHPILL variable as a factor variable. The suffix <fa> is added to BIRTHPILL in the list of variables. The suffix indicates that BIRTHPILL will be treated as a factor variable in the multiple linear regression, if included as a predictor. We can declare any variable in the Variables box as a factor variable. For this example, only consider BIRTHPILL as a factor variable.
4. In the Model box, choose CHOLESTEROL as the Response variable. Below this, there is a box with only the entry %const. This is where all the predictors in the model have to be included. The term %const refers to the intercept (β0). To remove this term, clear the checkbox Include Intercept Term. In the absence of this term, East will perform multiple regression analysis without an intercept. For this example, keep this term. Include all the variables except ID in this box. To include a variable in this box, select the variable from the list of variables in the Variables box, then click the button that adds it to the model. To de-select a selected term, click the button that removes it.
5. Now, we might believe that the effect of birth pill use on cholesterol level varies with age. In other words, there might be an interaction between age and birth pill use. To include the interaction effect, select AGE and BIRTHPILL<fa> in the Variables box using the Ctrl key, and click the interaction button. This adds the term AGE*BIRTHPILL to the predictor variable list.
The interaction effect AGE*BIRTHPILL is an example of a first-order interaction. In East, you can also include interaction effects of higher orders. To include an interaction effect, select all the variables that are interacting and click the interaction button.
6. Click the Options tab. There are two sub-tabs within this tab: General and MLR Setting.
7. In the General sub-tab, leave the default choice of Beta for Output Parameter and Two-Sided for Output p-value.

In the MLR Setting sub-tab, there is a list of checkboxes in two columns. The purpose of the checkboxes is given below:

Fitted Values: Calculates the fitted values.
ANOVA: Includes the ANOVA table in the regression output.
Variance Covariance Matrix: Includes the variance covariance matrix for the estimated regression coefficients in the regression output.
Estimated MSE of Prediction (GMSEP): Includes the mean squared error (MSE) (or variance of residuals) and the mean squared error of prediction (MSEP) in the regression output.
Durbin Watson Test: Performs the test for first-order autocorrelation among residuals and displays the results in the regression output.
Wald Test: Performs the Wald test for significance of regression coefficients and displays the results in the regression output.
Use Best Subset: Performs subset selection using the backward elimination, forward selection, sequential replacement, stepwise selection or exhaustive search technique.
Collinearity diagnostics: Provides collinearity diagnostics such as the Eigenvalues of (X^T X)^−1 and condition numbers. Before calculation of the Eigenvalues, X^T X is scaled to have 1's on the diagonal. The condition numbers are the square roots of the ratio of the largest Eigenvalue to each individual Eigenvalue. The largest condition number is the condition number of the scaled X matrix.
Unstandardized Residuals: Calculates the residuals.
Standardized Residuals: Calculates the standardized residuals.
Studentized Residuals: Calculates the studentized residuals.
Deleted Residuals: Calculates the deleted residuals, each obtained by deleting the corresponding observation.

8. Select the first 4 checkboxes: Fitted Values, ANOVA Table, Variance Covariance Matrix and Estimated MSE of Prediction (MSEP). Then select the checkbox for Durbin Watson Test. A third sub-tab, Durbin-Watson Test, is added.
9. Click the Durbin-Watson Test tab.
10. In this tab select or de-select the terms for the Durbin-Watson test using the add and remove buttons. Select all the variables from Model Terms to Terms to test.
11. Come back to the MLR Setting sub-tab. Now check the box for Wald Test. This will add a new sub-tab labeled Wald Test. Click on this sub-tab and select all the variables from Model Terms to Terms to test.
12. Come back to the MLR Setting sub-tab and select the Use Best Subset check box. This will add a new sub-tab labeled Best Subset Selection. Click on this sub-tab.
13. The first column is a box labeled Force Inclusion of Model Terms, and it includes all the model terms. Here select the variables that need to be retained in the model regardless of the selection method; the selection method will be applied to the remaining terms. In this example, use of Birthpill is an important factor that influences cholesterol level. Therefore, select BIRTHPILL<fa>. BIRTHPILL<fa> will skip the variable selection procedure and will always be part of the best subset of variables.
14. In the second column choose the method of subset selection.
The choices are Backward Elimination, Forward Selection, Sequential Replacement, Stepwise Selection and Exhaustive Search.

In the Forward Selection procedure, the model starts with the constant term (or with the forced terms) and in each step adds the new term that gives the largest reduction in the sum of squares of the residuals (SSE). The method stops when none of the additional terms gives a sufficient reduction in SSE.

In the Backward Elimination procedure, the model starts with all the available terms and in each step eliminates the variable that provides the minimum reduction in SSE. The method stops when the reduction in SSR due to dropping any variable exceeds some threshold amount.

The Stepwise Selection procedure is like Forward Selection except that at each step dropping of variables is also considered, as in the Backward Elimination procedure. At each step, the F value is calculated for each variable. If S indicates the set of all the variables in the subset in the current step, then for the ith variable the F value, Fi, is calculated as follows:

$$F_i = \frac{SSR(S \cup \{i\}) - SSR(S)}{MSE(S \cup \{i\})}, \quad i \notin S$$

$$F_i = \frac{SSR(S) - SSR(S - \{i\})}{MSE(S)}, \quad i \in S$$

For i ∉ S, the ith variable is entered in the subset if Fi > Fin. For i ∈ S, the ith variable is dropped from the subset if Fi < Fout.

In the Sequential Replacement procedure, for a given number of variables, variables are sequentially replaced and replacements that improve performance are retained. This approach checks whether any of the variables selected in the current model can be replaced with another variable to give a smaller residual sum of squares.

In the Exhaustive Search procedure, all possible subsets are evaluated and the subset with the largest adjusted R² is chosen.

15. Select Stepwise Selection. A box labeled Stepwise Selection Criteria appears. Keep the default values 3.84 and 2.71 for F to enter and F to omit. These two values correspond to Fin and Fout, as explained above.
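The enter and drop rules of the stepwise procedure can be written down compactly. The Python sketch below only encodes the two F formulas and the comparison with Fin and Fout given above; the function names and the numeric inputs are hypothetical, and the regression sums of squares and mean squared errors are assumed to have been obtained already from the fitted models.

```python
def f_to_enter(ssr_with, ssr_without, mse_with):
    """F value for a variable i not in S: gain in SSR from adding i, scaled by MSE(S + i)."""
    return (ssr_with - ssr_without) / mse_with

def f_to_remove(ssr_current, ssr_without, mse_current):
    """F value for a variable i in S: loss in SSR from dropping i, scaled by MSE(S)."""
    return (ssr_current - ssr_without) / mse_current

def stepwise_decision(f_value, in_subset, f_in=3.84, f_out=2.71):
    """Apply the thresholds: enter if F > F_in, drop if F < F_out."""
    if in_subset:
        return "drop" if f_value < f_out else "keep"
    return "enter" if f_value > f_in else "skip"

# Made-up sums of squares for illustration only.
candidate = stepwise_decision(f_to_enter(120.0, 100.0, 4.0), in_subset=False)
member = stepwise_decision(f_to_remove(100.0, 95.0, 2.5), in_subset=True)
```

The defaults 3.84 and 2.71 mirror the F to enter and F to omit values shown in step 15.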
There are two fields: Size of Subset and No. of Best Subset. In Size of Subset, enter the maximum allowed size of the subset. For example, suppose the data contain 20 independent variables, but you want to restrict the search to subsets which have a maximum of 7 variables. In this case, specify the size of subset as 7. In the field No. of Best Subset, specify the number of top models of each subset size which will be included in the output. In this example, enter 3 and 1 for these two fields. With this specification, we are looking for subsets of variables of size 2 and 3 in which one of the terms must be BIRTHPILL<fa>. The subset of size 1 will not be displayed, as we have already specified one variable to be forced into all the subsets, and thus the subset of 1 variable does not require any variable to come from the subset selection procedure.
16. Click the MLR Setting sub-tab and select the check box Collinearity diagnostics. A new box labeled Parameters for Collinearity appears.
17. In the Parameters for Collinearity box, specify two parameters: Multi Collinearity Criterion and No. of Collinearity Component. The Multi Collinearity Criterion is the value that controls how small the determinant of the matrix (that is inverted to compute the coefficient estimates) is allowed to be. This value must be less than 1 and greater than 0. No. of Collinearity Component is the number of collinearity components we want East to display. This number can be between 2 and the number of terms in the model, including the intercept, if any. In this case choose a number of collinearity components between 2 and 10. East specifies default values of 0.05 for Multi Collinearity Criterion and 2 for No. of Collinearity Component. For this example, keep these two values unchanged.
18. In the Residual box, check all 4 types of residuals: Unstandardized, Standardized, Studentized and Deleted. Upon checking any of these residuals, a new box labeled Influential statistics appears.

Unstandardized residuals (ri) are obtained simply by subtracting the predicted value of the response variable (Ŷi) from the observed value (Yi) for each observation. That is,

$$r_i = Y_i - \hat{Y}_i$$

Standardized residuals are the unstandardized residuals divided by the root mean square error (RMSE). Even though these are called standardized residuals, they are not standardized in the true sense, because the residuals do not have equal variance (even under the constant variance assumption). The variance of the ith residual is estimated as σ̂²(1 − hii), where hii is the ith diagonal element of the hat matrix, H. It is more appropriate to standardize the residuals as follows:

$$\frac{r_i}{\hat{\sigma}\sqrt{1 - h_{ii}}}$$

These are known as studentized residuals. Cook and Weisberg refer to this as external studentization. These residuals have t-distributions with N − K degrees of freedom, so any residual with absolute value exceeding 3 usually requires attention.

The deleted residuals are obtained as $Y_i - \hat{Y}_i^{-i}$, where $\hat{Y}_i^{-i}$ indicates the predicted value of Yi when the prediction is done excluding the ith observation.

19. Check all the influence statistics in the Influence statistics box next to the Residuals.

Cook's distance is an overall measure of the impact of the ith data point on the estimated regression coefficients. It is defined as:

$$D_i = \frac{\sum_{j=1}^{N} \left(\hat{Y}_j - \hat{Y}_j^{-i}\right)^2}{K \hat{\sigma}^2}$$

The Di's are distributed as F(K, N − K). If Di < F(0.2, K, N − K), then the ith case has only little apparent influence on the fitted values. On the other hand, if Di > F(0.5, K, N − K), the ith observation should be considered influential.
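The residual types and Cook's distance described above can be illustrated on a regression with a single predictor, where the hat-matrix diagonal has a simple closed form. The Python sketch below is an illustration only, not East's implementation: the function name and data are hypothetical, K = 2 (intercept and slope) is assumed, and the deleted residual and Cook's distance are computed through the standard leverage identities $r_i / (1 - h_{ii})$ and $r_i^2 h_{ii} / (K \hat{\sigma}^2 (1 - h_{ii})^2)$ rather than by refitting the model N times.

```python
import math

def simple_regression_diagnostics(x, y):
    """Residuals, leverage and Cook's distance for y = b0 + b1 * x (K = 2)."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(r * r for r in resid) / (n - k)      # estimate of sigma^2
    rmse = math.sqrt(mse)
    rows = []
    for xi, r in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx       # hat-matrix diagonal h_ii
        rows.append({
            "unstandardized": r,
            "standardized": r / rmse,
            "studentized": r / (rmse * math.sqrt(1.0 - h)),
            "deleted": r / (1.0 - h),              # equals Y_i minus leave-one-out prediction
            "leverage": h,
            "cooks_d": r * r * h / (k * mse * (1.0 - h) ** 2),
        })
    return rows

# Made-up data for illustration.
diagnostics = simple_regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

In this small example the leverages sum to K = 2, and the first observation, which sits at the edge of the x range with a sizeable residual, has the largest Cook's distance.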
DFFITs are calculated for each observation. For the ith observation this is defined as:

$$(DFFIT)_i = \frac{\hat{Y}_i - \hat{Y}_i^{-i}}{\hat{\sigma}^{-i}\sqrt{h_{ii}}}$$

where $\hat{\sigma}^{-i}$ is the RMSE, or estimate of σ, obtained excluding the ith observation. Kutner et al. (2004) suggested considering a case as influential if the absolute value exceeds 1 for small to medium size data and $2\sqrt{K/N}$ for large datasets.

The Covariance Ratios measure reflects the change in the variance-covariance matrix of the estimated coefficients when the ith observation is omitted. For the ith observation, it is obtained as the ratio of the determinant of the covariance matrix of the estimate of β excluding the ith observation to the determinant of the covariance matrix of the estimate of β including all the observations. It is suggested that |CRi − 1| ≥ 3K/N warrants further investigation.

Hat matrix diagonals simply refers to the ith diagonal element, hii, of the hat matrix, H, for the ith observation. This measure is also known as the leverage of the ith observation. The diagonal elements sum to the number of parameters being fitted. Any value greater than 2K/N suggests further investigation.

20. Click OK to start the analysis. After completion of the analysis a new node with title Analysis: Multiple Linear Regression1 is added to the Library. It has two sub-nodes: MLR-Residuals1 and MLR-Best Subset Selection1.
21. The output is displayed in the main window. The first part of the output is as shown below:

The dataset contains a total of 188 records, of which 7 are rejected due to missing observations. The table titled “Terms dropped due to” refers to some essential pre-processing of the data. If a particular independent variable assumes the same value throughout the data set, it is not really a ‘variable’ and has to be dropped.
Its presence creates a ‘singularity’ in the design matrix X. In the present data set there is no such problem and hence the entry is ‘None’. Multicollinearity is another possible characteristic of the data that could make the problem unstable. In the present data set, no such difficulty is encountered and hence the entry ‘None’ appears.

The table “Summary Statistics” displays some relevant summary statistics on residuals. In this example, N = 181 and K = 9. Thus the residual degrees of freedom are 181 − 9 = 172. The multiple $R^2$ value is 0.256. This is obtained as:

$$R^2 = \frac{SSR}{SST}$$

The estimate of σ (the RMSE) is 39.49. The residual sum of squares, or SSE, is 268221.407.

The table titled Parameter Estimates provides the estimates of the regression coefficients with their standard errors. It also provides the 95% confidence intervals of these estimates, the observed value of the t-statistic, the p-value for testing $H_0: \beta_k = 0$, and the sum of squares. It appears that the terms age, calcium and uric acid are significant at the 5% level of significance and the term height is significant at the 10% level of significance.

Notice that the variable BIRTHPILL, considered as a factor variable, now has a suffix “ 0”. East creates a dummy variable for level 0 of the factor BIRTHPILL. This dummy variable takes the value 1 for the observations with BIRTHPILL=0; otherwise it takes the value 0. Here, level 1 of the factor BIRTHPILL is considered the reference level.

The MSE ($\hat{\sigma}^2$) and RMSE ($\hat{\sigma}$) are 1559.427 and 39.49. The MSE of prediction is 1641.401.

The following table displays the estimated covariance matrix of $\hat{\beta}$:

The table below displays the collinearity diagnostics:

When there is no collinearity at all, the Eigenvalues and condition number will all equal 1.
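The statement about eigenvalues and the condition number can be illustrated numerically. The sketch below is a simplified Python illustration under stated assumptions, not East's computation: it considers only two predictors, works with their 2 × 2 correlation matrix (whose eigenvalues are 1 + |r| and 1 − |r|), and takes the condition number to be the ratio of the largest to the smallest eigenvalue. Some authors use the square root of this ratio instead, and East's scaling of X may differ. The function name is hypothetical.

```python
import math

def two_predictor_collinearity(x1, x2):
    """Eigenvalues and condition number of the 2x2 correlation matrix of x1, x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r = s12 / math.sqrt(s11 * s22)              # Pearson correlation

    # The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|.
    lam_max, lam_min = 1.0 + abs(r), 1.0 - abs(r)
    # Assumed definition: ratio of largest to smallest eigenvalue.
    condition = lam_max / lam_min if lam_min > 0 else math.inf
    return (lam_max, lam_min), condition

# Uncorrelated predictors: both eigenvalues and the condition number equal 1.
print(two_predictor_collinearity([-1, 0, 1], [1, -2, 1]))
# Nearly collinear predictors: one eigenvalue approaches 0 and the ratio grows.
print(two_predictor_collinearity([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9]))
```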
As collinearity increases, Eigenvalues will be both greater and smaller than 1 (Eigenvalues close to zero indicate a multicollinearity problem), and the condition number will increase. Belsley, Kuh, and Welsch (1980) suggest that, when this number is around 10, weak dependencies might be starting to affect the regression estimates. Montgomery et al. recommend 100 as indicative of moderate concern, while a value of 1000 is an alarm trigger (Montgomery, Peck, and Vining, 2003, page 339). For this model, the condition number of the scaled X matrix is 119.39. Thus, it may be pertinent to take a corrective step such as centering the data.

The ANOVA table shows that the total degrees of freedom are 180: the 8 independent variables give rise to 8 d.f. for regression and the remaining 172 degrees of freedom are assigned to error. The very low p-value shows that the fitted model is ‘significant’. Therefore, we reject the null hypothesis that all regression coefficients are zero.

1. Click MLR-Residuals1 in the Library. It displays the predicted values, residuals and influential observations.

2. Click MLR-Best Subset Selection1 in the Library. This displays the output for best subset selection. For this example, the best subset model with two terms includes BIRTHPILL and AGE as predictors. The best subset model of size 3 includes the predictors BIRTHPILL, AGE and CALCIUM.

83.2 Repeated Regression

In a repeated measures setting, the subjects are measured repeatedly over time. Therefore, the measurements observed within the same subject are correlated. In repeated regression analysis we take account of this correlation. East performs repeated regression analysis using the MIXED procedure of SAS.
East first generates equivalent SAS code and then displays the output obtained from the MIXED procedure in SAS.

Example: Repeated Regression

Dataset: Body Weight.cyd as described in Section 82.3.

Purpose of the Analysis: The data were obtained to investigate the effect of a vitamin E diet supplement on the growth of guinea pigs. For each animal the body weight (in grams) was recorded at the end of weeks 1, 3, 4, 5, 6 and 7. All animals were given a growth-inhibiting substance during week 1 and the vitamin E therapy was started at the beginning of week 5. Three groups of five animals each received zero, low and high doses of vitamin E, respectively. For this example, we will consider only the observations from the zero and high dose groups. Here we want to fit the following model:

$$Weight_{ij} = \beta_0 + \beta_1 I(Dose_i = High) + \beta_{2j} Week_{ij} + \epsilon_{ij}$$

Analysis Steps:

1. Open the dataset from Samples folder.

2. Delete the observations pertaining to the “Low” dose (rows 31 to 60). To delete the observations, select them and click Delete Case under the Data Editor menu. Now there are 60 observations left from the “No” and “High” dose groups.

3. Choose the menu item:
Analysis > (Continuous) Regression > (Parallel Design) Repeated Measures Regression
This will display several input fields associated with repeated regression analysis in the main window.

4. There are three tabs in this window: Main, Advanced and SAS Command. In the Main tab, select Weight as Response, Dose as Treatment, Animal as Subject and Week as Time(Repeated). All the remaining variables are displayed in the Covariates field and you can select all or some of them as covariates.
In the last row, there are 3 fields. The first is Method of Estimation, with choices of restricted maximum likelihood estimation (REML) and maximum likelihood estimation (MLE). The second is Covariance Structure, with choices of first-order autoregressive (AR(1)), compound symmetry (CS), unstructured (UN), unstructured using correlations (UNR) and variance components (VC). Since all the animals were measured at fixed and equal time points, we can choose any reasonable covariance structure from AR(1), CS, UN and UNR. The last is DF, where you specify the method for computing the denominator degrees of freedom for the tests of significance of coefficients. Keep the default selections of REML, UN and Contain for Method of Estimation, Covariance Structure and DF.

5. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Enter 0.95 for Confidence Level.

6. The third tab is SAS Command, where you can enter SAS code for more sophisticated analysis. For this example, do not make any changes in this tab.

7. Click OK to start the analysis. The output will be displayed in the main window. ANOVA for all the three dose groups is displayed in the Output section.

8. The output for the estimated covariance structure is displayed in the following screenshot.

9.
The estimated coefficients are given in the following screenshot:

Therefore, the fitted model in this case is:

$$Weight_{ij} = 572.94 + 49.32\, I(Dose_i = High) - 115.5\, I(Week = 1) - 70.6\, I(Week = 3) - 23.3\, I(Week = 4) - 30.9\, I(Week = 5) - 30.2\, I(Week = 6)$$

for i = 1, ..., 60, with covariance structure

    588.80    563.65    407.72    505.86
    563.65   1461.05   1406.26   1574.66
    407.72   1406.26   1513.03   1588.52
    505.86   1574.66   1588.52   1934.95

83.3 Linear Mixed Effects Model: Difference of Means (Crossover Data)

In a linear mixed effects model, a linear model is fitted to explain variability in the response variable with the help of factors with levels, which have fixed effects, and one or more random effects. A mixed effects model is used for hierarchical or dependent data. You will need to specify the Response variable for which you want to fit the model. In this particular design, the Response variable is often the difference of means in the test and control groups. You will need to specify the factors (variables) with fixed effects, namely Period ID, Group ID and Treatment ID. You will also need to specify Subject ID, which identifies the source of the response variable.

You have the option of checking the box Run using SAS on the Advanced tab. By doing this East will invoke the MIXED procedure of SAS. You can also choose not to use SAS. If you use SAS, you will have the option of including covariates in your model. Without SAS, your model will not include covariates. You can also invoke the SAS command option from the dialog box for this test. East will display, among other things, estimates, t-statistics and an ANOVA table for fixed effects.

In this section, we will illustrate repeated regression analysis of 2x2 crossover data using all the three options.

Dataset: CrossoverCaseData.cyd

Analysis Using East:

Analysis Steps:

1. Open the dataset from Samples folder.

2.
Choose the menu item:
Analysis > (Continuous) Regression > (Crossover Design) Linear Mixed Effects Model: Difference of Means
This will display several input fields associated with linear mixed effects model analysis in the main window.

3. There are three tabs in this window: Main, Advanced and SAS Command. In the Main tab, select Response as Response, PeriodID as Period ID, GroupID as Group ID and SubjectID as Subject ID (Random Effect). Once you select a random effect, the Method of Estimation options, restricted maximum likelihood estimation (REML) and maximum likelihood estimation (MLE), become available. Keep the default selection of REML.

4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Keep the default value 0.95 for Confidence Level as well as all the statistics to be computed. Don’t check the Run Using SAS checkbox.

5. The third tab is SAS Command, where you can enter SAS code for more sophisticated analysis. For this example, do not make any changes in this tab.

6. Click OK to start the analysis. The output will be displayed in the main window.

Analysis Using SAS:

Analysis Steps:

1. Choose the menu item:
Analysis > (Continuous) Regression > (Crossover Design) Linear Mixed Effects Model: Difference of Means
As explained earlier, in the Main tab, select Response as Response, PeriodID as Period ID, GroupID as Group ID and SubjectID as Subject ID (Random Effect). Once you select a random effect, the Method of Estimation options, restricted maximum likelihood estimation (REML) and maximum likelihood estimation (MLE), become available. Keep the default selection of REML.

2.
In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Keep the default value 0.95 for Confidence Level as well as all the statistics to be computed. Now check the Run Using SAS checkbox. Note that you can use covariates while running SAS. Don’t check any of the covariates for this example.

3. Click OK to start the analysis. East will invoke SAS and the SAS output will be displayed in the main window.

Analysis Using SAS Command:

Analysis Steps:

1. Choose the menu item:
Analysis > (Continuous) Regression > (Crossover Design) Linear Mixed Effects Model: Difference of Means
As explained earlier, in the Main tab, select Response as Response, PeriodID as Period ID, GroupID as Group ID and SubjectID as Subject ID (Random Effect). Once you select a random effect, the Method of Estimation options, restricted maximum likelihood estimation (REML) and maximum likelihood estimation (MLE), become available. Keep the default selection of REML.

2. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Keep the default value 0.95 for Confidence Level as well as all the statistics to be computed. Don’t check the Run Using SAS checkbox.

3. Go to the SAS Command tab. You will see SAS code already written in the main window. A partial view of the same is shown below:

The first few commands are meant for reading the data into SAS. You will also see the statement
/* Write your code here */
We will replace this part by our code.
    DATA CrossoverCaseData ;
    set CrossoverCaseData ;
    proc mixed method = REML;
    class GroupID PeriodID ;
    model Response = GroupID PeriodID ;
    repeated PeriodID ;
    random subjectID;
    run;

Please remove the following lines from the existing code.

    = log( );
    run;
    proc sort data = CrossoverCaseData out = SASSortMixed;
    by ... ;
    run;

This is required because we don’t want to log transform the Response and also don’t want to sort the data on any by variable.

4. Click OK. East will invoke SAS and the SAS output will be displayed in the main window.

83.4 Linear Mixed Effects Model: Ratio of Means (Crossover Data)

This test is very similar to the test described in the previous subsection, except that here the Response variable is often the ratio of means of the treatment and control groups. The previous test is applied to the logarithm of the response variable. Both the SAS link (Run using SAS) and SAS command options are available for this test.

84 Analysis-Multiple Comparison Procedures for Continuous Data

It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. The type I error rate is inflated when one considers the inferences together as a family. Failure to compensate for multiplicities can have adverse consequences. For example, a drug could be approved when actually it is not better than Placebo. Multiple comparison (MC) procedures provide a guard against inflation of type I error due to multiple testing. The probability of making at least one type I error is known as the familywise error rate (FWER).
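To see how quickly the error rate inflates, consider the simplest textbook case. The Python sketch below assumes m independent tests, each carried out at level alpha, with every null hypothesis true; under those assumptions the FWER is $1 - (1 - \alpha)^m$. Real trial comparisons that share a control arm are correlated, so this is an illustration of the phenomenon rather than a statement about any East output. The function name is hypothetical.

```python
def familywise_error_rate(alpha, m):
    """FWER for m independent level-alpha tests when all null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** m

# Unadjusted testing at the 5% level with a growing number of comparisons.
for m in (1, 2, 4, 10):
    print(m, round(familywise_error_rate(0.05, m), 4))
```

With four unadjusted comparisons, as in a four-dose trial against Placebo, the chance of at least one false positive under these assumptions is already well above the nominal 5%.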
East supports several parametric and p-value based MC procedures. We have seen how to simulate data under different MC procedures with specified group means and variance in chapter 15. In this chapter we explain how to analyze data with the different MC procedures available in East. For MC procedures in East, we can provide either the dataset containing the observations under each arm or the raw p-values to obtain the adjusted p-values.

84.1 Available Procedures

The probability of making at least one type I error is known as the familywise error rate (FWER). All the MC procedures available in East strongly control the FWER. Strong control of the FWER means that the probability of incorrectly rejecting at least one true null hypothesis is preserved regardless of which null hypotheses are true. In contrast, weak control of the FWER controls it only under the assumption that all null hypotheses are true. East supports the following MC procedures for continuous endpoints.

    Category        Procedure               Reference
    Parametric      Dunnett’s Single Step   Dunnett CW (1955)
                    Dunnett’s Step Down     Dunnett CW and Tamhane AC (1991)
                    Dunnett’s Step Up       Dunnett CW and Tamhane AC (1992)
    P-value Based   Bonferroni              Bonferroni CE (1935, 1936)
                    Sidak                   Sidak Z (1967)
                    Weighted Bonferroni     Benjamini Y and Hochberg Y (1997)
                    Holm’s Step Down        Holm S (1979)
                    Hochberg’s Step Up      Hochberg Y (1988)
                    Hommel’s Step Up        Hommel G (1988)
                    Fixed Sequence          Westfall PH, Krishen A (2001)
                    Fallback                Wiens B, Dmitrienko A (2005)

Dose-Finding Hypertension Trial

Throughout this chapter we consider the data from a dose-finding hypertension trial (Dmitrienko and Offen, 2005) to illustrate different MC procedures. The trial was conducted to compare four doses of a new antihypertensive drug to a Placebo. The primary outcome is reduction in diastolic blood pressure. Doses with a significant mean reduction in diastolic blood pressure will be declared efficacious.
The data from this trial are available in East through the dataset Hypertension-trial.cyd. Let $\mu_0$, $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ denote the group means in the Placebo, Dose1, Dose2, Dose3 and Dose4 treatment groups. We are interested in the following right-tailed tests:

$$H_i: \mu_i - \mu_0 \le 0 \quad \text{vs} \quad K_i: \mu_i - \mu_0 > 0, \qquad i = 1, 2, 3, 4$$

and the global null hypothesis $H_0: \mu_0 = \mu_1 = \mu_2 = \mu_3 = \mu_4$. We want to control the FWER at the 5% level of significance.

84.2 Example: Dunnett’s single step

Dataset: Hypertension-trial.cyd

Data Description: The trial was conducted to compare four doses of a new antihypertensive drug to a Placebo. The primary outcome is reduction in diastolic blood pressure. Doses with a significant mean reduction in diastolic blood pressure will be declared efficacious. The dataset has 130 observations and 2 columns. The first column, Dose, contains the information on the dose level. There are 5 dose levels including Placebo. In this column, P represents Placebo whereas “D1” through “D4” represent the 4 dose levels of the drug. The second column, Response, contains the reduction in diastolic blood pressure (expressed in mmHg). Each line in the data set represents a subject in the study.

Analysis Steps:

1. Open the dataset from Samples folder.

2. Choose the menu item:
Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means

3. In the ensuing dialog box, under the Main tab choose the variables as shown below.

4. Now click on the Advanced tab. Leave the fields By Variable 1 and By Variable 2 blank. On the left, enter 0.95 for Confidence Level and select Right-Tail for Rejection Region.

5. Click OK to start the analysis. Once the analysis is over, the output will be displayed in the main window.

The last section is the Output. For each treatment group (referred to as Arm), the sample size, mean and standard error of the difference of means are given in a table. The Ctrl arm in this table indicates “Placebo”. Mean responses for Placebo, Dose1, Dose2, Dose3 and Dose4 are 0.704, 0.008, 5.254, 5.629 and 7.331 mmHg, respectively.

The table in the Output section also includes the observed value of the test statistic and the p-values for comparison with the control group, along with the 95% one-sided confidence interval for the difference with Placebo. There are two types of p-values in this table. The Naive p-values are the raw, or unadjusted, p-values. The p-values in the Adjusted column are obtained after multiplicity adjustment according to Dunnett’s single step procedure so that the FWER is maintained at the 5% level of significance. The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.896, 0.033, 0.023 and 0.001, respectively. Therefore, after multiplicity adjustment according to Dunnett’s single step procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Under this table, the adjusted global p-value is given, which is 0.001 in this case. This is the p-value for testing the following global null hypothesis:

$$H_0: \mu_0 = \mu_1 = \mu_2 = \mu_3 = \mu_4$$

One can verify that the global p-value is the minimum of the 4 adjusted p-values given in the table above.

84.3 Example: Dunnett’s step-down and step-up procedures

Dataset: Hypertension-trial.cyd as described in Section 84.2

Analysis Steps:

1. Open the dataset from Samples folder.

2.
Choose the menu item:
Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means

3. In the ensuing dialog box, under the Main tab choose the variables as shown below.

4. Now click on the Advanced tab. Leave the fields By Variable 1 and By Variable 2 blank. On the left, enter 0.95 for Confidence Level and select Right-Tail for Rejection Region.

5. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.638, 0.018, 0.018 and 0.001, respectively. Therefore, after multiplicity adjustment according to Dunnett’s step-down procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

We can perform Dunnett’s step-up test by selecting Dunnett’s step-up from the drop-down menu in Select MCP in the Main tab of the input window. However, Dunnett’s step-up test cannot be performed with the Hypertension-trial.cyd dataset because the numbers of observations in the 5 treatment groups are not equal. In other words, the treatment groups are not balanced in this data. The numbers of observations in the Placebo, Dose1, Dose2, Dose3 and Dose4 groups are 25, 24, 26, 24 and 26, respectively.

Comparison of Dunnett’s single step and step-down procedures results

The table below compares the p-values for comparison with Placebo for the two different methods (Dunnett’s single step and step-down) along with the raw p-values.
    Arm   Raw     Single step   Step-down
    D1    0.638   0.896         0.638
    D2    0.010   0.033         0.018
    D3    0.007   0.023         0.018
    D4    0.000   0.001         0.001

Notice that the p-values for the step-down procedure are all smaller than the p-values for the single-step procedure except for Dose4.

84.4 p-value based Procedures

The p-value based procedures strongly control the FWER regardless of the joint distribution of the raw p-values, as long as the individual raw p-values are legitimate p-values. Assume that there are k arms including the Placebo arm. Let $n_i$ be the number of subjects in the i-th treatment arm ($i = 0, 1, \cdots, k-1$). Let $N = \sum_{i=0}^{k-1} n_i$ be the total sample size, where arm 0 refers to Placebo. Let $Y_{ij}$ be the response from subject j in treatment arm i and $y_{ij}$ be the observed value of $Y_{ij}$ ($i = 0, 1, \cdots, k-1$; $j = 1, 2, \cdots, n_i$). Suppose that

$$Y_{ij} = \mu_i + e_{ij} \qquad (84.1)$$

where $e_{ij} \sim N(0, \sigma_i^2)$. We are interested in the following hypotheses:

For the right tailed test: $H_i: \mu_i - \mu_0 \le 0$ vs $K_i: \mu_i - \mu_0 > 0$

For the left tailed test: $H_i: \mu_i - \mu_0 \ge 0$ vs $K_i: \mu_i - \mu_0 < 0$

The global null hypothesis is rejected if at least one of the $H_i$ is rejected in favor of $K_i$ after controlling the FWER. Here $H_i$ and $K_i$ refer to the null and alternative hypotheses, respectively, for comparison of the i-th arm with the Placebo arm.

Let $\bar{y}_i$ be the sample mean for treatment arm i, $s_i^2$ be the sample variance from the i-th arm and $s^2$ be the pooled sample variance for all arms. For the unequal variance case, the test statistic for comparing the treatment effect of arm i with Placebo can be defined as

$$T_i = \frac{\bar{y}_i - \bar{y}_0}{\sqrt{\frac{1}{n_i} s_i^2 + \frac{1}{n_0} s_0^2}} \qquad (84.2)$$

For the equal variance case, one needs to replace $s_i^2$ and $s_0^2$ by the pooled sample variance $s^2$. In both cases, $T_i$ follows Student’s t distribution. However, the degrees of freedom differ between the equal variance and unequal variance cases. For the equal variance case the degrees of freedom are N − k. For the unequal variance case, the degrees of freedom are subject to the Satterthwaite correction.

Let $t_i$ be the observed value of $T_i$; these observed values for the k − 1 treatment arms can be ordered as $t_{(1)} \ge t_{(2)} \ge \cdots \ge t_{(k-1)}$. For the right tailed test the marginal p-value for comparing the i-th arm with Placebo is calculated as $p_i = P(T > t_i)$, and for the left tailed test $p_i = P(T < t_i)$, where T follows Student’s t distribution. Let $p_{(1)} \le p_{(2)} \le \cdots \le p_{(k-1)}$ be the ordered p-values.

84.5 Single step MC procedures

East supports three p-value based single step MC procedures: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure.

For the Bonferroni procedure, $H_i$ is rejected if $p_i < \frac{\alpha}{k-1}$ and the adjusted p-value is given as $\min(1, (k-1) p_i)$.

For the Sidak procedure, $H_i$ is rejected if $p_i < 1 - (1 - \alpha)^{\frac{1}{k-1}}$ and the adjusted p-value is given as $1 - (1 - p_i)^{k-1}$.

For the weighted Bonferroni procedure, $H_i$ is rejected if $p_i < w_i \alpha$ and the adjusted p-value is given as $\min(1, \frac{p_i}{w_i})$. Here $w_i$ denotes the proportion of α allocated to $H_i$ such that $\sum_{i=1}^{k-1} w_i = 1$. Note that, if $w_i = \frac{1}{k-1}$, then the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Example: Bonferroni procedure

Dataset: Hypertension-trial.cyd as described in Section 84.2

Analysis Steps:

1. Open the dataset from Samples folder.

2. Choose the menu item:
Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means

3. In the ensuing dialog box, under the Main tab choose the variables as shown below.

4. Click OK to analyze the data. The output will be displayed in the main window.
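The Bonferroni and Sidak adjustments defined at the start of Section 84.5 are simple enough to sketch directly. The Python below is an illustration only, not East's code: it takes a list of raw p-values for the k − 1 comparisons with control and applies $\min(1, (k-1) p_i)$ and $1 - (1 - p_i)^{k-1}$. The function names and the input p-values are hypothetical and are not the values from the hypertension trial.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values: min(1, m * p) for m comparisons."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]

def sidak_adjust(pvalues):
    """Sidak adjusted p-values: 1 - (1 - p)^m for m comparisons."""
    m = len(pvalues)
    return [1.0 - (1.0 - p) ** m for p in pvalues]

# Hypothetical raw p-values for four dose-vs-control comparisons.
raw = [0.5, 0.01, 0.02, 0.0002]
print(bonferroni_adjust(raw))
print(sidak_adjust(raw))
```

For any set of raw p-values the Sidak adjusted value is never larger than the Bonferroni one, which is why the two procedures give very similar but not identical results.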
The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 1, 0.031, 0.045 and 0.001, respectively. Therefore, after multiplicity adjustment according to the Bonferroni procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Example: Sidak procedure

Dataset: Hypertension-trial.cyd as described in Section 84.2

Analysis Steps:

1. Open the dataset from Samples folder.

2. Choose the menu item:
Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means

3. In the ensuing dialog box, under the Main tab choose the variables as shown below.

4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.982, 0.031, 0.044 and 0.001, respectively. Therefore, after multiplicity adjustment according to the Sidak procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Example: Weighted Bonferroni procedure

Dataset: Hypertension-trial.cyd as described in Section 84.2

Analysis Steps:

1. Open the dataset from Samples folder.

2. Choose the menu item:
Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means

3.
In the ensuing dialog box, under the Main tab choose the variables as shown below.

4. Upon selection of the weighted Bonferroni procedure, a table will appear under the drop-down box. The table has two columns: Arm and Proportion of Alpha. In the column Proportion of Alpha, you specify the proportion of the total alpha you want to spend on each test. Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1. By default, East distributes the total alpha equally among all tests. Here we have 4 tests in total, so each test has a proportion of alpha of 1/4 or 0.25. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test.

5. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 1, 0.031, 0.045 and 0.001, respectively. Therefore, after multiplicity adjustment according to the weighted Bonferroni procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Notice that the adjusted p-values in the weighted Bonferroni MC procedure and the simple Bonferroni procedure are identical. This is because the weighted Bonferroni procedure with equal proportions reduces to the simple Bonferroni procedure.
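The reduction to the simple Bonferroni procedure under equal weights can be confirmed with a short calculation. The Python sketch below is illustrative and is not East's implementation: it normalizes the supplied proportions of alpha so that they sum to 1, as the text describes, and then applies the adjusted p-value $\min(1, p_i / w_i)$. The function name and the example p-values are hypothetical, and the requirement that every weight be strictly positive is an assumption of this sketch.

```python
def weighted_bonferroni_adjust(pvalues, weights):
    """Weighted Bonferroni adjusted p-values: min(1, p_i / w_i)."""
    if len(pvalues) != len(weights):
        raise ValueError("one weight is needed per p-value")
    if any(w <= 0 for w in weights):
        raise ValueError("this sketch assumes strictly positive weights")
    total = sum(weights)
    normalized = [w / total for w in weights]   # make the proportions sum to 1
    return [min(1.0, p / w) for p, w in zip(pvalues, normalized)]

# Hypothetical raw p-values for four comparisons with control.
raw = [0.5, 0.01, 0.02, 0.0002]

# Equal proportions of alpha (0.25 each): same as multiplying each p-value by 4.
print(weighted_bonferroni_adjust(raw, [0.25, 0.25, 0.25, 0.25]))

# Unequal proportions: more alpha on the last comparison lowers its adjusted p-value.
print(weighted_bonferroni_adjust(raw, [0.1, 0.2, 0.2, 0.5]))
```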
84.6 Step down MC procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests of hypotheses. There are two types of stepwise procedures. One type proceeds in a data-driven order. The other type proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as the tests are significant. The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. In the step indexed by i (i = k − 1, k − 2, ..., 1), $H_{(k-i)}$ is rejected if $p_{(k-i)} \le \frac{\alpha}{i}$, and the procedure moves to the next step.

Example: Holm’s step-down

Dataset: Hypertension-trial.cyd as described in Section 84.2

Analysis Steps:

1. Open the dataset from Samples folder.

2. Choose the menu item:
Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means

3. In the ensuing dialog box, under the Main tab choose the variables as shown below.

4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.634, 0.023, 0.023 and 0.001, respectively.
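Adjusted p-values of this kind follow from Holm's rule. As a minimal Python sketch, and not East's implementation, the raw p-values are sorted from smallest to largest, the j-th smallest of m is multiplied by m − j + 1, and a running maximum keeps the adjusted values non-decreasing, which is what produces ties. The function name and the input p-values are hypothetical and are not the trial's values.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # most significant first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by m - rank.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Step-down: an adjusted p-value can never be smaller than an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values; the second and third comparisons end up tied.
print(holm_adjust([0.5, 0.012, 0.01, 0.0002]))
```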
Therefore, after multiplicity adjustment according to Holm’s step-down procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

84.7 Data-driven step-up MC procedures

Step-up tests start with the least significant comparison and continue as long as the tests are not significant. The first time a significant comparison occurs, that hypothesis and all remaining hypotheses are rejected. East supports two such MC procedures: the Hochberg step-up and the Hommel step-up procedures. In the Hochberg step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \alpha/i$. In the Hommel step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i}\,\alpha$ for $j = 1, \ldots, i$. Fixed sequence test and fallback test are the types of tests which proceed in a prespecified order.

Example: Hochberg’s step-up procedure
Dataset: Hypertension-trial.cyd as described in Section 84.2
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab, choose the variables as shown below.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.634, 0.022, 0.022 and 0.001, respectively.
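As a rough cross-check on the step-down and step-up logic, the sketch below computes Holm and Hochberg adjusted p-values with the standard textbook formulas. The function names are hypothetical, and the inputs are the rounded raw p-values quoted in Section 84.9 (0.634, 0.008, 0.011, 0.000), so the results can differ from East's output in the third decimal place.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values (standard formula)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # most significant first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def hochberg_adjust(p_values):
    """Hochberg step-up adjusted p-values (standard formula)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)  # least significant first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (rank + 1) * p_values[idx])
        running_min = min(running_min, candidate)  # enforce monotonicity
        adjusted[idx] = running_min
    return adjusted


# Rounded raw p-values for Dose1..Dose4 vs. Placebo, as quoted in Section 84.9.
raw_p = [0.634, 0.008, 0.011, 0.000]

holm = [round(p, 3) for p in holm_adjust(raw_p)]
hochberg = [round(p, 3) for p in hochberg_adjust(raw_p)]
print(holm)      # [0.634, 0.024, 0.024, 0.0]
print(hochberg)  # [0.634, 0.022, 0.022, 0.0]
```

Both procedures lead to the same decisions at the 5% level here: Dose2, Dose3 and Dose4 are rejected and Dose1 is retained.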
Therefore, after multiplicity adjustment according to Hochberg’s step-up procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Example: Hommel’s step-up procedure
Dataset: Hypertension-trial.cyd as described in Section 84.2
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab, choose the variables as shown below.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.634, 0.017, 0.022 and 0.001, respectively. Therefore, after multiplicity adjustment according to Hommel’s step-up procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

84.8 Fixed-sequence stepwise MC procedures

In data-driven stepwise procedures, we have no control over the order in which the hypotheses are tested. Sometimes, however, based on preference or prior knowledge, we might want to fix the order of the tests a priori. The fixed sequence test and the fallback test proceed in such a pre-specified order. East supports both procedures. Assume that $H_1, H_2, \ldots, H_{k-1}$ are ordered hypotheses and the order is pre-specified so that $H_1$ is tested first, followed by $H_2$, and so on.
Let $p_1, p_2, \ldots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, for $i = 1, \ldots, k-1$, in the i-th step, if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \ldots, H_{k-1}$ and stop. The fixed sequence testing strategy is optimal when the early tests in the sequence have the largest treatment effects, and it performs poorly when the early hypotheses have small treatment effects or are nearly true (Westfall and Krishen (2001)). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.

The fallback test alleviates this undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, in the i-th step ($i = 1, \ldots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Example: Fixed sequence testing procedure
Dataset: Hypertension-trial.cyd as described in Section 84.2
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab, choose the variables as shown below. Upon selection of the Fixed Sequence procedure, a table appears under the drop-down box. The table has two columns: Arm and Test Sequence. In the Test Sequence column, specify the order in which the hypotheses will be tested. Specify 1 for the arm that will be compared first with Placebo, 2 for the arm that will be compared next, and so on. By default East assigns 1 to the first arm, 2 to the second arm, and so on. This default order implies that Dose1 will be compared with Placebo first, then Dose2, then Dose3, and finally Dose4. However, if we believe that the efficacy of the drug increases with dose, then the dose groups should be compared in descending order of dose. Therefore, specify 4, 3, 2 and 1 in the Test Sequence column for D1, D2, D3 and D4, respectively. This order implies that Dose4 will be compared with Placebo first, then Dose3, then Dose2, and finally Dose1.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The input section of the output displays the test sequence along with the other input values we have provided. The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.634, 0.011, 0.011 and 0.000, respectively.
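The effect of the Test Sequence column can be sketched as follows. The sketch assumes that the fixed sequence adjusted p-value of a hypothesis is the largest raw p-value encountered up to and including that hypothesis in the testing order, a standard formulation that is consistent with the figures reported above. The function name is hypothetical, and the inputs are the rounded raw p-values quoted in Section 84.9.

```python
def fixed_sequence_adjust(p_values, sequence):
    """Fixed sequence adjusted p-values.

    sequence[i] is the 1-based position of hypothesis i in the testing
    order. The adjusted p-value is the running maximum of the raw
    p-values along that order. Illustrative sketch, not East's code.
    """
    order = sorted(range(len(p_values)), key=lambda i: sequence[i])
    adjusted = [0.0] * len(p_values)
    running_max = 0.0
    for idx in order:
        running_max = max(running_max, p_values[idx])
        adjusted[idx] = running_max
    return adjusted


# Rounded raw p-values for D1..D4 vs. Placebo, as quoted in Section 84.9.
raw_p = [0.634, 0.008, 0.011, 0.000]

descending_dose = fixed_sequence_adjust(raw_p, [4, 3, 2, 1])  # D4 tested first
default_order = fixed_sequence_adjust(raw_p, [1, 2, 3, 4])    # D1 tested first

print(descending_dose)  # [0.634, 0.011, 0.011, 0.0]
print(default_order)    # [0.634, 0.634, 0.634, 0.634]
```

The second result shows the drawback discussed earlier: with the default order, the non-significant Dose1 comparison comes first and blocks every later test.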
Therefore, after multiplicity adjustment according to the fixed sequence procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Example: Fallback procedure
Dataset: Hypertension-trial.cyd as described in Section 84.2
Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab, choose the variables as shown below. Upon selection of the Fallback procedure, a table appears under the drop-down box. The table has three columns: Arm, Proportion of Alpha and Test Sequence. Specify 4, 3, 2 and 1 in the Test Sequence column for D1, D2, D3 and D4, respectively. For this example, keep the equal proportion of alpha for each test in the Proportion of Alpha column.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The input section of the output displays the test sequence along with the other input values we have provided. The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs. Placebo and Dose4 vs. Placebo are 0.634, 0.022, 0.022 and 0.001, respectively. Therefore, after multiplicity adjustment according to the fallback procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.
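The fallback rule from the start of this section (carry the level forward after a rejection, fall back to the weighted share after a retention) can be sketched as below. The sketch returns test levels and reject/retain decisions only, not adjusted p-values; the function name is hypothetical, and the inputs are the rounded raw p-values quoted in Section 84.9.

```python
def fallback_test(p_in_order, w_in_order, alpha=0.05):
    """Fallback procedure; inputs are listed in the testing order.

    Returns the level used at each step and the reject decisions.
    Illustrative sketch, not East's code.
    """
    if abs(sum(w_in_order) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    levels, rejected = [], []
    carried = 0.0  # level carried forward from the previous step
    for p, w in zip(p_in_order, w_in_order):
        level = carried + alpha * w
        reject = p < level
        levels.append(level)
        rejected.append(reject)
        carried = level if reject else 0.0  # lose the level after a retention
    return levels, rejected


equal = [0.25, 0.25, 0.25, 0.25]

# Sequence used in the example: D4, D3, D2, D1.
levels_desc, rejected_desc = fallback_test([0.000, 0.011, 0.008, 0.634], equal)

# Default sequence D1, D2, D3, D4: testing continues after D1 is retained.
levels_def, rejected_def = fallback_test([0.634, 0.008, 0.011, 0.000], equal)

print(rejected_desc)  # [True, True, True, False]
print(rejected_def)   # [False, True, True, True]
```

Setting the weights to 1, 0, 0, 0 in this sketch tests every hypothesis at the full alpha until the first retention, after which the available level is 0, which is the fixed sequence behaviour.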
84.9 Example: Raw p-values as input

Suppose we do not have the dataset containing all the observations; instead, we have the raw p-values and we want to adjust these using the Bonferroni procedure. Here we consider the 4 raw p-values returned by East using Hypertension-trial.cyd in all the above output. These p-values are 0.634, 0.008, 0.011 and 0.000. We will use these raw p-values to obtain adjusted p-values. In order to do this, we first need to create a dataset containing these p-values.

Dataset: New dataset to be created.
Analysis Steps:
1. Choose New > Case Data. This will open a blank dataset in the main window. Now right click on the column header and click Create Variable as shown below.
2. This will bring up the following Variable Type Setting dialog box.
3. Type in Arm for Name and choose the type of variable as String.
4. Click OK and this will add a column with name Arm in the dataset. Similarly, create a numeric column with label pvalue. Now, enter the values in the table as follows:
5. East assigns a default name CaseData1 to this dataset.
6. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
7. This will display several input fields associated with the multiple comparison test in the main window. In the Main tab, select the radio-button corresponding to raw p-values.
In the ensuing two boxes, select Arm as Treatment variable and select pvalue for Select raw p-values. Choose Bonferroni from the drop-down list in Select MCP.
8. Click OK. The output will be displayed in the main window.

The adjusted p-values for D1, D2, D3 and D4 are 1, 0.032, 0.044 and 0.000, respectively. Note that these adjusted p-values are very close to those obtained with the Bonferroni procedure using the dataset Hypertension-trial.cyd. Ideally, both sets of p-values should match exactly. The difference in p-values is only due to rounding error.

85 Analysis-Multiple Endpoints for Continuous Data

In Chapter 16, we have seen how to evaluate different gatekeeping procedures through intensive simulations. In this chapter, we will illustrate how to analyze a trial with gatekeeping multiple comparison procedures. Consider the Alzheimer’s disease example reported in Reisberg et al. (2003). This study was designed to investigate memantine, an N-methyl-D-aspartate (NMDA) antagonist, for the treatment of Alzheimer’s disease. Patients with moderate-to-severe Alzheimer’s disease were randomly assigned to receive placebo or 20 mg of memantine daily for 28 weeks. The two primary efficacy variables were: (1) the Clinician’s Interview-Based Impression of Change Plus Caregiver Input (CIBIC-Plus) global score at 28 weeks, (2) the change from baseline to week 28 in the Alzheimer’s Disease Cooperative Study Activities of Daily Living Inventory modified for severe dementia (ADCS-ADLsev). The CIBIC-Plus measures overall global change relative to baseline and is scored on a seven-point scale ranging from 1 (markedly improved) to 7 (markedly worse). The secondary efficacy endpoints included the Severe Impairment Battery and other measures of cognition, function, and behavior.
Suppose that the trial is declared successful only if the treatment effect is demonstrated on both endpoints. If the trial is successful, it is of interest to assess the two secondary endpoints: (1) Severe Impairment Battery (SIB), (2) Mini-Mental State Examination (MMSE). The data set is saved in the installation folder of East as Alzheimer.csv. To analyze this data set, we need to import the data into East by clicking on the Import icon as seen in the following screen. Select the Alzheimer.csv file and click OK to see the data set displayed in East. The following screen shows a snapshot of the data set.

Now click on the Analysis menu at the top of the East window, select Two Samples and then select Multiple Comparisons-Multiple Endpoints from the dropdown list. The main input dialog window pops up as seen in the following screen.

East can analyze two types of data: (1) raw subject level data, (2) raw p-values. For the Alzheimer’s disease example, the data is raw subject level data, so we select the left radio button. The bottom left panel of the screen displays all the variables contained in the data set. We need to specify which variable contains the treatment group ID for each subject and which group is the active treatment group. The next input is to identify all the endpoints to be analyzed. For this example, CIBIC-Plus and ADCS-ADLsev constitute the primary family of endpoints. SIB and MMSE constitute the secondary family of endpoints. Suppose we need to analyze the data using the serial gatekeeping procedure, using Bonferroni to adjust for multiplicity for the two endpoints from the secondary family.
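The logic of this set-up can be sketched as follows. The sketch assumes that the gate is passed only when every primary endpoint is individually significant at the full alpha (the "both endpoints" requirement stated above), and that the secondary family is then Bonferroni-adjusted. How East tests the gatekeeper family internally is not described in this chapter, so that part is an assumption; the function name and all p-values are hypothetical and are not results from Alzheimer.csv.

```python
def serial_gatekeeping(primary_p, secondary_p, alpha=0.05):
    """Serial gatekeeping with a Bonferroni-adjusted secondary family.

    Assumption: the primary family acts as a gate that is passed only if
    every primary p-value is below alpha. Illustrative sketch only.
    """
    gate_passed = all(p < alpha for p in primary_p)
    m = len(secondary_p)
    secondary_adjusted = [min(1.0, m * p) for p in secondary_p]
    if gate_passed:
        secondary_rejected = [p_adj < alpha for p_adj in secondary_adjusted]
    else:
        # Secondary endpoints cannot be declared significant if the gate fails.
        secondary_rejected = [False] * m
    return gate_passed, secondary_adjusted, secondary_rejected


# Hypothetical raw p-values: (CIBIC-Plus, ADCS-ADLsev) and (SIB, MMSE).
passed, adjusted, rejected = serial_gatekeeping([0.030, 0.020], [0.010, 0.040])
print(passed, adjusted, rejected)

# Same secondary p-values, but one primary endpoint fails.
blocked = serial_gatekeeping([0.030, 0.200], [0.010, 0.040])
print(blocked[0], blocked[2])
```

In the second call the secondary endpoints are retained regardless of their p-values, because the primary family did not pass as the gatekeeper.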
After filling in all inputs, the screen looks as follows.

Now click on the OK button at the bottom right of the screen to run the analysis. The following screen displays the detailed output of this analysis.

The first table shows the summary statistics for each endpoint, including the mean for each treatment group, the estimate of the treatment effect, the standard error of the effect estimate, the test statistic and the marginal two-sided confidence interval. The second table shows the inference summary, including raw p-values, multiplicity adjusted p-values with the gatekeeping procedure, and significance status. It also shows whether the primary family is passed as the serial gatekeeper for the secondary family of endpoints.

86 Analysis-Binomial Superiority One-Sample

This chapter demonstrates how East can be used to perform inferences on data collected from a single-sample superiority study when the observations on a binary variable have an unknown probability of success. You need to either test a null hypothesis about the probability, or compute an exact confidence interval for the probability of success. The chapter also discusses the analysis of paired data on a binary random variable. Chapter 22 deals with the design, simulation and interim monitoring of these types of trials with reference to a single sample test for proportion. East supports both the asymptotic and exact analysis of these tests. These are accessible from the Analysis menu and allow you to assess whether the data support the null or the alternative hypothesis of the study. Analysis of a single proportion superiority test is discussed in Section 86.1, while McNemar’s test for paired observations is discussed in Section 86.2.

86.1 Example: Single Proportion

Dataset: Pilot.cydx
Data Description
In a pilot study of a new drug, 20 patients were treated.
The column Response displays the successes and failures after administering the drug. There were 4 responders (successes) and 16 non-responders (failures).

Purpose of the Analysis:
Consider the null hypothesis $H_0: \pi = \pi_0$ to be tested against a two-sided alternative hypothesis $H_1: \pi \ne \pi_0$ or a one-sided alternative hypothesis $H_1: \pi < \pi_0$ or $H_1: \pi > \pi_0$. In this analysis, the hypothesis is tested asymptotically as well as using exact inference. We will obtain a 95% confidence interval for the underlying success rate and test the null hypothesis that π = 0.05. We would also like to compute the power of the test for the alternative hypothesis that π = 0.30.

Analysis Steps: Asymptotic Test
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) One Sample > (Single Arm Design) Single Proportion
3. In the ensuing dialog box (under the Main tab) choose the variables as shown below. To run the Asymptotic test, do not check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the test statistic is 3.078 with a 1-sided p-value of 0.001. Since the hypothesized proportion under the null hypothesis, 0.05, is less than the observed proportion of responders in the data, namely 0.2, the tail type considered for the one-sided alternative hypothesis is G.E., meaning greater than or equal to. The null hypothesis that π = 0.05 is rejected at the 5% significance level.

Analysis Steps: Exact Test
1. Click the Analysis Inputs/Outputs tab on the status bar below.
2. Under the Main tab, select variables as shown below.
Make sure to check the Perform Exact Computation checkbox.
3. Under the Advanced tab leave the fields By Variable 1 and By Variable 2 blank. Select the Compute Power checkbox, enter the value 0.05 for Alpha and 0.3 for Probability under H1. Keep the default value 0.95 for Confidence Level.
4. Click OK to start the analysis. The result is displayed in the main window.

The exact 95% confidence interval using the Clopper-Pearson method is (0.057, 0.437). Notice that the Blyth-Still-Casella confidence interval is (0.071, 0.411), which is about 10% narrower than the Clopper-Pearson confidence interval. The exact 1-sided p-value is 0.016, and so the null hypothesis that π = 0.05 is rejected at the 5% significance level. The power of the test of $H_0: \pi = 0.05$ against $H_1: \pi = 0.30$ at α = 0.05 is 0.893.

86.2 Example: McNemar’s Test for Matched Pairs

Dataset: Vote.cydx
Data Description
These data are taken from Siegel and Castellan (1988, page 77). They show changes in preference for Presidential candidates before and after a television debate.

Table 86.1: Preference for Presidential Candidates

Preference Before     Preference After TV Debate
TV Debate             Carter      Reagan
Carter                28          13
Reagan                7           27

Analysis Steps: Asymptotic Test
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) One Sample > (Paired Design) McNemar’s
3. In the ensuing dialog box (under the Main tab) choose the variables as shown below. To run the Asymptotic test, do not check the Perform Exact Computation checkbox.
4. Under the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank and keep the Confidence Level as 0.95.
5. Click OK to start the analysis. The output is displayed in the main window.

The negative sign of the test statistic indicates that, of the 20 discordant pairs, more switched preference from Carter to Reagan (13) than from Reagan to Carter (7). The 2-sided p-value is 0.18, indicating no significant change in preference for Presidential candidates before and after the television debate. The 95% confidence interval for the difference of proportions based on the data is (−0.197, 0.037). The fact that this interval includes 0 indicates that we are unable to reject the null hypothesis of no difference on the basis of the data.

Analysis Steps: Exact Test
1. Click the Analysis Inputs/Outputs tab on the status bar below.
2. In the ensuing dialog box (under the Main tab) select the Perform Exact Computation checkbox.
3. Click OK to start the analysis. The output is displayed in the main window.

The exact p-value is 0.263, indicating no significant change in preference for Presidential candidates before and after the television debate.

87 Analysis-Binomial Superiority Two-Sample

In clinical trials involving binomial endpoint data, the interest lies in investigating whether the proportion of subjects with some characteristic (such as developing a tumor, showing some side effect, or requiring special attention) differs significantly between the treatment arm and the control arm.
Chapter 23 deals with the design of such clinical trials considering the difference of proportions, ratio of proportions or odds ratio of proportions of the two populations. This chapter explores how East is used to analyze data from two independent binomial samples generated while conducting a superiority trial. Assume that the data are sampled independently from two binomial populations with response probabilities $\pi_t$ and $\pi_c$ for treatment and control. The comparison is based on the difference of response probabilities, the ratio of proportions or the odds ratio of the two populations.

87.1 Example: Difference of Proportions-Asymptotic

Dataset: Clntrt.cydx
Data Description:
The following 2 × 2 table is obtained from a clinical trial of two treatments with a binary end-point:

Outcome        Response    No Response
Drug A         5           5
Drug B         9           1

Drug B is the treatment, whereas Drug A is the control.

Purpose of the Analysis:
To test the hypothesis $H_0: \delta = 0$ against the 1-sided alternative hypothesis $H_1: \delta > 0$, where $\delta = \pi_t - \pi_c$ is the difference of response probabilities. For this analysis, consider a 1-sided type I error of 0.05.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Difference of Proportions
3. In the ensuing dialog box (under the Main tab) select Superiority as the Trial Type. Choose other variables as shown below. Do not check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

The observed value of the test statistic is 1.952.
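The asymptotic figures reported for this example are consistent with a z statistic that uses the pooled response rate in its standard error. The sketch below reproduces that calculation from the 2 × 2 table; the function names are hypothetical, and whether East computes the statistic in exactly this way is an assumption based on the matching values.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pooled_z_test(resp_t, n_t, resp_c, n_c):
    """Pooled-variance z test for a difference of two proportions."""
    p_t = resp_t / n_t
    p_c = resp_c / n_c
    pooled = (resp_t + resp_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_t + 1.0 / n_c))
    z = (p_t - p_c) / se
    p_right = 1.0 - normal_cdf(z)                 # H1: delta > 0
    p_two = 2.0 * (1.0 - normal_cdf(abs(z)))      # H1: delta != 0
    return z, p_right, p_two


# Drug B (treatment): 9 responders of 10; Drug A (control): 5 of 10.
z, p_right, p_two = pooled_z_test(9, 10, 5, 10)
print(round(z, 3), round(p_right, 3), round(p_two, 3))  # 1.952 0.025 0.051
```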
The p-values for the 2-sided test and for the right tailed test are 0.051 and 0.025, respectively. The latter p-value is associated with the rejection of $H_0: \delta = 0$ in favor of the alternative hypothesis $H_1: \delta > 0$. East displays the p-value associated with the right tailed test on this occasion because $\hat{\delta} > 0$. The 2-sided 95% confidence interval is (−0.002, 0.699). The 1-sided p-value of 0.025 is below 0.05, so the null hypothesis is rejected and superiority of the drug over Control is concluded at the 1-sided 5% level. Note that the 2-sided 95% confidence interval narrowly includes 0, in line with the 2-sided p-value of 0.051.

87.2 Example: Difference of Proportions-Exact

Dataset: Clntrt.cydx as described in Section 87.1
Purpose of the Analysis:
Drug B is the treatment, whereas Drug A is the control. To test the hypothesis $H_0: \delta = 0$ against the 1-sided alternative hypothesis $H_1: \delta > 0$. For this analysis, consider a 1-sided type I error of 0.05.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Difference of Proportions
3. In the ensuing dialog box (under the Main tab) select Superiority as the Trial Type. Choose other variables as shown below. Check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

The one-sided p-value as well as the confidence interval indicates the rejection of the null hypothesis and superiority of the Treatment over Control.

87.3 Example: Ratio of Proportions-Asymptotic

Dataset: Clntrt.cydx as described in Section 87.1.
Purpose of the Analysis:
In the Ratio of Proportions test, let $\pi_t$ and $\pi_c$ denote the proportions of successes from the experimental treatment (T) and the control treatment (C), respectively. The null hypothesis $H_0: \pi_t/\pi_c = 1$ is tested against the 2-sided alternative hypothesis $H_1: \pi_t/\pi_c \ne 1$ or a 1-sided alternative hypothesis $H_1: \pi_t/\pi_c < 1$ or $H_1: \pi_t/\pi_c > 1$.

Analysis Steps
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Ratio of Proportions
3. In the ensuing dialog box (under the Main tab) select Superiority as the Trial Type. Choose other variables as shown below. Do not check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. Upon completion of the analysis, the output is displayed in the main window.

The observed value of the test statistic is 1.952 with a 1-sided p-value equal to 0.025. The 2-sided 95% confidence interval for $\pi_t/\pi_c$ is (0.997, 3.873). Based on the 1-sided p-value, the null hypothesis is rejected at the 1-sided 5% level, establishing the superiority of the Treatment over Control.

87.4 Example: Ratio of Proportions-Exact

Dataset: Clntrt.cydx as described in Section 87.1.
Purpose of the Analysis:
In the Ratio of Proportions test, let $\pi_t$ and $\pi_c$ denote the proportions of successes from the experimental treatment (T) and the control treatment (C), respectively. The null hypothesis $H_0: \pi_t/\pi_c = 1$ is tested against the 2-sided alternative hypothesis $H_1: \pi_t/\pi_c \ne 1$ or a 1-sided alternative hypothesis $H_1: \pi_t/\pi_c < 1$ or $H_1: \pi_t/\pi_c > 1$.

Analysis Steps
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Ratio of Proportions
3. In the ensuing dialog box (under the Main tab) select Superiority as the Trial Type.
Choose other variables as shown below. Check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. Upon completion of the analysis, the output is displayed in the main window.

The one-sided p-value indicates the rejection of the null hypothesis and establishes the superiority of drug B over A.

87.5 Example: Odds Ratio of Proportions

Dataset: Clntrt.cydx as described in Section 87.1.
Purpose of the Analysis:
Let $\pi_t$ and $\pi_c$ denote the proportions of responses under the treatment and control arms, respectively. The odds ratio of proportions, denoted by $\Psi$, is defined as

$$\Psi = \frac{\pi_t (1 - \pi_c)}{\pi_c (1 - \pi_t)}.$$

The null hypothesis $H_0: \Psi = 1$ is to be tested against the 2-sided alternative hypothesis $H_1: \Psi \ne 1$ or against the 1-sided alternative hypotheses $H_1: \Psi < 1$ or $H_1: \Psi > 1$.

Analysis Steps
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Odds Ratio of Proportions
3. In the ensuing dialog box (under the Main tab) select Superiority as the Trial Type. Choose other variables as shown below.
4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank and keep the default value of 0.95 in Confidence Level.
5. Click OK to start the analysis. Upon completion of the analysis, the output is displayed in the main window.

The output gives the estimate of the odds ratio and the 2-sided p-value using RBG variance and M-H variance. The two-sided p-values indicate a failure to reject the null hypothesis.
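For the Clntrt data, the sample version of the odds ratio defined above can be worked out directly from the 2 × 2 table of Section 87.1. The sketch below evaluates it both from the observed proportions and as the cross-product ratio of the cell counts; the function names are hypothetical, and the sketch does not attempt the RBG or M-H variance calculations.

```python
def odds_ratio_from_proportions(pi_t, pi_c):
    """Psi = pi_t (1 - pi_c) / (pi_c (1 - pi_t))."""
    return (pi_t * (1.0 - pi_c)) / (pi_c * (1.0 - pi_t))


def odds_ratio_from_counts(resp_t, nonresp_t, resp_c, nonresp_c):
    """Cross-product ratio of a 2 x 2 table; undefined if a divisor cell is 0."""
    if nonresp_t == 0 or resp_c == 0:
        raise ZeroDivisionError("odds ratio undefined for this table")
    return (resp_t * nonresp_c) / (nonresp_t * resp_c)


# Drug B (treatment): 9 responses, 1 non-response.
# Drug A (control):   5 responses, 5 non-responses.
psi_props = odds_ratio_from_proportions(9 / 10, 5 / 10)
psi_counts = odds_ratio_from_counts(9, 1, 5, 5)

print(round(psi_props, 6), psi_counts)  # 9.0 9.0
```

The two forms are algebraically identical; the cross-product version avoids the small floating-point error that the proportion version can pick up.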
87.6 Example: Common Odds Ratio of Proportions for Stratified 2X2 Tables

Dataset: BD.cydx
Data Description
The data below for six age groups, relating alcohol to oesophageal cancer, are taken from Breslow and Day (1980).

               Alcohol Exposure        No Exposure
Age Group      Case      Control       Case      Control
25-34          1         9             0         106
35-44          4         26            5         164
45-54          25        29            21        138
55-64          42        27            34        139
65-74          19        18            36        88
75+            5         0             8         31

Purpose of the Analysis:
The Homogeneity test is executed on these data to determine whether the odds ratios across the six age groups are constant.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Common Odds Ratio for Stratified 2X2 Tables
3. In the ensuing dialog box (under the Main tab) choose the variables as shown below.
4. Click OK to start the analysis. The output is displayed in the main window.

The output gives the observed odds ratios across strata, the Breslow and Day statistic with and without Tarone’s correction, and the 2-sided p-value using RBG variance and M-H variance. Note that the two-sided p-values for the Breslow and Day (1980) statistic, both with and without Tarone’s correction, are greater than 0.05, so we do not reject the null hypothesis of a common odds ratio across the strata. However, you see a warning in the output. East computes a 95% confidence interval for the exact p-value and checks whether the asymptotic p-value lies in the interval. If it does not, East gives the warning message that the asymptotic inference would be unreliable.
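To make the idea of stratum-specific and common odds ratios concrete, the sketch below computes the observed odds ratio in each age group and the standard Mantel-Haenszel point estimate of the common odds ratio from the table above. The function names are hypothetical, the Mantel-Haenszel estimator is the usual textbook formula rather than a description of East's internals, and the Breslow-Day statistic and the RBG and M-H variances are not attempted.

```python
# (exposed cases, exposed controls, unexposed cases, unexposed controls)
STRATA = {
    "25-34": (1, 9, 0, 106),
    "35-44": (4, 26, 5, 164),
    "45-54": (25, 29, 21, 138),
    "55-64": (42, 27, 34, 139),
    "65-74": (19, 18, 36, 88),
    "75+": (5, 0, 8, 31),
}


def stratum_odds_ratio(a, b, c, d):
    """Observed odds ratio a*d / (b*c); None when a zero cell makes it undefined."""
    if b * c == 0:
        return None
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel estimate: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


observed = {age: stratum_odds_ratio(*cells) for age, cells in STRATA.items()}
common = mantel_haenszel_odds_ratio(STRATA.values())

for age, value in observed.items():
    print(age, "undefined" if value is None else round(value, 2))
print("Mantel-Haenszel common odds ratio:", round(common, 2))
```

The youngest and oldest age groups each contain a zero cell, so their observed odds ratios are undefined, while the pooled estimate remains well defined and lies far above 1.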
Having accepted the hypothesis of a common odds ratio across all strata, the Mantel-Haenszel inference tests the hypothesis that this common odds ratio is equal to 1. Both p-values, using the RBG variance and the M-H variance, are very close to zero, indicating rejection of the null hypothesis that the common odds ratio is equal to 1.

87.7 Example: Fisher's Exact Test

Dataset: Clntrt.cydx as described in Section 87.1.

Purpose of the Analysis: As in the Difference of Proportions test, suppose πt and πc denote the proportions of successes from the experimental treatment (T) and the control treatment (C). The null hypothesis

    H0 : πt = πc ,    (87.1)

is tested against 1-sided alternatives of the form

    H1 : πt > πc ,    (87.2)

or

    H1′ : πt < πc ,    (87.3)

and against 2-sided alternatives of the form

    H2 : πt ≠ πc .    (87.4)

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item Analysis > (Discrete) Two Samples > (Parallel Design) Fisher's Exact
3. In the ensuing dialog box (under the Main tab), choose the variables as shown below.
4. Click OK to start the analysis. The output is displayed in the main window.

The above output provides the Fisher statistic and the 2-sided asymptotic p-value, computed as the tail area to the right of the observed Fisher statistic from a chi-square distribution with 1 df, as shown in the equation. It is 0.058. The asymptotic 1-sided p-value is defined to be half the corresponding 2-sided p-value, or 0.0292. The bottom portion of the screen provides exact 1-sided and 2-sided p-values. The exact 2-sided p-value, with D(Y) as the Fisher statistic, is 0.141. This is considerably larger than the asymptotic p-value,
highlighting the unreliability of asymptotic inference for small datasets. The output screen shows that the 1-sided p-value is obtained as the tail area to the left of 5 from the distribution of y11. The magnitude of the p-value is 0.07. The 1-sided exact p-value can be obtained from the exact distribution of y11, the entry in row 1 and column 1 of the 2 × 2 table.

88 Analysis-Binomial Noninferiority Two-Sample

In a binomial noninferiority trial the goal is to establish that the response rate of an experimental treatment is no worse than that of an active control, rather than attempting to establish that it is superior. A therapy that is demonstrated to be noninferior to the current standard therapy for a particular indication might be an acceptable alternative if, for instance, it is easier to administer, cheaper, or less toxic. Such noninferiority trials are designed by specifying a noninferiority margin. The amount by which the response rate on the experimental arm is worse than the response rate on the control arm must fall within this margin in order for the claim of noninferiority to be sustained. Chapter 18 deals with the design of such clinical trials based on the difference of proportions, ratio of proportions or odds ratio of proportions of the two populations.

This chapter demonstrates how East is used to analyze the data from two independent binomial samples generated while conducting a noninferiority trial. We shall assume that the data are sampled independently from two binomial populations with response probabilities πt and πc for treatment and control. The comparison is based on the difference of proportions, ratio of proportions or odds ratio of the two populations. For the difference and ratio of proportions, we follow two formulations, namely Wald's (1940) and Farrington and Manning's (1990) score.

88.1 Example: Noninferiority - Diff. of Proportions - Asymptotic

Dataset: Nephrodash.cyd.

Data Description
The data are for childhood nephroblastoma. Details of the data are given below:

Treatment           Rupture-free   Ruptured tumor   Total
Chemo (New)                   83                5      88
Radio (Standard)              80                7      87
Total                        163               12     175

The dataset has three variables: Resp, PopID and Freq. In Resp, a value of 1 represents response and 0 represents non-response. In PopID, 0 is control and 1 is treatment.

Purpose of the Analysis: The standard treatment for this disease is nephrectomy followed by post-operative radiotherapy, whereas the experimental treatment is pre-operative chemotherapy to reduce the tumor mass, followed by nephrectomy. First perform a superiority test to see if the experimental treatment is superior to the standard therapy. For this analysis, consider a 1-sided type I error of 0.05. This will be followed by a noninferiority test with a noninferiority margin of 0.1.

Analysis Steps: For Superiority Test
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Difference of Proportions
3. In the Main tab, select variables as shown below. Do not check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the p-value for the one-sided test is 0.268. Clearly there is no evidence that the chemotherapy arm is superior to radiotherapy. However, the goal of this study was different. The investigators only wished to establish the noninferiority of chemotherapy relative to radiotherapy at a noninferiority margin of 10%.
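The one-sided superiority p-value of 0.268 reported above can be approximated by hand from the four counts in the data table. The sketch below assumes a large-sample z test with a pooled variance estimate. That choice is an assumption made for illustration and is not taken from East's documentation of its algorithm. The function names `normal_cdf` and `pooled_z_test` are likewise hypothetical.

```python
import math

# Counts from the Data Description above: rupture-free responders / arm size.
X_CHEMO, N_CHEMO = 83, 88  # experimental arm (pre-operative chemotherapy)
X_RADIO, N_RADIO = 80, 87  # control arm (post-operative radiotherapy)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def pooled_z_test(x_t, n_t, x_c, n_c):
    """One-sided z test of H0: pi_t = pi_c against H1: pi_t > pi_c.

    Assumption: the standard error uses the pooled response rate.
    Returns (observed difference, z statistic, right-tail p-value).
    """
    p_t, p_c = x_t / n_t, x_c / n_c
    p_pool = (x_t + x_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_t + 1.0 / n_c))
    z = (p_t - p_c) / se
    return p_t - p_c, z, 1.0 - normal_cdf(z)


diff, z, p_value = pooled_z_test(X_CHEMO, N_CHEMO, X_RADIO, N_RADIO)
print(f"difference = {diff:.4f}, z = {z:.3f}, one-sided p = {p_value:.3f}")
```

Under these assumptions the p-value agrees with the reported 0.268 to three decimals. The observed difference of about 2.4 percentage points in favor of chemotherapy is too small, relative to its standard error, to support a superiority claim.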
Under this margin, the chemotherapy arm is considered non-inferior to the radiotherapy arm if the probability of being rupture-free following surgery is at most 10 percentage points lower for the chemotherapy arm than for the radiotherapy arm.

Analysis Steps: For Noninferiority Test
1. Click the Analysis Inputs tab on the status bar below. This opens the inputs you most recently gave for the superiority test in the main window. In the Main tab, change the trial type to Noninferiority. Input the value of Noninferiority margin as 0.1. Click Wald in Test Type. Here also, do not check the Perform Exact Computation checkbox.
2. Click OK to display the following output in the main window.

Note that the 1-sided p-value is now 0.023. This p-value is associated with the rejection of H0 : δ ≤ 0 in favor of the alternative hypothesis H1 : δ > 0. East displays the p-value associated with the right-tailed test on this occasion because δ̂ > 0. The 1-sided 95% confidence interval is (-1, 0.086). The p-value as well as the confidence interval indicate rejection of the null hypothesis and noninferiority of chemotherapy relative to radiotherapy.

In the Main tab, if you select Score in Test Type, the following output is displayed:

In this case, the 1-sided p-value is 0.036, establishing noninferiority.

88.2 Example: Diff. of Proportions - Exact

Dataset: Nephrodash.cyd as described in Section 88.1.

Purpose of the Analysis: The standard treatment for this disease is nephrectomy followed by post-operative radiotherapy, whereas the experimental treatment is pre-operative chemotherapy to reduce the tumor mass, followed by nephrectomy.
First perform a superiority test to see if the experimental treatment is superior to the standard therapy. For this analysis, consider a 1-sided type I error of 0.05. This will be followed by a noninferiority test with a noninferiority margin of 0.1.

Analysis Steps: For Superiority Test
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Difference of Proportions
3. In the Main tab, select variables as shown below. Check the Perform Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the p-value for the one-sided test is 0.289. Clearly there is no evidence that the chemotherapy arm is superior to radiotherapy. However, the goal of this study was different. The investigators only wished to establish the noninferiority of chemotherapy relative to radiotherapy at a noninferiority margin of 10%. In other words, the chemotherapy arm is considered non-inferior to the radiotherapy arm if the probability of being rupture-free following surgery is at most 10 percentage points lower for the chemotherapy arm than for the radiotherapy arm.

Analysis Steps: For Noninferiority Test
1. Click the Analysis Inputs tab on the status bar below. This opens the inputs you most recently gave for the superiority test in the main window. In the Main tab, change the trial type to Noninferiority. Input the value of Noninferiority margin as 0.1. Click Score in Test Type. Check the Perform Exact Computation checkbox.
2. Click OK to display the following output in the main window.

In the exact computation, the p-value is 0.037, which is significant at the 0.05 level.
We conclude that chemotherapy is noninferior to radiotherapy.

88.3 Ratio of Proportions

As before, let πt and πc denote the proportions of successes from the experimental treatment (T) and the control treatment (C), respectively. To test the null hypothesis, we transform the original data using the log and perform a difference of proportions test.

Example: Ratio of Proportions - Asymptotic

Dataset: Vaccine.cydx.

Data Description
Chan (1998) discusses a vaccine efficacy study of a recombinant DNA Influenza A vaccine against wild-type H1N1 virus challenge. The study compares the infection rates in the vaccinated and placebo groups. There were 15 individuals in each group. The following data were obtained.

                      Treatment Group
Disease Status      Placebo      Vaccine      Total
Infected           12 (80%)      7 (47%)         19
Not Infected        3 (20%)      8 (53%)         11
Total                    15           15         30

Purpose of the Analysis: Let πt be the infection rate in the vaccinated group and πc be the infection rate in the placebo group. Define ρ = πt/πc, and define λ = 1 − ρ. The parameter λ is known as the vaccine efficacy. Assume that πt ≤ πc. Then the new vaccine has 100% efficacy if πt = 0 and no efficacy if πt = πc. From a public health standpoint, the benefits from vaccination must exceed a given threshold in order to justify the risk of vaccinating healthy subjects. Therefore, in designing vaccine trials, one typically chooses a non-zero efficacy lower bound. Suppose we choose λ0 = 0.1 as the non-zero efficacy lower bound. This implies that if λ ≤ 0.1, the vaccine does not offer sufficient benefit relative to placebo to justify using it on a large scale for the prevention of infection.
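The quantities ρ and λ defined above, and a Wald-type statistic on the log scale, can be worked out directly from the data table. The sketch below is illustrative only. It assumes that the log transformation mentioned at the start of this section means a Wald test of log ρ against log ρ0 with a delta-method standard error, where ρ0 = 1 − λ0 = 0.9. The name `wald_log_ratio` is hypothetical.

```python
import math
from statistics import NormalDist

# Infected / total per arm, from the data table above.
X_VACCINE, N_VACCINE = 7, 15
X_PLACEBO, N_PLACEBO = 12, 15
RHO_0 = 0.9  # null ratio, i.e. 1 minus the efficacy lower bound of 0.1


def wald_log_ratio(x_t, n_t, x_c, n_c, rho_0, conf_level=0.95):
    """Wald statistic for log(rho) against log(rho_0), plus a 1-sided lower bound.

    Assumption: delta-method variance of log(p_t / p_c),
    (1 - p_t) / (n_t * p_t) + (1 - p_c) / (n_c * p_c).
    """
    p_t, p_c = x_t / n_t, x_c / n_c
    rho_hat = p_t / p_c
    se = math.sqrt((1 - p_t) / (n_t * p_t) + (1 - p_c) / (n_c * p_c))
    z = (math.log(rho_hat) - math.log(rho_0)) / se
    quantile = NormalDist().inv_cdf(conf_level)
    lower = math.exp(math.log(rho_hat) - quantile * se)
    return rho_hat, z, lower


rho_hat, z, lower = wald_log_ratio(X_VACCINE, N_VACCINE, X_PLACEBO, N_PLACEBO, RHO_0)
efficacy = 1.0 - rho_hat
print(f"rho_hat = {rho_hat:.3f}, estimated efficacy = {efficacy:.3f}")
print(f"Wald z = {z:.3f}, 95% lower confidence bound for rho = {lower:.3f}")
```

The point estimate of efficacy, about 0.42, is well above the 0.1 threshold, but with only 15 subjects per arm the standard error on the log scale is large. This is why a formal test against ρ0 = 0.9 is needed rather than a comparison of point estimates.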
We therefore wish to test the null hypothesis of insufficient vaccine efficacy (i.e., inferiority), λ ≤ 0.1, against the 1-sided alternative hypothesis of sufficient vaccine efficacy (i.e., noninferiority), λ > 0.1. Equivalently, we wish to test the null hypothesis of inferiority,

    H0 : ρ ≥ 0.9,    (88.1)

against the alternative hypothesis of noninferiority,

    H1 : ρ < 0.9.    (88.2)

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Ratio of Proportions
3. In the Main tab, select Noninferiority as Trial Type. Select all other variables as shown below. Do not check the Perform Exact Computation checkbox.
4. In the Advanced tab, leave By Variable 1 and By Variable 2 blank and keep the default value of 0.95 in Confidence level.
5. Click OK to start the analysis. The output is displayed in the main window.

The observed value of the test statistic is −1.423 with a 1-sided p-value equal to 0.923. The 1-sided 95% confidence interval for πt/πc is (0.353, Infinity). The p-value indicates that the null hypothesis of insufficient vaccine efficacy cannot be rejected. The corresponding 95% lower confidence bound for ρ is 0.353, which confirms that we cannot rule out the possibility that ρ ≥ 0.9.

If you select Score (Farrington Manning) in Test Type, the following output is displayed:

Since the p-value is 0.936, noninferiority cannot be established.

Example: Ratio of Proportions - Exact

Dataset: Vaccine.cydx as described in Section 88.3.
Purpose of the Analysis: To test the null hypothesis of inferiority,

    H0 : ρ ≥ 0.9,    (88.3)

against the alternative hypothesis of noninferiority,

    H1 : ρ < 0.9.    (88.4)

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Ratio of Proportions
3. In the Main tab, select Noninferiority as Trial Type. Also make sure to check the Perform Exact Computation checkbox.
4. In the Advanced tab, leave By Variable 1 and By Variable 2 blank and keep the default value of 0.95 in Confidence level.
5. Click OK to start the analysis. The output is displayed in the main window.

The p-value is 0.086, which is not significant at the 0.05 level; however, it is drastically smaller than the corresponding asymptotic p-value.

88.4 Example: Odds Ratio of Proportion

Dataset: Vaccine.cydx as described in Section 88.3.

Purpose of the Analysis: Use the same data to demonstrate the testing of noninferiority for the odds ratio in the case of two independent binomial samples.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Odds Ratio of Proportions
3. In the Main tab, select Noninferiority as Trial Type. Select all other variables as shown below.
4. In the Advanced tab, leave By Variable 1 and By Variable 2 blank and keep the default value of 0.95 in Confidence level.
5. Click OK to start the analysis.
The output is displayed in the main window.

The output gives a Test Statistic value of −2.154 with a 1-sided p-value equal to 0.016. As a result, the vaccination can be considered noninferior to the control.

If you select Score as Test Type, the following output is displayed:

89 Analysis-Binomial Equivalence Two-Samples

89.1 Equivalence: Difference of Proportions

This test arises when the interest is in establishing the bioequivalence of a new compound with an established compound. It is the 2-sided version of the noninferiority test for the difference of proportions. Thus, if πc and πt are the response rates of control and treatment, respectively, the goal is to test the null hypothesis of inequivalence, |πt − πc| ≥ δ0, against the 2-sided alternative hypothesis of equivalence, |πt − πc| < δ0, for a pre-specified equivalence margin δ0 > 0. We test the above null hypothesis by performing two separate one-sided non-inferiority hypothesis tests of the form

    H01 : πc − πt ≥ δ0 versus H11 : πc − πt < δ0    (89.1)

and

    H02 : πt − πc ≥ δ0 versus H12 : πt − πc < δ0 .    (89.2)

Each hypothesis test is carried out separately. Hypothesis H01 is tested under the assumption that πc − πt is at its threshold null value, πc − πt = δ0. Similarly, hypothesis H02 is tested under the assumption that πt − πc is at its threshold null value, πt − πc = δ0. We reject the null hypothesis of inequivalence and accept the alternative hypothesis of equivalence only if both H01 and H02 are rejected.

89.2 Example: Equivalence: Difference of Proportions-Asymptotic

Dataset: Nephrodash.cyd as described in Section 88.1.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Difference of Proportions
3.
In the Main tab, select variables as shown below. Do not check the Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

The output gives Test Statistic values of −2.801 and −1.801 with 1-sided p-values equal to 0.003 and 0.036, respectively. The null hypothesis of inequivalence can be rejected only if both noninferiority null hypotheses are rejected. Each noninferiority hypothesis is typically tested at the 2.5% level of significance since each test is 1-sided. In the present example a statistically significant p-value (p = 0.003) is obtained for the H01 non-inferiority test and a non-significant p-value (p = 0.036) is obtained for the H02 non-inferiority test. Therefore we cannot reject the null hypothesis of inequivalence.

89.3 Example: Equivalence: Difference of Proportions-Exact

Dataset: Nephrodash.cyd as described in Section 88.1.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Two Samples > (Parallel Design) Difference of Proportions
3. In the Main tab, select variables as shown below. Make sure to check the Exact Computation checkbox.
4. Click OK to start the analysis. The output is displayed in the main window.

In this example, a statistically significant p-value (p = 0.002) is obtained for the H01 non-inferiority test and a non-significant p-value (p = 0.037) is obtained for the H02 non-inferiority test.
Therefore we cannot reject the null hypothesis of inequivalence.

90 Analysis-Discrete: Many Proportions

In clinical trials involving categorical endpoints, there are several situations where either the data come from many binomial populations or the responses follow a multinomial distribution. In the case of multiple binomial populations, the interest lies in testing whether the success probability differs across the populations and, in particular, whether it increases or decreases with reference to an index variable. For data coming from multinomial distributions, one is interested in testing whether the cell probabilities follow some theoretical law. East can be used to analyze both these types of data. In this chapter we demonstrate how the tests on many proportions can be executed in East.

90.1 Example: Chi-square Test of Specified Proportions

Dataset: Smallt.cydx

Data Description
The dataset has four variables: Category, Freq, Prob and ExpFreq. The Category variable has four categories. Freq is the observed frequency for these four categories, and the variable Prob represents the expected probabilities for these categories. Table 90.1 shows the observed counts and the multinomial probabilities under the null hypothesis for a multinomial distribution with four categories.

Table 90.1: Frequency Counts from a Multinomial with 4 Categories

                           Multinomial Categories
                         1       2       3       4     Row Total
Cell Counts              7       1       1       1            10
Cell Probabilities     0.3     0.3     0.3     0.1             1

Purpose of the Analysis: To test whether the observed cell counts are consistent with the specified cell probabilities.

Analysis Steps: Based on expected probabilities
1. Open the dataset from the Samples folder.
2.
Choose the menu item: Analysis > (Discrete) Many Samples > (Single Arm Design) Chi-Square for Specified Proportions in C categories
This will display several input fields associated with the Chi-square Test in the main window.
3. In the Main tab, select Category in Category and Freq in the Observed Frequency variable. Since the data contain expected probabilities, select the Probability option and select the variable Prob in Probability.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the output contains estimates of the multinomial probabilities as well as confidence intervals for them based on the observed data. The observed value of the chi-square test statistic with 3 degrees of freedom is 8. The 2-sided p-value is 0.046. This p-value is associated with the rejection of H0 : πi = π0i , i = 1, 2, 3, ..., C in favor of the alternative hypothesis that the data do not follow the multinomial distribution with the specified proportions.

Analysis Steps: Based on expected frequencies
The test can also be run if the data contain expected frequencies rather than expected probabilities for the categories.
1. Click the Analysis Input/output tab on the Status bar below.
2. In the Main tab, select the Expected Count option instead of Probability and choose ExpFreq in Expected Frequency.
3. Click OK and the following output is displayed.

As before, the output shows estimates of the multinomial probabilities and the asymptotic inference. Since the expected counts in the data are consistent with the probabilities, the inference is the same.
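The statistic of 8 and the p-value of 0.046 reported in this example follow directly from Table 90.1. The sketch below recomputes them with the Python standard library only. The helper `chi_square_sf` is a hypothetical name, and it evaluates the chi-square upper tail through the standard recurrence for the regularized incomplete gamma function. It is meant as a hand check, not as a description of how East computes the p-value.

```python
import math

# Observed counts and null probabilities from Table 90.1.
OBSERVED = [7, 1, 1, 1]
NULL_PROBS = [0.3, 0.3, 0.3, 0.1]


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    while k < df:
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail


n = sum(OBSERVED)
expected = [n * p for p in NULL_PROBS]
statistic = sum((o - e) ** 2 / e for o, e in zip(OBSERVED, expected))
df = len(OBSERVED) - 1
p_value = chi_square_sf(statistic, df)

print(f"expected counts = {expected}")
print(f"chi-square = {statistic:.3f} on {df} df, p-value = {p_value:.3f}")
```

Every expected count here is 3 or less, which is below the usual rule of thumb for relying on the chi-square approximation, so the asymptotic p-value for a sample this small should be read with caution.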
90.2 Example: Two group Chi-square test

Dataset: vari.cydx as described in Section 74.4.1.

Purpose of the Analysis: To test if the two groups specified in row and column are independent of each other.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Many Samples > (Single Arm Design) Two Group Chi-Square for Proportions in C categories
This will display several input fields associated with the Chi-square Test in the main window.
3. In the Main tab, select Group in Row (Group), Category in Column(Categories), and Freq in the Frequency Variable.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the output contains inference using the chi-square test and the likelihood ratio test, together with several measures of association such as Phi, Pearson's contingency coefficient, Sakoda's and Tschuprow's coefficients, Cramér's V, and the Uncertainty coefficient. It also displays a warning if the asymptotic p-value does not lie within the 99% CI for the exact p-value. The observed value of the chi-square test statistic with 3 degrees of freedom is 6.255. The 2-sided p-value is 0.1. Accordingly, there is not enough evidence to reject the null hypothesis. Therefore, we cannot conclude that Interferon is more effective than placebo in preventing adverse effects.

90.3 Example: Wilcoxon Rank Sum Test for Ordered Categories Data

The Wilcoxon rank sum test (Lehmann, 1975) is one of the most popular nonparametric tests for detecting a shift in location between two populations. It can accommodate either continuous or ordinal categorical data.
It has an asymptotic relative efficiency of 95.5% relative to the t test when the underlying distributions are normal. The Wilcoxon rank sum test is used for comparing two populations that generate either continuous or ordinal categorical responses. The Wilcoxon rank sum statistic is defined by equation R.234.

Dataset: vari.cydx as described in Section 74.4.1.

Purpose of the Analysis: To test that two populations, each generating an ordered categorical response, have the same underlying multinomial distribution for the response variable.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Many Samples > (Single Arm Design) Wilcoxon Rank Sum for Ordered Categorical Data
This will display several input fields associated with the Wilcoxon Rank Sum Test in the main window.
3. In the Main tab, select Group in Row(Population), Category in Column(Response), and Freq in the Frequency Variable.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the output contains asymptotic inference for the Wilcoxon Rank Sum Statistic as well as estimates of odds ratios for the categories with the corresponding confidence intervals.

90.4 Example: Trend in R ordered proportions

Dataset: Korn case data.cydx

Data Description
Data from a prospective study of maternal drinking and congenital sex organ malformations (Graubard and Korn, 1987).

                  Maternal Alcohol Consumption (drinks/day)
Malformation          0        <1       1−2       3−5        ≥6
Absent            17066     14464       788       126        37
Present              48        38         5         1         1

Purpose of the Analysis: To test if a series of observed proportions all have the same underlying binomial response rate, where the alternative is that these rates are unequal, but ordered in some natural way.
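A test for ordered binomial rates of this kind can be sketched by hand with the Cochran-Armitage trend statistic, which is the test East reports for this example. The code below is a minimal illustration under one important assumption: the five consumption categories are given equally spaced scores 1 to 5. The scoring East actually applies is not stated in this section, and the result is sensitive to that choice. The function name `cochran_armitage` is hypothetical.

```python
import math

# Counts from the data table above, ordered by alcohol consumption category.
DOSE_LABELS = ["0", "<1", "1-2", "3-5", ">=6"]
ABSENT = [17066, 14464, 788, 126, 37]
PRESENT = [48, 38, 5, 1, 1]
SCORES = [1, 2, 3, 4, 5]  # assumption: equally spaced scores


def cochran_armitage(present, absent, scores):
    """Cochran-Armitage trend test; returns (z statistic, 2-sided p-value)."""
    totals = [r + a for r, a in zip(present, absent)]
    n = sum(totals)
    p_bar = sum(present) / n
    # Score-weighted deviation of observed responders from their null expectation.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, present, totals))
    sum_s = sum(s * m for s, m in zip(scores, totals))
    sum_s2 = sum(s * s * m for s, m in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_s2 - sum_s * sum_s / n)
    z = t / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


z, p_two_sided = cochran_armitage(PRESENT, ABSENT, SCORES)
for label, r, a in zip(DOSE_LABELS, PRESENT, ABSENT):
    print(f"{label:>4} drinks/day: malformation rate = {r / (r + a):.4f}")
print(f"z = {z:.3f}, 2-sided p-value = {p_two_sided:.3f}")
```

With equally spaced scores the 2-sided p-value comes out at about 0.176. Scores that reflect the actual spacing of the consumption categories (for example, category midpoints) put more weight on the sparse high-consumption groups and can give a noticeably different answer.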
The alternative hypothesis therefore states that there is a trend in the binomial response rates.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Many Samples > (Multi-Arm Design) Trend in R Ordered Proportions
This will display several input fields associated with the Trend in R Ordered Proportions test in the main window.
3. In the Main tab, select Column in Binomial Population(Column), Row in Binary Response(Row) with Response Value of 1. Select Weight as the Frequency Variable.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the output contains asymptotic inference for the Cochran-Armitage Trend Test as well as estimates of odds ratios for the categories with the corresponding confidence intervals. The 2-sided p-value, 0.176, indicates that we are unable to reject the null hypothesis of no trend in the proportions across the categories.

90.5 Example: Chi-Square Test for R × 2 Proportions

Dataset: Fda1.cydx

Data Description
We are grateful to Dr. Mirza W. Ali of the Food and Drug Administration (FDA) for providing this data set. Animals were treated with four dose levels of a carcinogen and then observed (at necropsy) for the presence or absence of a tumor type. The data were stratified by survival time (in weeks) into the four time intervals 0–50, 51–80, 81–104, and terminal sacrifice. Since no tumors were found in the first time interval, this stratum may be excluded from data entry. The data for the remaining three strata are given below. We will use the stratum variable as a By variable.
Stratum 1: 51–80 weeks of survival
                          Dose of Carcinogen
Disease Status      None    1 unit    5 units    50 units    Total
Tumor Present          0         0          0           1        1
Tumor Absent           7        10          6           8       31

Stratum 2: 81–104 weeks of survival
                          Dose of Carcinogen
Disease Status      None    1 unit    5 units    50 units    Total
Tumor Present          0         1          0           1        2
Tumor Absent          11         9         13          14       47

Stratum 3: Sacrificed at end of 104 weeks
                          Dose of Carcinogen
Disease Status      None    1 unit    5 units    50 units    Total
Tumor Present          1         1          1           2        5
Tumor Absent          29        26         28          20      103

Purpose of the Analysis: To test if the data come from binomial distributions having the same probability of response.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Many Samples > (Multi-Arm Design) Chi-square Test for Rx2 Proportions
This will display several input fields associated with the Chi-square Test for Rx2 Proportions in the main window.
3. In the Main tab, select all variables as shown below.
4. In the Advanced tab, select By Variable 1 as Stratum.
5. Click OK to start the analysis. The output is displayed in the main window as shown below:

Note that the output contains asymptotic inference for the Chi-square test for Rx2 proportions and the Likelihood ratio test, as well as the concordance coefficients, for each value of the stratum variable. All the 2-sided p-values are greater than 0.05, showing no evidence to reject the null hypothesis. This is true for all strata.
90.6 Example: Chi-square Test for Prop in RxC Tables

Dataset: Oral.cydx

Data Description
Data were obtained on the location of oral lesions in house-to-house surveys in three geographic regions of rural India. These data are displayed here in the form of a 9 × 3 contingency table (Table 90.2), in which the counts are the number of patients with oral lesions at that site, in that geographic region.

Table 90.2: Oral Lesions Data Set

Site of Lesion      Kerala    Gujarat    Andhra
Labial Mucosa            0          1         0
Buccal Mucosa            8          1         8
Commissure               0          1         0
Gingiva                  0          1         0
Hard Palate              0          1         0
Soft Palate              0          1         0
Tongue                   0          1         0
Floor of Mouth           1          0         1
Alveolar Ridge           1          0         1

Purpose of the Analysis: To test if the distribution of the site of the oral lesion is significantly different in the three geographic regions.

Analysis Steps
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Many Samples > (Multi-Arm Design) Chi-square for Proportions in RxC Tables
This will display several input fields associated with the Chi-square test for RxC proportions in the main window.
3. In the Main tab, select all variables as shown below.
4. Click OK to start the analysis. The output is displayed in the main window.

Note that the output contains asymptotic inference for the Chi-square test for RxC proportions and the Likelihood ratio test, as well as the concordance coefficients, for each value of the stratum variable. The 2-sided p-values are 0.14 and 0.106 for the chi-square and likelihood ratio tests, respectively, so we are unable to reject H0.
Note that, in addition to the inference, a warning is displayed: 'Warning: Since the asymptotic p-value is not contained in the 99% CI for exact p-value, the asymptotic outcomes may be considered unreliable.' The sparseness of the data is causing this problem. We recommend that you refer to the StatXact software for further details.

91 Analysis-Binary Regression Analysis

In this chapter we focus on how to run binary regression analysis in East. East provides logistic, probit, and complementary log-log regression models for data with a binary response variable. Along with regular maximum likelihood inference for the logistic model, East provides Firth bias-correction for asymptotic estimates for unstratified logistic regression. Profile likelihood based confidence intervals for estimates are available for unstratified data.

Section 91.1 describes the Logistic Regression model for binary data and how East can be used to analyze data. Section 91.3 describes the Firth Procedure. Section 91.4 describes Profile Likelihood Based Confidence Intervals. Section 91.5 describes the Probit Model for Binary Data and Section 91.6 discusses the complementary Log-log Model, which is also for binary data.

91.1 Logistic Regression

Consider a set of independent binary random variables, $Y_1, Y_2, \ldots, Y_n$. Corresponding to each random variable $Y_j$, there is a $(p \times 1)$ vector $x_j = (x_{1j}, x_{2j}, \ldots, x_{pj})'$ of explanatory variables (or covariates). Let $\pi_j$ be the probability that $Y_j = 1$. Logistic regression models the dependency of $\pi_j$ on $x_j$ through the relationship

$$\log\left(\frac{\pi_j}{1-\pi_j}\right) = \gamma + x_j'\beta, \qquad (91.1)$$

where $\gamma$ and $\beta \equiv (\beta_1, \beta_2, \ldots, \beta_p)'$ are unknown parameters. We usually refer to $\gamma$ as the constant term. In this section, we demonstrate how East can be used to perform binary logistic regression analysis.
Additionally, asymptotic bias-corrected estimates (Firth (1993)) and confidence intervals for the estimates using the profile likelihood method (Venzon and Moolgavkar (1988)), based on the normal score function and the penalized score function, are also available in East. In addition to fitting the regression coefficients, East can also be used to:

- Perform significance testing of regression coefficients using the Wald test
- Test for first-order autocorrelation in residuals using the Durbin-Watson test
- Compute collinearity diagnostics
- Compute different types of residuals
- Compute influence statistics
- Compute predicted values
- Perform variable selection

Example: Logistic Regression

Dataset: LogisticData.cydx

Data Description
These data were provided by Dr. S. Lai, University of Miami, for a hospital-based prospective study of perinatal infection and human immunodeficiency virus (HIV-1). Hutto, Parks, Lai, et al. (1991) investigated the possibility that the CD4 and CD8 blood serum levels measured in infants at 6 months of age might be good predictors of eventual HIV infection. In the dataset, CD4 and CD8 assume the values 0, 1, 2. However, these are not the actual blood serum levels. Rather, they are coded surrogates for them. The data on HIV infection rates and blood serum levels are tabulated below:

Proportion           Serum Levels at 6 Months
Developing HIV       CD4      CD8
4/7   (57%)            0        0
1/1   (100%)           0        2
2/7   (29%)            1        0
4/12  (33%)            1        1
2/2   (100%)           1        2
0/2   (0%)             2        0
0/13  (0%)             2        1
1/3   (33%)            2        2

Purpose of the Analysis: We want to fit a logistic model using the model terms CD4 and CD8, that is, to fit the logistic model HIV = CD4+CD8 to the data.

Analysis Steps: Regression based on Logistic Estimate
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Discrete) Regression > (Parallel Design) Logistic Regression
3.
In the Main tab, select HIV as the Response variable with Response Value 1, and Freq as the Weightage variable. Notice also that %Const is shown as a model term. This is because East, by default, always fits a model that includes the constant term unless you clear the "Include Intercept Term" check box. To specify an appropriate model, we define the HIV response rate as a function of CD4 and CD8, both covariates being regarded as ordinal. In the Variables box, select CD4 and CD8 and click the button to include these terms under Model Terms. Leave the default option as Estimate.
4. Click OK to estimate the regression coefficients. The maximum likelihood estimates, p-values, and confidence intervals for the regression parameters are computed and displayed in the main window.

The third section is Summary Statistics. This section displays the deviance and its degrees of freedom, and the likelihood ratio statistic and its degrees of freedom for testing the null hypothesis that the response probability of each observation is 0.5, i.e., that all the model parameters, including the constant term, are simultaneously 0. The likelihood ratio statistic may be used to test for overall significance of the model. For the present example, the output displays a value of 4.471 on 5 df for the deviance, and a value of 23.652 on 3 df for the likelihood ratio statistic, with a p-value < 0.05, thereby rejecting the null hypothesis that all the parameters of the model are 0.

The last section, Parameter Estimates, displays the Model Term, Point Estimate, and the Confidence Interval and p-value for Beta.
The Model Terms show that there are two covariates, CD4 and CD8, in the model. The next three columns (under Point Estimates) show MLE as the Type, followed by the estimates and standard errors of the Betas. For CD4, the estimate of Beta is −2.542. For CD8, the estimate of Beta is 1.659. The next four columns show the inference type, the confidence interval for Beta, and the p-value (2*1-sided) for testing Beta = 0. Here the p-value for CD4 is 0.002.

Analysis Steps: Logistic Estimate in Odds Ratio
Here we switch from displaying the regression parameters on the log-odds scale (the default) to displaying them on the odds ratio scale.
1. In the Options tab, select Odds Ratio/ Risk Ratio as the Output Parameter.
2. Click OK to re-run the estimation. The parameter estimates are all transformed by exponentiation into odds ratios.
3. Now return to the default display by choosing Beta as the Output Parameter in the Options tab.

Analysis Steps: Regression based on Factor Variables
In the LogisticData.cydx data set, CD4 and CD8 assume the values 0, 1, 2. However, these are not the actual blood serum levels. Rather, they are coded surrogates for them. Suppose, therefore, that you are unwilling to treat CD4 and CD8 as ordinal variables, but would like to treat them as factors. This requires that CD4 and CD8 each be split up into two dummy variables. The Toggle Factor option in the Logistic Regression dialog box accomplishes this splitting.
1. Press the Shift key on the keyboard, select CD4 and CD8, and then click the Toggle Factor On/Off button. Notice that the Model Terms section of the window shows <fa> next to both CD4 and CD8. This means that CD4 has been split into two dummy variables, CD4_0 and CD4_1.
The CD4_0 variable assumes the value 1 when CD4 is 0, and the value 0 otherwise. The CD4_1 variable assumes the value 1 when CD4 is 1, and the value 0 otherwise. CD8 has been similarly split.
2. Click OK to obtain the unconditional maximum likelihood estimates of the regression coefficients for the model HIV=CD4+CD8 with CD4 and CD8 declared as factor variables.

Because the maximum likelihood estimates do not exist for this small data set, convergence is not possible in this case. The Output window contains only question marks for all the model terms.

This is not a problem in East alone. You will face the same difficulty with any other logistic regression software: SAS, BMDP, GLIM or Egret. The question is, "Is there any other way to assess the significance of CD4 and CD8 when they are factor variables?" Interested users are referred to the LogXact software by Cytel Inc.

Analysis Steps: Test Multiple Hypothesis Regression
Suppose you are interested in a simultaneous test that the parameters corresponding to both CD4 and CD8 in the previously specified model are equal to 0.
1. Click the Input Parameters tab from the status bar below. Select variables CD4 and CD8 and click the Toggle Factor On/Off button.
2. In the bottom left corner of the Input dialog, click the Test option. Select CD4 and CD8 in the Model Terms box and click the Toggle Model Terms Selected for Testing Yes/No button.
3. Click OK to start the analysis. East displays the following Output.
The title Hypothesis Testing Tests <CD4 = CD8 = 0> appears near the bottom of the Test results output. Below that, you can see the results of three tests (Score, Likelihood ratio and Wald) of the null hypothesis that the regression parameters corresponding to CD4 and CD8 are both 0. Since two parameters are being tested, this is a 2 degree of freedom test. All three tests are two-sided. Notice that the p-values for all the tests are very small, indicating that we reject the null hypothesis that the parameters corresponding to both CD4 and CD8 are equal to 0.

91.2 Receiver Operating Characteristic (ROC) Curve

As part of post-fit diagnostics, you can obtain the computed results that are required for producing an ROC curve. Before we discuss the ROC curve in detail, a few technical terms, such as sensitivity and specificity, need to be explained.

Terms Explained
Consider the example of a medical test carried out on a person to determine whether the person is suffering from HIV disease. Based on the test result, can we compute the probability that the person has the disease? The following table shows the possible alternatives that can occur.

                             Test Positive                         Test Negative
Event (Disease Present)      Correct Event Prediction (a)          Incorrect Event Prediction (c)
Non-Event (Disease Absent)   Incorrect Non-Event Prediction (b)    Correct Non-Event Prediction (d)

Suppose a, b, c and d denote the number of persons for whom the test results were as shown in the above table. Then we can define the Sensitivity and Specificity of the test as given below.

The Sensitivity of a test is defined as the proportion of Correct Event predictions in the population having the event.

$$\text{Sensitivity} = \frac{a}{a+c}$$

The Specificity of a test is defined as the proportion of Correct Non-Event predictions in the population having the non-event.
$$\text{Specificity} = \frac{d}{b+d}$$

In other words, Sensitivity measures the True Positive rate of the test and Specificity measures its True Negative rate. The False Positive rate is given by 1 − Specificity, that is, 1 − True Negative rate.

When is a test positive?
Deciding whether a test is positive or not may involve obtaining the value of a single prognostic variable and checking whether the value is more than or less than a pre-defined cut point. Or the test may involve several prognostic variables. A statistical model such as Binary Logistic Regression may be used in such a situation to estimate the probability of the disease. If the estimated probability is more than a pre-defined cut point, the test may be taken to be positive for the presence of the disease. For each such cut point of the probability, the sensitivity and specificity will vary. An ROC curve is a graphical representation of the tradeoff between the False Positive rate and the True Positive rate for various values of the cut point probability.

Example: ROC curve

Dataset: LogisticData.cydx as described in Section 91.1.

Purpose of the Analysis: To fit a logistic regression model and produce an ROC curve.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Discrete) Regression > (Parallel Design) Logistic Regression
3. This will display several input fields associated with Logistic Regression in the main window. In the Main tab, select HIV as the Response variable with Response Value 1, and Freq as the Weightage variable. Select Model Terms to specify an appropriate model. To begin with, model the HIV response rate as a function of CD4 and CD8, both covariates being regarded as ordinal. In the Variables box, select CD4 and CD8 as the Model Terms. The variables CD4 and CD8 will appear in the Model Terms box.
Leave the default option as Estimate. In the Output dialog, select the Postfit Results checkbox.
4. In the Options tab, click the Postfit Results tab. Select the ROC Curve checkbox.
5. Click OK to start the analysis. The following output is displayed in the main window.
6. The postfit output in the output sheet titled "Regression Diagnostics1" is as follows:
7. Notice the column titled ProbResp containing the Estimated Response probabilities computed from the fitted model. These probability values are used as the cut points for carrying out the computations in the ROC-Curve worksheet, which is shown below.

Examine how the computations for the ROC curve are carried out. First take the cut point probability as 0.01, the first value in the column ProbResp from the Regression Diagnostics results. The rule in using this cut point is as follows: for any individual in the data set, compute the expected probability of response and, if this probability is ≥ 0.01, classify that individual as Response or Event. These expected or predicted probabilities are already computed and are shown in the column titled ProbResp. We can tabulate the prediction results for this rule as shown below.

Cut Point: z = 0.01
Rule: An individual is 'Response' or 'Event' if ProbResp is ≥ z.

Since ProbResp ≥ 0.01 for all the groups, all the individuals in all the groups are predicted as 'Response'.
            Observed          Model       Predicted
GrpSize   Resp  Non-Resp    ProbResp    Resp  Non-Resp
   2        0       2       0.010251      2       0
  13        0      13       0.051587     13       0
   7        2       5       0.11625       7       0
   3        1       2       0.222194      3       0
  12        4       8       0.408579     12       0
   7        4       3       0.625565      7       0
   2        2       0       0.783934      2       0
   1        1       0       0.97876       1       0

By comparing the Predicted figures and Observed figures, we can tabulate the 'Predicted Correct' numbers as shown below.

            Observed          Model       Predicted         Predicted Correct
GrpSize   Resp  Non-Resp    ProbResp    Resp  Non-Resp      Resp  Non-Resp
   2        0       2       0.010251      2       0           0       0
  13        0      13       0.051587     13       0           0       0
   7        2       5       0.11625       7       0           2       0
   3        1       2       0.222194      3       0           1       0
  12        4       8       0.408579     12       0           4       0
   7        4       3       0.625565      7       0           4       0
   2        2       0       0.783934      2       0           2       0
   1        1       0       0.97876       1       0           1       0

By subtracting the 'Predicted Correct' numbers from the 'Predicted' numbers, the 'Predicted Incorrect' numbers can be obtained as shown below.

            Observed          Model       Predicted         Predicted Correct   Predicted Incorrect
GrpSize   Resp  Non-Resp    ProbResp    Resp  Non-Resp      Resp  Non-Resp      Resp  Non-Resp
   2        0       2       0.010251      2       0           0       0           2       0
  13        0      13       0.051587     13       0           0       0          13       0
   7        2       5       0.11625       7       0           2       0           5       0
   3        1       2       0.222194      3       0           1       0           2       0
  12        4       8       0.408579     12       0           4       0           8       0
   7        4       3       0.625565      7       0           4       0           3       0
   2        2       0       0.783934      2       0           2       0           0       0
   1        1       0       0.97876       1       0           1       0           0       0
Total                                                        14       0          33       0

The figures in the last line, 'Total', namely 14, 0, 33 and 0, are what you see in the first row of the ROC table.

                             Test Positive                            Test Negative
Event (Disease Present)      Correct Event Prediction (a=14)          Incorrect Event Prediction (c=0)
Non-Event (Disease Absent)   Incorrect Non-Event Prediction (b=33)    Correct Non-Event Prediction (d=0)

$$\text{Sensitivity} = \frac{a}{a+c} = \frac{14}{14+0} = 1$$

$$\text{Specificity} = \frac{d}{b+d} = \frac{0}{33+0} = 0$$

Hence, 1 − Specificity = 1 − 0 = 1.

The above values of Sensitivity and (1 − Specificity), 1 and 1, are what you see in the first row of the ROC table.
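The tabulation above can be expressed compactly in code. The following sketch is an illustration of the rule "predict Response if ProbResp ≥ z" applied to the grouped data from the Regression Diagnostics sheet; it is not East's implementation, and the function name `roc_point` is hypothetical.

```python
# Grouped data from the Regression Diagnostics sheet:
# (observed responders, observed non-responders, fitted ProbResp)
groups = [
    (0, 2, 0.010251),
    (0, 13, 0.051587),
    (2, 5, 0.11625),
    (1, 2, 0.222194),
    (4, 8, 0.408579),
    (4, 3, 0.625565),
    (2, 0, 0.783934),
    (1, 0, 0.97876),
]


def roc_point(groups, cut):
    """Classify each group with the rule ProbResp >= cut and return
    (a, b, c, d, sensitivity, 1 - specificity)."""
    a = b = c = d = 0
    for resp, non_resp, prob in groups:
        if prob >= cut:
            a += resp        # events correctly predicted as events
            b += non_resp    # non-events incorrectly predicted as events
        else:
            c += resp        # events incorrectly predicted as non-events
            d += non_resp    # non-events correctly predicted as non-events
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return a, b, c, d, sensitivity, 1.0 - specificity


# One ROC point per cut point; the first uses z = 0.01 as in the text,
# the rest use the fitted probabilities themselves.
cut_points = [0.01] + [prob for _, _, prob in groups[1:]]
for z in cut_points:
    a, b, c, d, sens, fpr = roc_point(groups, z)
    print(f"z={z:<9} a={a:2d} b={b:2d} c={c:2d} d={d:2d} "
          f"sensitivity={sens:.6f} 1-specificity={fpr:.6f}")
```

Plotting sensitivity against 1 − specificity over all cut points traces out the ROC curve.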
If you carry out similar computations for the fifth group, with the cut point z = 0.408579, you will get the following results.

            Observed          Model       Predicted         Predicted Correct   Predicted Incorrect
GrpSize   Resp  Non-Resp    ProbResp    Resp  Non-Resp      Resp  Non-Resp      Resp  Non-Resp
   2        0       2       0.010251      0       2           0       2           0       0
  13        0      13       0.051587      0      13           0      13           0       0
   7        2       5       0.11625       0       7           0       5           0       2
   3        1       2       0.222194      0       3           0       2           0       1
  12        4       8       0.408579     12       0           4       0           8       0
   7        4       3       0.625565      7       0           4       0           3       0
   2        2       0       0.783934      2       0           2       0           0       0
   1        1       0       0.97876       1       0           1       0           0       0
Total                                                        11      22          11       3

The figures in the last line, 'Total', namely 11, 22, 11 and 3, are what you see in the fifth row of the ROC table.

                             Test Positive                            Test Negative
Event (Disease Present)      Correct Event Prediction (a=11)          Incorrect Event Prediction (c=3)
Non-Event (Disease Absent)   Incorrect Non-Event Prediction (b=11)    Correct Non-Event Prediction (d=22)

$$\text{Sensitivity} = \frac{a}{a+c} = \frac{11}{11+3} = 0.785714$$

$$\text{Specificity} = \frac{d}{b+d} = \frac{22}{22+11} = 0.666667$$

Hence, 1 − Specificity = 1 − 0.666667 = 0.333333.

The above values of Sensitivity and (1 − Specificity), 0.785714 and 0.333333, respectively, are what you see in the fifth row of the ROC table.

You have just seen the computations required to obtain the results shown in the ROC table for 2 cut points. In a similar way, you can check the computations for the remaining 6 cut points.

ROC Curve vs Classification Table
In addition to the ROC Curve computations, East also provides Classification Table estimates. Although both types of analysis give information on Sensitivity and Specificity estimates, they differ in the following ways:
1. The classification error estimate computed in the ROC Curve for any observation is biased, because the model used was fitted with data that included that observation.
In the Classification Table, this bias is eliminated by re-estimating the model parameters after leaving out each observation one at a time, and then classifying the observation based on the new estimates. These new estimates are actually produced as one-step approximations from the computations carried out for the complete data, and no separate models are fitted. The formulas used are listed in Appendix W.
2. The Classification Table uses Bayes' theorem and computes posterior probabilities in classification, using prior probabilities and probabilities of events.

Example: Classification Table
You can obtain classification table information using the Classification Table option.

Dataset: LogisticData.cydx as described in Section 91.1.

Purpose of the Analysis: To fit a logistic regression model and obtain a classification table.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Discrete) Regression > (Parallel Design) Logistic Regression
   This will display several input fields associated with Logistic Regression in the main window.
3. In the Main tab, select the variables as shown below. Make sure to select the Classification Table checkbox in the Output dialog.
4. In the Options tab, click the Classification Table tab. In the ensuing dialog box, specify the set of values for Prior Probabilities and Probability of Events. You can specify these values as discrete values, by entering each value in the box against 'Discrete value' and then clicking the button to post it to the box on the right side. If your set of values is a range of equidistant values, you can specify the starting value (From), the ending value (To) and the step value (Step) and then click the button. East will compute the individual values in the range and display them in the box on the right side. You are allowed to specify some values as discrete and some as a range.
For this example, in the Prior Probabilities section, enter 0.3 and 0.5 in the Discrete Value box, clicking the button after each. In the Probability of Events section, enter 0.7 and click the button. In the Range, enter 0.8 in the From value, 0.9 in the To value and 0.03 in the Step value. Click the button next to the Range in the ensuing box. For each combination of the values of 'prior probability' and 'probability of event', East produces classification results.
5. Click OK to start the analysis. The output will be displayed in the main window. In the library, along with the Analysis: Binary Regression: Logistic Model1 node, there is another node named Classification Table1. Click the node Classification Table1 to see the classification table.

91.3 Firth Procedure

Since the MLE is only asymptotically unbiased, various methods have been proposed to reduce the bias. One such approach is due to Firth (1993), which reduces the bias in the MLE by introducing a small bias into the score function. The general idea is to remove the $O(n^{-1})$ term in the expression for the bias of the MLE. This is accomplished by calculating the posterior mode based on the Jeffreys prior. One advantage of the Firth estimator is that it exists when there is complete separation or quasi-complete separation.

Example: Firth Procedure

Dataset: esr.cydx

Data Description
The Firth estimator performs well under separation and near separation, and we will illustrate the improvement over the MLE by using a well-known dataset that was originally given by Collett and Jermain (1985) and is also found in Collett (2002). The response variable was erythrocyte sedimentation rate (ESR), which is used as an indicator of infections and certain types of diseases.
The lower the ESR value the better and, as so often happens in medical applications, the continuous response variable was dichotomized, with values less than 20 assigned a value of zero and values of at least 20 assigned a value of one. The two predictor variables are Fibrinogen and γ-globulin. The data were obtained in a study performed by the Institute of Medical Research, Kuala Lumpur, Malaysia.

Purpose of the Analysis: To determine whether a patient's ESR value is a valuable diagnostic. This is accomplished by trying to determine whether there is a relationship between ESR and the two predictors, since the latter are commonly elevated in the presence of inflammatory diseases.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Discrete) Regression > (Parallel Design) Logistic Regression
3. In the Main tab, select esr as the Response variable with Response value as 1. Select fibrinogn and gam glob as the Model Terms.
4. In the Options tab, the default asymptotic method is Maximum Likelihood Estimate.
5. Click OK to start the analysis. The output is displayed in the main window.
6. Click the Input dialog and specify the same model. In the Options tab, choose Penalized MLE for bias correction (Firth's method) as the Type of MLE.
7. Click OK to start the analysis. The output is displayed in the main window.

We can see that the MLE and Firth estimates differ. Moreover, the Confidence Intervals for the Penalized MLEs are shorter than those for the MLEs.
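To make the idea of the penalized score concrete, here is a deliberately simplified sketch that is not the esr.cydx analysis and not East's algorithm. It assumes a logistic model with a constant term only, fitted to y responses out of n observations. In that special case every hat-matrix diagonal equals 1/n, so Firth's modified score reduces to (y − nπ) + (1/2 − π), and the penalized estimate has the closed form π = (y + 1/2)/(n + 1). The function name `firth_intercept` is hypothetical.

```python
import math


def firth_intercept(y, n, tol=1e-12, max_iter=100):
    """Firth-penalized estimate of the constant term in an intercept-only
    logistic model with y responses out of n observations (Newton iteration)."""
    beta = 0.0
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + math.exp(-beta))
        # Ordinary score (y - n*pi) plus the Firth correction (1/2 - pi).
        score = (y - n * pi) + (0.5 - pi)
        # Negative derivative of the modified score with respect to beta.
        info = (n + 1) * pi * (1.0 - pi)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# All 5 observations are responses: the ordinary MLE of the constant term
# is infinite (the fitted probability would be exactly 1), but the
# penalized estimate is finite.
beta_hat = firth_intercept(5, 5)
pi_hat = 1.0 / (1.0 + math.exp(-beta_hat))
print(f"penalized estimate = {beta_hat:.6f}, fitted probability = {pi_hat:.6f}")
print(f"closed form        = {math.log((5 + 0.5) / (5 - 5 + 0.5)):.6f}")
```

With covariates in the model the hat-matrix diagonals are no longer equal and the iteration involves the full information matrix, but the mechanism is the same: the correction term keeps the estimate finite even when the data are separated.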
91.4 Profile Likelihood Based Confidence Intervals

Classical Wald confidence intervals are based on the asymptotic normality of the maximum likelihood estimate of a parameter. However, in small samples, the properties of the estimator can be very different. A symmetric likelihood function justifies the use of Wald intervals, while an asymmetric one may result in inaccurate confidence intervals. A more robust construction of confidence intervals is derived from the asymptotic $\chi^2$ distribution of the generalized likelihood ratio test.

We have seen in Section 91.3 that Firth's estimator is recommended whenever there is a problem of separation, and is a better alternative to exact inference when the latter is not computationally feasible. The problem of separation also leads to inflated standard errors, which result in infinite or very wide Wald confidence intervals. In such situations, confidence intervals based on the profile likelihood method are a way out. Heinze and Schemper (2002) show that confidence intervals based on profile likelihood are often preferable to Wald confidence intervals. Heinze (2006) demonstrated that confidence intervals based on the penalized likelihood equation show excellent behavior in terms of coverage probability and power.

Example

Dataset: esr.cydx as described in Section 91.3.

Purpose of the Analysis: This example includes the confidence intervals based on the profile likelihood method for MLE and PMLE estimates.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Discrete) Regression > (Parallel Design) Logistic Regression
3. In the Main tab, select esr as the Response variable. Choose a value of 1 as the Response value. Select fibrinogn and gam glob as the Model Terms.
4.
In the Options tab, select the Profile Likelihood and Display Covariance Matrix check boxes.
5. Click OK to start the analysis. The output is displayed in the main window.

Note that there are now two confidence intervals for every estimate. The latter one is based on the profile likelihood. You can also obtain profile likelihood based Confidence Intervals when the Penalized MLE option is chosen.

91.5 Probit Regression

The probit model is a generalized linear model that uses the inverse cumulative distribution function (cdf) of the standard normal distribution as a link function. Let $y_i$ be a binary response for subject $i$, $i = 1, \ldots, n$, such that $y_i = 1$ if subject $i$ experiences a "success" and $y_i = 0$ otherwise. Further, let $\pi_i$ and $x_i$ be the probability of a response and a vector of covariates for subject $i$, $i = 1, \ldots, n$, respectively. A probit model for $y_i$ is

$$\Phi^{-1}(\pi_i) = \beta_0 + \boldsymbol{\beta}' x_i,$$

where $\Phi$ is the standard normal cdf. Here, as in the case of logistic regression, the link function $\Phi^{-1}$ maps the (0,1) scale for $\pi_i$ onto the scale of the entire real line for the linear predictor $\beta_0 + \boldsymbol{\beta}' x_i$. Also similar to the logistic case, the probit link is symmetric around 0.5 in the sense that $\Phi^{-1}(\pi) = -\Phi^{-1}(1 - \pi)$. Thus, the response curve for the probability of a response $\pi$ is symmetric around 0.5.

Example: Probit Regression

Dataset: Devtox.cydx

Data Description
This data set contains 1,512 observations, of which you can only see the first few. Use the horizontal and vertical scroll bars or the ↓ and Pg Dn keys to examine the data set. There are 8 variables, ID, Dose, Death, Weight, Malf, Sex, Impl and LittSz, and 1,512 cases (1,512 implantations in 112 litters).
The variables and their codes are described below:

Variable   Description                                 Code
Dose       dose administered in g/kg body weight       0, 0.5, 1 or 2
Death      fetal death                                 1=Yes, 0=No
Weight     fetal weight in grams
Malf       fetal malformation                          1=Yes, 0=No
Sex        gender of the rat                           1=Male, 2=Female
Impl       number of implantations in the litter
LittSz     number of live offspring in the litter

Purpose of the Analysis: We will analyze a single binary outcome, death, in a developmental toxicity study of a substance conducted in rats, through a probit model. We want to fit a probit model using the model terms Dose, Impl and their interaction Dose*Impl, that is, to fit the probit model Death = Dose+Impl+Dose*Impl to the data.

Analysis Steps: Probit Regression - Estimate Model
1. Open the dataset from Samples folder.
2. Choose the menu item:
   Analysis > (Discrete) Regression > (Parallel Design) Probit Regression
3. In the Main tab, select Death as the Response variable with Response value as 1. Select Dose and Impl as the Model Terms. Add an interaction term Dose*Impl: click on Dose in the Variables section, press the Ctrl key on the keyboard, click on Impl in the Variables section, and click the a*b button.
4. Click OK to start the analysis. The output is displayed in the main window.

The third section is Summary Statistics. This section displays the deviance and its degrees of freedom, and the likelihood ratio statistic and its degrees of freedom for testing the null hypothesis that the response probability of each observation is 0.5, i.e., that all the model parameters, including the constant term, are simultaneously 0.
The likelihood ratio statistic may be used to test for overall significance of the model. For the present example, the output displays a value of 890.7457 on 30 df for the deviance, and a value of 1205.3314 on 4 df for the likelihood ratio statistic, thereby rejecting the null hypothesis that all the parameters of the model are 0.

The last section, Parameter Estimates, displays the Model Term, Point Estimate, and the Confidence Interval and p-value for Beta. The Model Terms show that there are three covariates, Dose, Impl and Dose*Impl, in the model. The next three columns (the Point Estimates) show MLE as the Type, followed by the estimates and standard errors of the Betas. For Dose, the estimate of Beta is 1.08. For Impl, the estimate of Beta is 0.07. For Dose*Impl, the estimate of Beta is −0.044. The next four columns show the inference type, the confidence interval for Beta, and the p-value (2*1-sided) for testing Beta = 0. Here the p-value for Dose is 0.014.

Analysis Steps: Probit Regression - Test Multiple Hypothesis Model
Suppose you are interested in a simultaneous test that the parameters corresponding to both Impl and Dose*Impl in the previously specified model are equal to 0.
1. Invoke the Analysis Input tab from the status bar below. In the Input dialog, click the Test option. Use the Toggle Selected for Estimation or Testing button in the Model Terms box to select Impl and Dose*Impl for testing and deselect Dose.
2. Click OK to start the analysis. The output is displayed in the main window.

The title Hypothesis Testing Tests <Impl = Dose*Impl = 0> appears near the bottom of the Test results worksheet.
Below that, you can see the results of three asymptotic tests (based on the Score, likelihood ratio, and Wald statistics, respectively) of the null hypothesis that the regression parameters corresponding to Impl and Dose*Impl are both 0. Since two parameters are being tested, this is a 2 degree of freedom test. All three tests are two-sided. Notice that the test statistic and p-value based on the Score test (2.241 and 0.326) are very similar to those based on the Likelihood Ratio and Wald tests. The p-values are quite large, indicating that we cannot reject the null hypothesis that the parameters corresponding to both Impl and Dose*Impl are equal to 0.

Since we are only interested in a positive trend, it is appropriate to perform 1-sided tests.
1. In the Options tab change the Output p-value to One-sided.
2. Click OK to estimate the model once more. The output is displayed in the main window.

Note that since we specified One-sided p-values, East reports 1-sided p-values as well as the corresponding 1-sided confidence bounds. Since the remaining analyses will all be 2-sided, reset the option for Output p-value to Two-sided in the Options tab.

Post-Fit Analysis
Now that we have fit a model to the data, let us obtain regression diagnostics to evaluate the fit. To do so, invoke Analysis Inputs from the lower status bar. Select the Postfit Results check box. Click OK to run the analysis. In the Library, there will be two more nodes, named Regression Diagnostics1 and ROC Curve-1, which together form the post-fit analysis. The main output is as follows:
The post-fit output in the output sheet titled 'Regression Diagnostics' is as follows:

ROC Curve and Classification Table
The use of the ROC Curve and Classification Table for the Probit model is similar to what is described in section 91.2 for the Logistic Regression model. The ROC output in the output sheet titled 'ROC Curve-1' is as follows:

91.6 Complementary Log Log Model
The complementary log-log model also falls within the generalized linear model framework. The model uses the complementary log-log function to link the probability of response to a linear combination of the covariates. Using the notation from the previous section, the complementary log-log model is
$\log[-\log(1 - \pi_i)] = \beta_0 + \boldsymbol{\beta}' \mathbf{x}_i$.
The model gets its name from the fact that the log-log function is applied to $(1 - \pi)$, the complement of the probability of a success. Thus, the model applies a log-log link to the probability that $Y_i = 0$. Unlike the logistic and probit models, the complementary log-log model implies that the probability of a response is asymmetric around 0.5. That is, the model specifies that this probability approaches 0 relatively slowly but approaches 1 relatively quickly. See Agresti (2002; Section 6.6.4) for a graphical comparison of these rates in relation to the logit and probit models. As a result, the model can fit data that exhibit asymmetric rates of change in the probability of success better than the corresponding logistic and probit models, and is preferable in such cases.
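The asymmetry of the complementary log-log link can be checked numerically. The following is a minimal sketch in Python, not part of East; the function names are illustrative assumptions. It evaluates the inverse of each link at a few values of the linear predictor. For the symmetric logit and probit links the probabilities at eta and -eta add up to 1, while for the complementary log-log link they do not.

```python
import math
from statistics import NormalDist

def inv_logit(eta):
    """Inverse logit link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit link: pi = Phi(eta), the standard normal CDF."""
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    """Inverse complementary log-log link: pi = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Print the response probability under each link for a grid of
# linear-predictor values (the grid is an arbitrary illustration).
print("  eta    logit   probit  cloglog")
for eta in (-3, -2, -1, 0, 1, 2):
    print(f"{eta:5d}  {inv_logit(eta):7.4f}  {inv_probit(eta):7.4f}  {inv_cloglog(eta):7.4f}")
```

Note that at eta = 0 the logit and probit links both give a probability of 0.5, whereas the complementary log-log link gives 1 − exp(−1), roughly 0.632.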
Example: Complementary Log Log Model
Dataset: Seropos.cydx
Data Description
Consider the Serological Malaria data that have been discussed by Draper, Voller, and Carpenter (1972), and by Collett (1991). A serologic survey was carried out in 1971 in two areas of Amazonas, Brazil. An indirect fluorescent antibody test was used to detect the presence of antibodies to a malarial parasite in the villagers. The data reproduced in the table below refer to the proportion of individuals in each of seven age groups who were found to be seropositive.

Table: Seropositivity rates for villagers in Amazonas, Brazil in 1971
Age group       Mid-point of age range in years   Proportion seropositive
0-11 months     0.5                               3/10 (30.00%)
1-2 years       1.5                               1/10 (10.00%)
2-4 years       3.0                               5/29 (17.24%)
5-9 years       7.0                               39/69 (56.52%)
10-14 years     12.0                              31/51 (60.78%)
15-19 years     17.0                              8/15 (53.33%)
≥ 20 years      30.0                              91/108 (84.26%)

Analysis Steps: Clog Log Regression - Estimate Model
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Discrete) Regression > (Parallel Design) Clog Log Regression
3. In the Main tab, select AgeGroup as the variable to be included as the Model Terms. Select Seropositive as the Response variable. Enter 1 as Response Value. Select Frequency as the Weightage variable. Click the Estimate option.
4. Click OK to start the analysis. The output is displayed in the main window.

The third section is Summary Statistics. This section displays the deviance and its degrees of freedom, and the likelihood ratio statistic and degrees of freedom for testing the null hypothesis that all the model parameters, including the constant term, are simultaneously 0. Under the complementary log-log link this null hypothesis corresponds to a response probability of $1 - e^{-1} \approx 0.632$ for each observation, rather than 0.5 as in the logistic and probit models.
The likelihood ratio statistic has an asymptotic chi-squared distribution under the null hypothesis and can be used to test for overall significance of the model. For the present example, the output displays a value of 338.362 on 5 df for the deviance, and a value of 52.927 on 2 df for the likelihood ratio statistic, thereby rejecting the null hypothesis that all the parameters of the model are 0.

The last section, Parameter Estimates, displays the Model Term, the Point Estimate, and the Confidence Interval and p-value for Beta. The Model Term column shows one covariate, AgeGroup, in the model. The next three columns (the Point Estimate) show MLE as the Type, followed by the estimates and standard errors of the Betas. For AgeGroup, the estimate of Beta is 0.0511. The next four columns show the inference type, the confidence interval for Beta, and the p-value (2*1-sided) for testing Beta equal to 0. Here the p-value for AgeGroup is < 0.0001. A node Analysis: Binary Regression: Complementary Log Log Model is created in the Library.

Analysis Steps: Clog Log Regression - Test Multiple Hypotheses
Suppose you wish to test the null hypothesis that the parameter corresponding to AgeGroup in the model is equal to 0.
1. Click the Analysis Input tab from the status bar below. In the Input dialog, select the Test option. Use the Toggle Model term Selected for Testing button in the Model Terms box to select AgeGroup for testing.
2. Click OK to start the analysis. The output is displayed in the main window.

The title Hypothesis testing Tests < AgeGroup = 0 > appears near the bottom of the Test worksheet.
Below that, you can see the results of three tests (Score, likelihood ratio, and Wald, respectively) of the null hypothesis that the regression parameter corresponding to AgeGroup is 0. Since a single parameter is being tested, this is a 1 degree of freedom test. All three tests are 2-sided. See Agresti (2002). Notice that all three p-values are very small (< 0.0001), indicating that we reject the null hypothesis that the parameter corresponding to AgeGroup is 0.

Estimation Results
The estimation output currently displays the point estimate, confidence interval, and two-sided p-value for the parameter corresponding to AgeGroup. These statistics were computed by the maximum likelihood method (see Agresti, 2002). Let us look at the individual items computed as estimation output. Specifically, look at the output corresponding to AgeGroup in the Estimate worksheet. The MLE for the β coefficient, its standard error, its confidence interval, and the p-value are all displayed.

Post-Fit Analysis
Now that we have fit a model to the data, let us obtain regression diagnostics to evaluate the fit. Click the Input Parameters tab from the status bar below. Select the Postfit Results check box. Click OK to start the analysis. The output is displayed in the main window. In the Library, along with the node Analysis: Binary Regression: Complementary Log Log Model2, there are two more nodes, named Regression Diagnostics1 and ROC Curve-1, which together form the post-fit analysis. The main output is as follows:
The post-fit output in the output sheet titled 'Regression Diagnostics' is as follows:
Note the following items of information: 7 records of data, corresponding to the 7 groups; the group size, observed response, and expected response in each group; the Pearson residual for each group; the Pregibon (1981) ∆β leverage value for each group; the value of the covariate vector for each group.

ROC Curve and Classification Table
The use of the ROC Curve and Classification Table for the Complementary Log Log model is similar to what is described in section 91.2 for the Logistic Regression model.

92 Analysis- Multiple Comparison Procedures for Binary Data
It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. The type I error rate is inflated when one considers the inferences together as a family. Failure to compensate for multiplicities can have adverse consequences. For example, a drug could be approved when it is actually no better than placebo. The probability of making at least one type I error is known as the familywise error rate (FWER). Multiple comparison (MC) procedures provide a guard against inflation of the type I error due to multiple testing. All the MC procedures available in East strongly control the FWER. Strong control of the FWER means that the probability of incorrectly rejecting at least one true null hypothesis is kept at or below the nominal level, regardless of which of the null hypotheses are true. In contrast, weak control keeps the FWER at the nominal level only under the assumption that all the null hypotheses are true.

East supports several p-value based MC procedures for binary data. We have seen in chapter 27 how to simulate data under different MC procedures with specified response rates and types of variance, such as pooled or unpooled. In this chapter we explain how to analyze binary data with the different MC procedures available in East.
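To see how quickly the type I error rate inflates without any adjustment, consider the following minimal sketch in Python. It is not part of East, and it rests on a simplifying assumption that is labeled in the code: the tests are taken to be independent and all null hypotheses true, in which case the FWER of m unadjusted tests at level alpha is 1 − (1 − alpha)^m. Comparisons of several doses against a shared placebo arm are correlated, so the numbers are only indicative.

```python
def fwer_independent(alpha, m):
    """FWER of m tests, each run at level alpha.

    ASSUMPTION (for illustration only): the m tests are independent
    and every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** m

alpha = 0.025  # 1-sided level used in this chapter's example
for m in (1, 3, 5, 10):
    unadjusted = fwer_independent(alpha, m)
    # Splitting alpha equally across the m tests (Bonferroni)
    # brings the familywise rate back to at most alpha.
    split = fwer_independent(alpha / m, m)
    print(f"m={m:2d}  unadjusted FWER={unadjusted:.4f}  with alpha/m per test={split:.4f}")
```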
For MC procedures in East, we can provide either the dataset containing the observations under each arm or the raw p-values to obtain the adjusted p-values.

92.1 Available Procedures
East supports the following MC procedures for a binary endpoint.

PROCEDURE             REFERENCE
Bonferroni            Bonferroni CE. (1935)
Sidak                 Sidak Z. (1967)
Weighted Bonferroni   Benjamini Y, Hochberg Y. (1997)
Holm's Step Down      Holm S. (1979)
Hochberg's Step Up    Hochberg Y. (1988)
Hommel's Step Up      Hommel G. (1988)
Fixed Sequence        Westfall PH, Krishen A. (2001)
Fallback              Wiens B. (2003)

East supports three p-value based single step MC procedures: the Bonferroni procedure, the Sidak procedure and the Weighted Bonferroni procedure. Holm's procedure is available as a data-driven step-down MC procedure, and the Hochberg and Hommel procedures are available as data-driven step-up MC procedures. The fixed-sequence stepwise procedure and the fallback procedure are also part of the East multiple comparison procedures for binary endpoints.

92.2 Single step MC procedures
East supports three p-value based single step MC procedures. These are: the Bonferroni procedure, the Sidak procedure and the Weighted Bonferroni procedure.
For the Bonferroni procedure, $H_i$ is rejected if $p_i < \frac{\alpha}{k-1}$ and the adjusted p-value is given as $\min(1, (k-1)p_i)$.
For the Sidak procedure, $H_i$ is rejected if $p_i < 1 - (1-\alpha)^{\frac{1}{k-1}}$ and the adjusted p-value is given as $1 - (1-p_i)^{k-1}$.
For the weighted Bonferroni procedure, $H_i$ is rejected if $p_i < w_i\alpha$ and the adjusted p-value is given as $\min(1, \frac{p_i}{w_i})$. Here $w_i$ denotes the proportion of $\alpha$ allocated to $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. Note that if $w_i = \frac{1}{k-1}$, then the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Example: Bonferroni procedure
Dataset: HIV-study.cydx
Data Description
Throughout this chapter we will use the data from a dose finding HIV Study.
It was a randomized, double-blind, parallel-group, placebo-controlled, multi-center trial to assess the efficacy and safety of 125 mg (L), 250 mg (M), and 500 mg (H) orally twice daily of a new drug for the treatment of HIV-associated diarrhea. The primary efficacy endpoint is clinical response, defined as two or fewer watery bowel movements per week during at least two of the four weeks of the 4-week efficacy assessment period. The efficacy is evaluated by comparing the proportion of responders in the placebo group to the proportion of responders in the three treatment groups at a 1-sided alpha of 0.025. The data set consists of two variables. The first variable, Trt group, takes four values: "P", "L", "M", and "H". The "P" value represents the placebo group, "L" the low dose (125 mg) group, "M" the middle dose (250 mg) group, and "H" the high dose (500 mg) group. The second variable, response, is a binary indicator of whether or not each subject was a responder (1 represents a responder, 0 represents a non-responder).

Purpose of the Analysis: To analyze the data of the dose finding HIV trial using the Bonferroni procedure.

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Discrete) Many Samples > (Multiple Comparisons) Pairwise Comparison to Control - Differences of Proportions
3. In the Main tab, select the raw data option. In the ensuing box with label Treatment Variable and its Control Arm, select Trt group and select the P option next to it. Select response in the Select Response Variable box, with a Response Value of 1. Under the dropdown box for selecting the response variable there are two options, pooled variance and un-pooled variance.
For this example, select the Pooled Variance option. If Pooled Variance is selected, the software will use the pooled variance estimate in calculating the standard error of the test statistics. If Un-pooled Variance is selected, the software will use the un-pooled variance estimate instead. The technical details on the variance estimates are provided in the technical appendix H. Select Bonferroni from the Select MCP drop-down list.
4. In the Advanced tab leave the By Variable input boxes blank. Enter 0.95 for Confidence Level and select Right-tail for Rejection Region.
5. Click OK to start the analysis. The output is displayed in the main window.

The Output section gives us a table with our results. The sample size of each group and the sample mean of our response variable (response) are given. The Std. Err. of Diff. of Means column gives the standard error of the difference of means (not the standard error of the mean) for comparing that specific treatment to placebo. The next column gives the test statistic. The two columns after that give the naive and adjusted (using Bonferroni's procedure) p-values. The technical appendix H contains the technical details on Bonferroni's procedure. You can refer to it to see how the p-values are calculated. From these results we can see that after adjusting for multiplicity there is a significant difference, at the alpha = 0.05 level, in the proportion of clinical response between placebo and the high dose (adjusted p-value = 0.002).
We did not find any evidence of a difference between placebo and the low dose (adjusted p-value = 0.647), or between placebo and the middle dose (adjusted p-value = 0.180). Also, the naive p-values are all less than or equal to the adjusted p-values, as expected. The final two columns of the table give the lower and upper bounds for the 95% one-sided confidence intervals. The last section shows the adjusted global p-value, the total number of records, the number of records rejected, and the total number of arms. In the Library, there is also another node labeled Confidence Interval Plot1. Double click this node to display a Confidence Interval plot.

Example: Sidak Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using the Sidak procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Sidak in the Select MCP drop-down. Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the Output Preview Area.

The interpretation of the output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot. A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions2 is displayed in the Library.
Under this node there is another node labeled Confidence Interval Plot2. To open the plot, double click this node.

Example: Weighted Bonferroni Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using the Weighted Bonferroni procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Weighted Bonferroni in the Select MCP drop-down. After selecting Weighted Bonferroni, a table is displayed below the dropdown box. The table is used to specify the proportion of alpha allocated to each comparison. By default East distributes alpha equally among the treatment groups. For this example, enter 0.2 for group L, 0.3 for group M, and 0.5 for group H. Ideally, these values should add up to 1. If they do not, East will automatically scale them so that they add up to 1. Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the Output Preview Area.

The interpretation of the output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot.
A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions3 is displayed in the Library. Under this node there is another node labeled Confidence Interval Plot3. To open the plot, double click the Confidence Interval Plot3 node.

92.3 Data-driven step-down MC procedure
In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests of hypotheses. There are two types of stepwise procedures: one type proceeds in a data-driven order, and the other proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as the tests are significant. The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. In the i-th step, $H_{(k-i)}$ is rejected if $p_{(k-i)} \le \frac{\alpha}{i}$, and testing proceeds to the next step. Here $p_{(1)} \le \cdots \le p_{(k-1)}$ denote the ordered raw p-values and $H_{(1)}, \cdots, H_{(k-1)}$ the corresponding hypotheses, so that the most significant hypothesis $H_{(1)}$ is tested at level $\frac{\alpha}{k-1}$.

Example: Holm's Step Down Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using Holm's Step Down Procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Holm's step down in the Select MCP drop-down.
Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the Output Preview Area.

The interpretation of the output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot. A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions4 is displayed in the Library. Under this node there is another node labeled Confidence Interval Plot4. To open the plot, double click the Confidence Interval Plot4 node.

92.4 Data-driven step-up MC procedures
Step-up tests start with the least significant comparison and continue as long as the tests are not significant. The procedure stops the first time a significant comparison occurs, and all remaining hypotheses are rejected. East supports two such MC procedures: the Hochberg step-up and Hommel step-up procedures. In the Hochberg step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \frac{\alpha}{i}$. In the Hommel step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i}\alpha$ for $j = 1, \cdots, i$. Fixed sequence and fallback tests, by contrast, proceed in a pre-specified order.

Example: Hochberg's Step Up Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using Hochberg's Step Up Procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Hochberg's step up in the Select MCP drop-down. Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the Output Preview Area.

The interpretation of the output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot. A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions5 is displayed in the Library. Under this node there is another node labeled Confidence Interval Plot5. To open the plot, double click the Confidence Interval Plot5 node.

Example: Hommel's Step up Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using Hommel's Step Up Procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Hommel's step up in the Select MCP drop-down. Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the Output Preview Area.
The interpretation of the output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot. A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions6 is displayed in the Library. Under this node there is another node labeled Confidence Interval Plot6. To open the plot, double click the Confidence Interval Plot6 node.

92.5 Fixed-sequence stepwise MC procedures
In data-driven stepwise procedures, we do not have any control over the order in which the hypotheses are tested. However, sometimes, based on our preference or prior knowledge, we might want to fix the order of the tests a priori. The fixed sequence test and the fallback test proceed in such a pre-specified order. East supports both of these procedures.
Assume that $H_1, H_2, \cdots, H_{k-1}$ are ordered hypotheses and the order is pre-specified so that $H_1$ is tested first, followed by $H_2$, and so on. Let $p_1, p_2, \cdots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, for $i = 1, \cdots, k-1$, in the i-th step, if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \cdots, H_{k-1}$ and stop.
The fixed sequence testing strategy is optimal when the early tests in the sequence have the largest treatment effects, and performs poorly when the early hypotheses have small treatment effects or are nearly true (Westfall and Krishen 2001). The drawback of the fixed sequence test is that once a hypothesis is not rejected, no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.
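The fixed sequence rule can be sketched in a few lines of Python. This is an illustrative sketch of the decision rule as stated above, not East's implementation; the function name and the p-values in the usage lines are hypothetical.

```python
def fixed_sequence(p_values, alpha):
    """Apply the fixed sequence rule.

    p_values must be listed in the pre-specified testing order
    (H_1 first). Returns one boolean per hypothesis: True = rejected.
    """
    decisions = []
    stopped = False
    for p in p_values:
        if not stopped and p < alpha:
            decisions.append(True)   # reject H_i and move to the next step
        else:
            stopped = True           # retain H_i and every later hypothesis
            decisions.append(False)
    return decisions

# Hypothetical raw p-values, ordered high dose, middle dose, low dose.
print(fixed_sequence([0.001, 0.040, 0.010], alpha=0.025))
```

In this hypothetical call the third hypothesis is retained even though its raw p-value is below alpha, because testing stopped at the second hypothesis. This is the loss of power for later hypotheses described above.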
The fallback test alleviates this drawback of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, in the i-th step ($i = 1, \cdots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained; the first hypothesis is tested at $\alpha_1 = \alpha w_1$. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it. Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Example: Fixed Sequence Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using the Fixed Sequence Procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Fixed sequence in the Select MCP drop-down. After selecting Fixed sequence, a table will appear below the dropdown box. The table has two columns, Arm and Test Sequence. In the column Test Sequence, you have to specify the order in which the hypotheses will be tested. Specify 1 for the arm that will be compared first with Placebo, 2 for the arm that will be compared next, and so on. By default East assigns 1 to the first arm, 2 to the second arm, and so on. This default order implies that Dose1 will be compared first with Placebo, then Dose2, followed by Dose3. However, if we believe that the efficacy of the drug increases with dose, then the dose groups should be compared in descending order of dose.
For this example, assign the high dose a sequential priority of 1, the middle dose 2, and the low dose 3. Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the Output Preview Area.

The interpretation of the output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot. A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions7 is displayed in the Library. Under this node there is another node labeled Confidence Interval Plot7. To open the plot, double click this node.

Example: Fallback Procedure
Dataset: HIV-study.cydx as described in Section 92.2.
Purpose of the Analysis: To analyze the data of the dose finding HIV trial using the Fallback Procedure.
Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the multiple comparison test in the main window.
2. In the Main tab, select Fallback in the Select MCP drop-down. After selecting Fallback, a table will appear under the dropdown box. This table is used to specify the sequential priority for testing and the proportion of alpha allocated to each comparison. See the technical appendix H for details about this procedure. For this example, let's assign the high dose a sequential priority of 1, the middle dose 2, and the low dose 3.
Also, for the proportion of alpha, let’s allocate 0.3 to the low dose group, 0.3 to the middle dose group, and 0.4 to the high dose group. Leave all other parameters as selected for the Bonferroni procedure above.
3. Click OK to start the analysis. The output, as shown below, is displayed in the main window.

The interpretation of the above output is similar to what was described for the output of the Bonferroni procedure in section 92.2. In addition to the above output, East also creates a confidence interval plot. A node labeled Analysis: Multiple Pairwise Comparisons with Control: Difference of Proportions8 is displayed in the Library. Under this node there is another node labeled Confidence Interval Plot8. To open the plot, double-click the Confidence Interval Plot node.

93 Analysis-Comparison of Multiple Comparison Procedures for Continuous Data-Analysis

In this chapter, we will use the hypertension trial example to illustrate the different multiple testing procedures. There are two scenarios. One scenario has an increasing dose-response profile and the other has a decreasing dose-response profile. The data sets are available in the Samples subfolder in the East installation directory with file names Hypertension-trial.cyd and Hypertension-trial 2.cyd. The trial was conducted to compare the effects of four doses of the new drug. The doses are labeled D1, D2, D3, and D4, from the lowest dose D1 to the highest dose D4.
Tables 93.1 and 93.2 display the mean treatment effect of each dose group against the placebo group, standard errors, t statistics, raw p-values and 97.5% lower confidence limits for the two scenarios.

Table 93.1: Summary Statistics for scenario 1

Dose   Mean Effect   Standard Error   t statistics   p-value    97.5% Lower Confidence Limit
D1     -0.6957       1.9634           -0.3543        0.638138   -4.5831
D2      4.5498       1.9245            2.3642        0.009838    0.7395
D3      4.9252       1.9634            2.5085        0.00673     1.0378
D4      6.6268       1.9245            3.4434        0.000396    2.8164

Table 93.2: Summary Statistics for scenario 2

Dose   Mean Effect   Standard Error   t statistics   p-value    97.5% Lower Confidence Limit
D1      8.3574       1.9817            4.2173        0.000024    4.4354
D2      4.979        1.9817            2.5125        0.006631    1.057
D3      4.5469       1.9817            2.2944        0.011717    0.6249
D4      0.9544       1.9817            0.4816        0.315461   -2.9676

Tables 93.3 and 93.4 display the adjusted p-values for all the multiplicity adjustment methods. The numbers highlighted in red are significant at the 0.025 level. The single step Dunnett test finds two significant doses in both Scenario 1 and Scenario 2. Using the Bonferroni test, only Dose 4 is superior to placebo in Scenario 1 and only Dose 1 is superior to placebo in Scenario 2. Also, note that the adjusted p-values by the single step Dunnett test are all smaller than those by the Bonferroni test.
This is because the single step Dunnett test is a parametric test, which takes into account the joint distribution of the test statistics.

Table 93.3: Adjusted p values for scenario 1

MCP procedure                            D1         D2         D3         D4
Single step Dunnett                      0.895822   0.032812   0.022962   0.001488
Step down Dunnett                        0.638138   0.018357   0.018067   0.001488
Bonferroni                               1          0.039351   0.02692    0.001584
Sidak                                    0.982854   0.038774   0.026649   0.001583
Holm                                     0.638138   0.02019    0.02019    0.001584
Hochberg                                 0.638138   0.019676   0.019676   0.001584
Hommel                                   0.638138   0.019676   0.014757   0.001584
Fixed sequence (D1,D2,D3,D4)             0.638138   0.638138   0.638138   0.638138
Fallback (D1,D2,D3,D4, equal weights)    1          0.039351   0.02692    0.001584

The Dunnett step down test finds three significant doses in both Scenario 1 and Scenario 2. It is a closed test based on the single step Dunnett procedure and is uniformly more powerful than the single step Dunnett test. This can be seen from the fact that all adjusted p-values by the Dunnett step down test are smaller than or equal to those by the single step Dunnett test. The relationship between the Dunnett step down test and the Holm test is similar to that between the single step Dunnett and Bonferroni tests. The Dunnett step down test is a parametric version of the Holm test and is uniformly more powerful than the Holm test, which is confirmed by the step down Dunnett adjusted p-values being no larger than those adjusted by the Holm test.

The Sidak test gives adjusted p-values similar to those provided by the Bonferroni test. These two tests have very similar performance.

The Holm test rejects three doses in both scenarios, and all the adjusted p-values by the Holm test are smaller than or equal to those by the Bonferroni test. This is because the Holm test is a closed test based on the Bonferroni procedure and consequently it is uniformly more powerful than the Bonferroni test. The Hochberg and Hommel procedures also reject the same three hypotheses in both scenarios.
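The p-value based adjustments discussed so far can be checked by hand. The following is a minimal sketch, assuming the raw p-values of Table 93.1 and four comparisons against placebo. The function names are illustrative and are not part of East; the Dunnett and Hommel adjustments are not covered here.

```python
def bonferroni(pvals):
    """Single step Bonferroni: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def sidak(pvals):
    """Single step Sidak: 1 - (1 - p)^m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Step down Holm: multipliers m, m-1, ..., 1 from the smallest p-value up,
    made monotone with a running maximum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def hochberg(pvals):
    """Step up Hochberg: multipliers 1, 2, ..., m from the largest p-value down,
    made monotone with a running minimum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, (rank + 1) * pvals[i])
        adjusted[i] = running_min
    return adjusted


# Raw p-values for D1..D4 in scenario 1 (Table 93.1).
scenario1 = [0.638138, 0.009838, 0.00673, 0.000396]

for name, method in [("Bonferroni", bonferroni), ("Sidak", sidak),
                     ("Holm", holm), ("Hochberg", hochberg)]:
    print(name, [round(p, 6) for p in method(scenario1)])
```

Up to rounding of the raw p-values, the printed values agree with the Bonferroni, Sidak, Holm and Hochberg rows of Table 93.3.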
However, the adjusted p-values of the Hochberg and Hommel procedures for all the doses are smaller than or equal to those by the Holm procedure. This reflects the well-known fact that the Hochberg and Hommel procedures are uniformly more powerful than the Holm test. The Hommel procedure is uniformly more powerful than the Hochberg procedure. Their performances are similar, which can be seen from the similar adjusted p-values, with the Hommel adjusted p-values being no larger than, and in some cases slightly smaller than, the Hochberg ones. Note that the Hochberg and Hommel tests control the FWER when the joint distribution of the test statistics has a certain type of positive dependence, so-called multivariate total positivity of order two (Sarkar and Chang 1997, Sarkar 1998). For negatively correlated test statistics, the Hochberg and Hommel procedures might not control the FWER.

Table 93.4: Adjusted p values for scenario 2

MCP procedure                            D1         D2         D3         D4
Single step Dunnett                      0.000092   0.022646   0.038631   0.60987
Step down Dunnett                        0.000092   0.017788   0.02176    0.315461
Bonferroni                               0.000094   0.026523   0.046866   1
Sidak                                    0.000094   0.026261   0.046049   0.78042
Holm                                     0.000094   0.019893   0.023433   0.315461
Hochberg                                 0.000094   0.019893   0.023433   0.315461
Hommel                                   0.000094   0.017575   0.023433   0.315461
Fixed sequence (D1,D2,D3,D4)             0.000024   0.006631   0.011717   0.315461
Fallback (D1,D2,D3,D4, equal weights)    0.000094   0.013262   0.015622   0.315461

The fixed sequence test fails to reject any dose in Scenario 1, where all the adjusted p-values are more than 0.5. However, this test rejects doses 1, 2 and 3 in Scenario 2. Further note that in Scenario 2 the fixed sequence test performs at least as well as all other procedures, since its adjusted p-values are smaller than or equal to those of every other procedure. This illustrates an important feature of the fixed sequence test. This test performs best when the testing order is in line with the magnitudes of the underlying true treatment effects.
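The fixed sequence and fallback decision rules can be sketched as follows, assuming a significance level of 0.025, the testing order D1, D2, D3, D4, and equal fallback weights of 0.25, as in Tables 93.3 and 93.4. The function names are hypothetical and do not correspond to any East interface; the running maximum used for the fixed sequence adjusted p-values is the usual textbook form and is shown only because it matches the tabulated rows.

```python
def fixed_sequence(pvals, alpha):
    """Test in the given order at the full level alpha; once a hypothesis
    is retained, every later hypothesis is retained as well."""
    decisions = []
    still_testing = True
    for p in pvals:
        still_testing = still_testing and p < alpha
        decisions.append(still_testing)
    return decisions


def fixed_sequence_adjusted(pvals):
    """Adjusted p-values as the running maximum of the raw p-values."""
    adjusted, running_max = [], 0.0
    for p in pvals:
        running_max = max(running_max, p)
        adjusted.append(running_max)
    return adjusted


def fallback(pvals, weights, alpha):
    """Fallback rule: H_i is tested at alpha * w_i, plus the level carried
    over from H_{i-1} when H_{i-1} was rejected."""
    decisions = []
    previous_level = 0.0
    previous_rejected = False
    for p, w in zip(pvals, weights):
        level = alpha * w + (previous_level if previous_rejected else 0.0)
        previous_rejected = p < level
        decisions.append(previous_rejected)
        previous_level = level
    return decisions


# Raw p-values for D1..D4 (Tables 93.1 and 93.2).
scenario1 = [0.638138, 0.009838, 0.00673, 0.000396]
scenario2 = [0.000024, 0.006631, 0.011717, 0.315461]
equal_weights = [0.25, 0.25, 0.25, 0.25]

for label, pvals in [("scenario 1", scenario1), ("scenario 2", scenario2)]:
    print(label, "fixed sequence:", fixed_sequence(pvals, 0.025))
    print(label, "fallback:      ", fallback(pvals, equal_weights, 0.025))
    print(label, "fixed sequence adjusted:", fixed_sequence_adjusted(pvals))
```

With these inputs, the fixed sequence rule rejects nothing in scenario 1 because D1 is retained first, while the fallback rule still reaches D4 at the level 0.025 × 0.25.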
In other words, if the hypotheses tested earlier in the sequence have larger treatment effects, the fixed sequence procedure is more powerful. On the other hand, if the treatment effects are not monotone with respect to the testing order, this test performs poorly.

The fallback procedure rejects dose 4 in Scenario 1, like the Bonferroni and Sidak procedures. However, it rejects three doses in Scenario 2: doses 1, 2, and 3. In Scenario 2, the adjusted p-values generated by the fallback test are smaller than or equal to those produced by the Holm, Hochberg and Hommel tests. This implies that the fallback test with equal weights performs better than the Holm, Hochberg and Hommel tests when the testing order is in line with the magnitudes of the treatment effects. Also, note that the fallback test is more robust than the fixed sequence test, especially when the testing order is not consistent with the order of the true treatment effects, as in Scenario 1, where fallback finds one significant result whereas fixed sequence finds none.

94 Analysis-Multiple Endpoints for Binary Data

In Chapter 28, we have seen how to evaluate different gatekeeping procedures for correlated binary outcomes through intensive simulations. In this chapter, we will illustrate how to analyze a trial with binary outcomes using gatekeeping multiple comparison procedures. Consider the same example used in Chapter 27: a randomized, placebo-controlled, double blind, parallel treatment clinical trial designed to compare two treatments for migraine. In this study, Telcagepant (300mg), an antagonist of the CGRP receptor associated with migraine, and zolmitriptan (5mg), the standard treatment against migraine, are compared against a placebo.
The five co-primary endpoints include pain freedom, pain relief, absence of photophobia (sensitivity to light), absence of phonophobia (sensitivity to sound), and absence of nausea two hours post treatment. Three co-secondary endpoints included more sustained measurements of pain freedom, pain relief, and total migraine freedom for up to a 24 hour period. For illustration purposes, we consider three primary endpoints: pain freedom (PF), absence of phonophobia (phono) and absence of photophobia (photo) at two hours post treatment. Only one endpoint from the secondary family, sustained pain freedom (SPF), will be included in the example.

The data set is saved in the installation folder of East as Migraine.csv. To analyze this data set, import the data into East by clicking on the Import icon as seen in the following screen. Select the Migraine.csv file and click OK to see the data set displayed in East. The following screen shows a snapshot of the data set.

Now click on the Analysis menu at the top of the East window, select Two Samples for discrete outcome, and then select Multiple Comparisons-Multiple Endpoints from the dropdown list. The main input dialog window pops up as seen in the following screen.

East can analyze two types of data: (1) raw subject level data, (2) raw p-values. For the migraine example, the data is raw subject level data, so we select the left radio button. The bottom left panel of the screen displays all the variables contained in the data set. We need to specify which variable contains the treatment group ID for each subject, and further specify which group is the active treatment group. The next input identifies all the endpoints to be analyzed. For this example, PF, phono and photo constitute the primary family of endpoints. SPF constitutes the secondary family.
Suppose we need to analyze the data using the serial gatekeeping procedure. After filling in all inputs, the screen looks as follows. Now click on the OK button at the bottom right of the screen to run the analysis. The following screen displays the detailed output of this analysis.

The first table shows the summary statistics for each endpoint, including the mean for each treatment group, the estimate of the treatment effect, the standard error of the effect estimate, the test statistic and the marginal two-sided confidence interval. The second table shows the inference summary, including raw p-values, multiplicity adjusted p-values with the gatekeeping procedure, and significance status. It also shows whether the primary family is passed as the serial gatekeeper for the secondary family of endpoints.

95 Analysis-Agreement

This chapter discusses Cohen’s Kappa and the Weighted Kappa measures. These two measures are used to assess the level of agreement between two observers classifying a sample of objects on the same categorical scale. The joint ratings of the observers are displayed on a square r × r contingency table.

95.1 Available Measures

A reference for each measure of agreement is provided in the table shown below:

Measure Of Agreement    References
Cohen’s Kappa           Agresti (2002)
Weighted Kappa          Liebetrau (1983)

Note the following special features of these procedures. For every possible option, in addition to the option specific output, you also get the maximum likelihood point estimate of the measure of agreement (MLE), its asymptotic standard error (ASE MLE), a confidence interval for the measure of agreement, and asymptotic 1 and 2-sided p-values for testing the null hypothesis that Kappa (or weighted Kappa) equals zero.
Negative values of Kappa are possible, reflecting agreement weaker than might be expected by chance, but are rare in practice.

95.2 When to Use Each Measure

The two measures in this chapter capture the extent to which two sets of observers classifying the same set of objects agree.

Cohen’s Kappa: Use Cohen’s Kappa when the classification of each object by the two observers is on a nominal scale.

Weighted Kappa: Use the Weighted Kappa when the classification of each object by the two observers is on an ordered scale.

95.3 Example: Cohen’s Kappa

Dataset: Radio Case data.cydx

Data Description

It is hypothetical data concerning two radiologists who rated 85 patients with respect to liver lesions. The ratings were designated on an ordinal scale as "Normal", "Benign", "Suspected", and "Cancer". The following table provides the data:

Rater1/Rater2   Normal   Benign   Suspected   Cancer
Normal          21       4        3           0
Benign          12       17       9           0
Suspected       0        1        15          0
Cancer          0        0        2           1

Purpose of the Analysis: To calculate Cohen’s Kappa estimates based on the selected dataset.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Discrete) Agreement > (Parallel Design) Cohen’s Kappa. This will display several input fields associated with the Cohen’s Kappa test in the main window.
3. In the Main tab, select the variables as shown below:
4. Click OK to start the analysis. The output is displayed in the main window.

East displays the estimate of Kappa to be 0.671, which indicates moderate agreement between the two radiologists. The asymptotic 1-sided and 2-sided p-values are both very low. The hypothesis of no agreement is rejected at the 5% two-sided level of significance.
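The unweighted Kappa formula, (observed agreement − chance agreement) / (1 − chance agreement), can be sketched as follows. This is a minimal illustration that uses the counts of the table above exactly as they are printed in this manual; the function name and the list-of-lists layout are assumptions and not part of East. East computes its estimate from the dataset file itself, so a hand calculation from the printed table is a check on the formula only and need not reproduce East's reported figure.

```python
def cohens_kappa(table):
    """Unweighted Cohen's Kappa for a square table of joint rating counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Proportion of objects on which the two raters agree.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Counts as printed above (categories: Normal, Benign, Suspected, Cancer).
ratings = [
    [21, 4, 3, 0],
    [12, 17, 9, 0],
    [0, 1, 15, 0],
    [0, 0, 2, 1],
]

print(sum(sum(row) for row in ratings))   # total number of patients
print(cohens_kappa(ratings))
```

The unweighted Kappa is unchanged when the table is transposed, so the result does not depend on which rater indexes the rows.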
96 Analysis-Survival Data

96.1 Superiority

In this section, we explore how we can use East to compare two survival curves. East provides the option of using the Log Rank Test for this purpose. Here, our endpoint of interest is time-to-event. Some situations in medical research could be: the study of a new anticancer agent on patient survival; the study of an anti-depressant drug on shortening the interval between diagnosis of depression and response to treatment; and so on.

More formally, we are interested in comparing the hazard rate parameters $\lambda_t$ and $\lambda_c$ between the treatment and control populations. Define $\delta = \ln(\lambda_t/\lambda_c)$. The null hypothesis $H_0: \delta = 0$ is tested against a 2-sided alternative $H_1: \delta \neq 0$ or against a one-sided alternative $H_1: \delta < 0$ or $H_1: \delta > 0$, where

$$\lambda_t(u) = \frac{f_t(u)}{1 - F_t(u)}, \qquad \lambda_c(u) = \frac{f_c(u)}{1 - F_c(u)}$$

are the hazard functions associated with the survival distributions $F_t$ and $F_c$, respectively. The Logrank test is especially effective for detecting the proportional hazards alternative hypothesis. Under the null hypothesis, $\delta = 0$. If $\delta$ is positive, population $F_c$ prolongs survival relative to population $F_t$, while if $\delta$ is negative, population $F_t$ prolongs survival relative to population $F_c$.

96.2 Example: Survival Superiority Two Samples: Logrank

Dataset: Cancer.cydx

Data Description

This data is from a small lung cancer clinical trial involving a new drug and a control drug. The dataset has three variables: Drug, Response and Censored. The variable Drug acts as an identifier of the population to which the observation belongs. The value 1 corresponds to the control group and the value 2 corresponds to the treatment group. The Response variable provides survival time (in days). The variable Censored gives information about which observation is censored.
The value 0 corresponds to censoring and the value 1 corresponds to non-censoring.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Events) Two Samples > (Parallel Design) Logrank. This will display several input fields associated with the Logrank Test in the main window.
3. In the Main tab, select Superiority as Trial Type and Drug as Population Id. Enter 1 as Control and 2 as Treatment. Select Response as Response variable and Censored as Censor variable, with Censor Value as 0 and Complete as 1. This data does not have a frequency variable, so leave it blank.
4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Keep the default value 0.95 for Confidence Level.
5. Click OK to start the analysis. The output is displayed in the main window.

East calculates 2-sided as well as 1-sided p-values. The 2-sided p-value for this test is 0.005 and the 1-sided p-value is 0.002. At the 5% significance level, the null hypothesis is rejected.

96.3 Example: Survival Superiority Two Samples: Wilcoxon-Gehan

Dataset: Cancer.cydx

Data Description

This data is from a small lung cancer clinical trial involving a new drug and a control drug. The dataset has three variables: Drug, Response and Censored. The variable Drug acts as an identifier of the population to which the observation belongs. The value 1 corresponds to the control group and the value 2 corresponds to the treatment group. The Response variable provides survival time (in days). The variable Censored gives information about which observation is censored. The value 0 corresponds to censoring and the value 1 corresponds to non-censoring.

Analysis Steps:
1. Open the dataset from the Samples folder.
2.
Choose the menu item: Analysis > (Events) Two Samples > (Parallel Design) Wilcoxon. This will display several input fields associated with the Wilcoxon-Gehan Test in the main window.
3. In the Main tab, select Superiority as Trial Type and Drug as Population Id. Enter 1 as Control and 2 as Treatment. Select Response as Response variable and Censored as Censor variable, with Censor Value as 0 and Complete as 1. This data does not have a frequency variable, so leave it blank. Choose Test Statistic as Wilcoxon-Gehan.
4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Keep the default value 0.95 for Confidence Level.
5. Click OK to start the analysis. The output is displayed in the main window.

East calculates 2-sided as well as 1-sided p-values. The 2-sided p-value for this test is 0.007 and the 1-sided p-value is 0.004. At the 5% significance level, the null hypothesis is rejected.

96.4 Example: Survival Superiority Two Samples: Harrington-Fleming

Dataset: Cancer.cydx

Data Description

This data is from a small lung cancer clinical trial involving a new drug and a control drug. The dataset has three variables: Drug, Response and Censored. The variable Drug acts as an identifier of the population to which the observation belongs. The value 1 corresponds to the control group and the value 2 corresponds to the treatment group. The Response variable provides survival time (in days). The variable Censored gives information about which observation is censored. The value 0 corresponds to censoring and the value 1 corresponds to non-censoring.

Analysis Steps:
1. Open the dataset from the Samples folder.
2.
Choose the menu item: Analysis > (Events) Two Samples > (Parallel Design) Harrington-Fleming-Sup. This will display several input fields associated with the Harrington-Fleming Test in the main window.
3. In the Main tab, select Superiority as Trial Type and Drug as Population Id. Enter 1 as Control and 2 as Treatment. Select Response as Response variable and Censored as Censor variable, with Censor Value as 0 and Complete as 1. This data does not have a frequency variable, so leave it blank. Choose Test Statistic as Harrington-Fleming. Leave the default values of p and q as 1 each.
4. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Keep the default value 0.95 for Confidence Level.
5. Click OK to start the analysis. The output is displayed in the main window.

East calculates 2-sided as well as 1-sided p-values. The 2-sided p-value for this test is 0.024 and the 1-sided p-value is 0.012. At the 5% significance level, the null hypothesis is rejected.

96.5 Example: Survival Noninferiority Two Samples: Logrank

Dataset: Cancer.cydx as described in section 96.2.

Purpose of the Analysis: This section will illustrate through a worked example how to analyze data generated from a two-sample noninferiority study with a time-to-event trial endpoint. The noninferiority margin is generally determined by performing a meta-analysis on past clinical trials of the active control versus placebo. Regulatory agencies then require the sponsor of the clinical trial to demonstrate that a fixed percentage of the active control effect (usually 50%) is retained by the new treatment.
In a noninferiority trial the goal is to establish that an experimental treatment is no worse than the standard treatment, rather than attempting to establish that it is superior. The between-treatment difference is expressed in terms of the hazard ratio, $\rho = \lambda_t/\lambda_c$, or equivalently, in terms of the log hazard ratio, $\delta = \ln(\rho) = \ln(\lambda_t/\lambda_c)$. Here $\rho_0$ is the noninferiority margin for the hazard ratio, and $\delta_0 = \ln(\rho_0)$ is the noninferiority margin for the log hazard ratio. We perform the comparison of the two treatments by testing $H_0: \delta \geq \delta_0$ against the one-sided alternative $H_1: \delta < \delta_0$ when $\delta_0 \geq 0$, or $H_0: \delta \leq \delta_0$ against the one-sided alternative $H_1: \delta > \delta_0$ when $\delta_0 \leq 0$.

Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the Logrank Test in the main window.
2. In the Main tab, select Noninferiority as Trial Type. Enter the noninferiority margin as ln(0.511692), which is −0.67. Select Drug in the Population Id field with 1 as Control and 2 as Treatment. Select Response as Response variable. Select Censored as Censor variable with Censor value as 0. This data does not have a frequency variable, so leave the Frequency Variable blank. Choose LogRank as the Test Statistic.
3. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Enter 0.975 as the value of Confidence Level.
4. Click OK to start the analysis. The output is displayed in the main window.

With the low 1-sided p-values the noninferiority of the drug over control is established.
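The conversion between the two margin scales, and the choice of hypotheses that follows from the sign of the margin, can be sketched as below. This is a minimal illustration of the definitions $\delta_0 = \ln(\rho_0)$ and the two testing cases given above; the function names are hypothetical and are not part of East, and the case $\delta_0 = 0$, which the text allows under both conventions, is assigned to the first one here by assumption.

```python
import math


def log_hazard_ratio_margin(rho0):
    """Convert a hazard ratio margin rho0 (> 0) to the log scale: delta0 = ln(rho0)."""
    if rho0 <= 0:
        raise ValueError("hazard ratio margin must be positive")
    return math.log(rho0)


def noninferiority_hypotheses(delta0):
    """Return (H0, H1) as strings, following the sign convention in the text."""
    if delta0 >= 0:
        return ("delta >= delta0", "delta < delta0")
    return ("delta <= delta0", "delta > delta0")


# Margin used in the example: a hazard ratio of 0.511692.
delta0 = log_hazard_ratio_margin(0.511692)
print(round(delta0, 2))                     # -0.67
print(noninferiority_hypotheses(delta0))
```

Because the margin in this example is negative on the log scale, the second pair of hypotheses applies.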
96.6 Example: Survival Noninferiority Two Samples: Wilcoxon

Dataset: Cancer.cydx

Data Description

This data is from a small lung cancer clinical trial involving a new drug and a control drug. The dataset has three variables: Drug, Response and Censored. The variable Drug acts as an identifier of the population to which the observation belongs. The value 1 corresponds to the control group and the value 2 corresponds to the treatment group. The Response variable provides survival time (in days). The variable Censored gives information about which observation is censored. The value 0 corresponds to censoring and the value 1 corresponds to non-censoring.

Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields associated with the Logrank Test in the main window.
2. In the Main tab, select Noninferiority as Trial Type. Enter the noninferiority margin as ln(0.511692), which is −0.67. Select Drug in the Population Id field with 1 as Control and 2 as Treatment. Select Response as Response variable. Select Censored as Censor variable with Censor value as 0. This data does not have a frequency variable, so leave the Frequency Variable blank. Choose the Test Statistic as Wilcoxon-Gehan.
3. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Enter 0.975 as the value of Confidence Level.
4. Click OK to start the analysis. The output is displayed in the main window.

With the low 1-sided p-values the noninferiority of the drug over control is established.
96.7 Example: Survival Noninferiority Two Samples: Harrington-Fleming

Dataset: Cancer.cydx

Data Description

This data is from a small lung cancer clinical trial involving a new drug and a control drug. The dataset has three variables: Drug, Response and Censored. The variable Drug acts as an identifier of the population to which the observation belongs. The value 1 corresponds to the control group and the value 2 corresponds to the treatment group. The Response variable provides survival time (in days). The variable Censored gives information about which observation is censored. The value 0 corresponds to censoring and the value 1 corresponds to non-censoring.

Analysis Steps:
1. Click the Analysis Inputs tab on the status bar below. This will display several input fields in the main window.
2. In the Main tab, select Noninferiority as Trial Type. Enter the noninferiority margin as ln(0.511692), which is −0.67. Select Drug in the Population Id field with 1 as Control and 2 as Treatment. Select Response as Response variable. Select Censored as Censor variable with Censor value as 0. This data does not have a frequency variable, so leave the Frequency Variable blank. Choose the Test Statistic as Harrington-Fleming.
3. In the Advanced tab, leave the fields By Variable 1 and By Variable 2 blank. Enter 0.975 as the value of Confidence Level.
4. Click OK to start the analysis. The output is displayed in the main window.

With the low 1-sided p-values the noninferiority of the drug over control is established.

96.8 Example: Survival Multi-arm-Kaplan Meier Estimator

Dataset: Cancer.cydx as described in section 96.2.
Purpose of the Analysis: The Kaplan-Meier estimator, also known as the product limit estimator, is an estimator of the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment. A plot of the Kaplan-Meier estimate of the survival function is a series of horizontal steps of declining magnitude which, when a large enough sample is taken, approaches the true survival function for that population. The value of the survival function between successive distinct sampled observations is assumed to be constant. An important advantage of the Kaplan-Meier estimator is that the method can take into account some types of censored data, particularly right-censoring, which occurs if a patient withdraws from a study, that is, is lost from the sample before the final outcome is observed.

Analysis Steps:
1. Open the dataset from the Samples folder.
2. Choose the menu item: Analysis > (Events) Explore > (Multi-Arm Design) Kaplan Meier. This will display several input fields associated with the Kaplan Meier Test in the main window.
3. In the Main tab, select Drug as Population ID and Response as Response variable. Select Censored as Censor variable with Censor value as 0. Leave the Frequency Variable field blank.
4. Click OK to start the analysis. The output is displayed in the main window.

A node Analysis: Time to Event Response:Kaplan-Meier1 is created in the Library. A sub-node Kaplan-Meier Plot1 is also created in the Library. Click the Kaplan-Meier Plot1 node to open the plot.
Note that in this plot, the estimated survival curves are plotted for both drugs on the same time axis, so that a comparison of survival is possible. The Kaplan-Meier Plot indicates that the patients on the Drug arm have better survival than those on the control arm.

97 Analysis-Multiple Comparison Procedures for Survival Data

It is often the case that multiple objectives are to be addressed in one single trial. These objectives are formulated into a family of hypotheses. The type I error rate is inflated when one considers the inferences together as a family. Failure to compensate for multiplicities can have adverse consequences. For example, a drug could be approved when actually it is not better than placebo. Multiple comparison (MC) procedures provide a guard against inflation of type I error due to multiple testing. The probability of making at least one type I error is known as the family wise error rate (FWER). East supports several parametric and p-value based MC procedures. We have seen how to simulate survival data under different MC procedures in chapter 51. This chapter explains how to analyze survival data with the different MC procedures available in East.

97.1 Available Procedures

All the MC procedures available in East strongly control the FWER. Strong control of the FWER means that the probability of incorrectly rejecting at least one true null hypothesis is controlled regardless of which null hypotheses are true. In contrast, weak control of the FWER controls it only under the assumption that all null hypotheses are true. The following MC procedures are available for survival endpoints in East.
Category        Procedure              Reference
P-value Based   Bonferroni             Bonferroni CE (1935, 1936)
                Sidak                  Sidak Z (1967)
                Weighted Bonferroni    Benjamini Y and Hochberg Y (1997)
                Holm’s Step Down       Holm S (1979)
                Hochberg’s Step Up     Hochberg Y (1988)
                Hommel’s Step Up       Hommel G (1988)
                Fixed Sequence         Westfall PH, Krishen A (2001)
                Fallback               Wiens B, Dmitrienko A (2005)

East provides three types of test statistics for the analysis of survival data incorporating MC procedures: the Logrank, the Wilcoxon-Gehan, and the Harrington-Fleming. For illustration purposes, the examples below will only utilize the Logrank test statistic for data analysis.

STAMPEDE Trial

Throughout this chapter we consider the data derived from the design of the STAMPEDE trial discussed in chapter 51 to illustrate the analysis of survival data under different MC procedures. The STAMPEDE study is an ongoing, open-label, 5-stage, 6-arm randomized controlled trial using multi-arm, multi-stage (MAMS) methodology for men with prostate cancer. Started in 2005, it was the first trial of this design to use multiple arms and stages synchronously. The study population consists of men with high-risk localized or metastatic prostate cancer, who are being treated for the first time with long-term androgen deprivation therapy (ADT) or androgen suppression. The study started with a control arm and 5 treatment groups:

  Standard of care (SOC) = ADT
  SOC + zoledronic acid (IV)
  SOC + docetaxel (IV)
  SOC + celecoxib, an orally administered cox-2 inhibitor
  SOC + zoledronic acid + docetaxel
  SOC + zoledronic acid + celecoxib

We want to control the FWER at the 5% level of significance.

Dataset: The data to be used for the examples below arise from the STAMPEDE design described in chapter 51.
The resulting SubjectData was generated during a design simulation that captured subject level data for every simulation run:

97.2 Single step MC procedures

East supports three p-value based single step MC procedures: the Bonferroni procedure, the Sidak procedure and the weighted Bonferroni procedure.

For the Bonferroni procedure, $H_i$ is rejected if $p_i < \frac{\alpha}{k-1}$, and the adjusted p-value is given as $\min(1, (k-1)p_i)$.

For the Sidak procedure, $H_i$ is rejected if $p_i < 1 - (1-\alpha)^{\frac{1}{k-1}}$, and the adjusted p-value is given as $1 - (1-p_i)^{k-1}$.

For the weighted Bonferroni procedure, $H_i$ is rejected if $p_i < w_i\alpha$, and the adjusted p-value is given as $\min\left(1, \frac{p_i}{w_i}\right)$. Here $w_i$ denotes the proportion of $\alpha$ allocated to $H_i$, such that $\sum_{i=1}^{k-1} w_i = 1$. Note that if $w_i = \frac{1}{k-1}$, the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

Example: Bonferroni procedure

Select the SubjectData node under the appropriate Simulation node in the Library. Next, under the Analysis tab in the Events group, select Many Samples - Pairwise Comparisons to Control - Logrank. The following screen is displayed:

Select the following values for the Main tab:

Keep the following default values for the Advanced tab:

Click OK to analyze the data. The output will be displayed in the main window.

The adjusted p-values for comparison of Dose1, Dose 2 ... up to Dose 5 vs. Placebo are all essentially 1.
Therefore, after multiplicity adjustment according to the Bonferroni procedure for this design, we can conclude that no additional treatment given on top of the standard of care at the tested dose levels is significantly different from the current standard treatment (ADT only).

Example: Sidak procedure

Again with the appropriate SubjectData node selected, under the Analysis tab in the Events group, select Many Samples - Pairwise Comparisons to Control - Logrank. Select the following values for the Main tab:

Keep the default values for the Advanced tab and click OK to analyze the data. The output will be displayed in the main window.

The adjusted p-values for comparison of Dose1, Dose 2 ... up to Dose 5 vs. Placebo are all essentially 1. Therefore, after multiplicity adjustment according to the Sidak procedure for this design, we can conclude that no additional treatment given on top of the standard of care at the tested dose levels is significantly different from the current standard treatment (ADT only).

Example: Weighted Bonferroni procedure

Dataset:

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Upon selection of the weighted Bonferroni procedure, a table will appear under the drop-down box. The table has two columns - Arm and Proportion of Alpha. In the column Proportion of Alpha, you have to specify the proportion of total alpha you want to spend in each test. Ideally, the values in this column should add up to 1; if they do not, East will normalize them so that they add up to 1.
By default, East distributes the total alpha equally among all tests. Here we have 4 tests in total, so each test has a proportion of alpha of 1/4 or 0.25. You can specify other proportions as well. For this example, keep the equal proportion of alpha for each test.
5. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs Placebo and Dose4 vs. Placebo are 0.982, 0.031, 0.044 and 0.001, respectively. Therefore, after multiplicity adjustment according to the weighted Bonferroni procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance. Notice that the adjusted p-values in the weighted Bonferroni MC procedure and the simple Bonferroni procedure are identical. This is because the weighted Bonferroni procedure with equal proportions reduces to the simple Bonferroni procedure.

97.3 Step down MC procedure

In the single step MC procedures, the decision to reject any hypothesis does not depend on the decision to reject other hypotheses. In the stepwise procedures, on the other hand, the decision on one hypothesis test can influence the decisions on the other tests of hypotheses. There are two types of stepwise procedures. One type proceeds in a data-driven order. The other type proceeds in a fixed order set a priori. Stepwise tests in a data-driven order can proceed in a step-down or step-up manner. East supports the Holm step-down MC procedure, which starts with the most significant comparison and continues as long as tests are significant, until the test for a certain hypothesis fails.
The testing procedure stops the first time a non-significant comparison occurs, and all remaining hypotheses are retained. In the i-th step, $H_{(k-i)}$ is rejected if $p_{(k-i)} \le \frac{\alpha}{i}$, and the procedure goes to the next step.

Example: Holm’s step-down

Dataset:

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs Placebo and Dose4 vs. Placebo are 0.634, 0.023, 0.023 and 0.001, respectively. Therefore, after multiplicity adjustment according to Holm’s step-down procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

97.4 Data-driven step-up MC procedures

Step-up tests start with the least significant comparison and continue as long as tests are not significant, until the first time a significant comparison occurs, at which point all remaining hypotheses are rejected. East supports two such MC procedures: the Hochberg step-up and the Hommel step-up procedures. In the Hochberg step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-i)} > \frac{\alpha}{i}$. In the Hommel step-up procedure, in the i-th step $H_{(k-i)}$ is retained if $p_{(k-j)} > \frac{i-j+1}{i}\alpha$ for $j = 1, \cdots, i$. Fixed sequence test and fallback test are the types of tests which proceed in a prespecified order.

Example: Hochberg’s step-up procedure

Dataset:

Analysis Steps:
1.
Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs Placebo and Dose4 vs. Placebo are 0.634, 0.022, 0.022 and 0.001, respectively. Therefore, after multiplicity adjustment according to Hochberg’s step-up procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Example: Hommel’s step-up procedure

Dataset:

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab choose the variables as shown below.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over.

The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs Placebo and Dose4 vs. Placebo are 0.634, 0.017, 0.022 and 0.001, respectively.
Therefore, after multiplicity adjustment according to Hommel’s step-up procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

97.5 Fixed-sequence stepwise MC procedures

In data-driven stepwise procedures, we have no control over the order in which the hypotheses are tested. However, sometimes based on our preference or prior knowledge we might want to fix the order of tests a priori. Fixed sequence test and fallback test are the types of tests which proceed in a pre-specified order. East supports both these procedures.

Assume that $H_1, H_2, \cdots, H_{k-1}$ are ordered hypotheses and the order is pre-specified so that $H_1$ is tested first, followed by $H_2$ and so on. Let $p_1, p_2, \cdots, p_{k-1}$ be the associated raw marginal p-values. In the fixed sequence testing procedure, for $i = 1, \cdots, k-1$, in the i-th step, if $p_i < \alpha$, reject $H_i$ and go to the next step; otherwise retain $H_i, \cdots, H_{k-1}$ and stop.

The fixed sequence testing strategy is optimal when early tests in the sequence have the largest treatment effect, and it performs poorly when early hypotheses have a small treatment effect or are nearly true (Westfall and Krishen (2001)). The drawback of the fixed sequence test is that once a hypothesis is not rejected no further testing is permitted. This leads to lower power to reject hypotheses tested later in the sequence.

The fallback test alleviates this undesirable feature of the fixed sequence test. Let $w_i$ be the proportion of $\alpha$ for testing $H_i$ such that $\sum_{i=1}^{k-1} w_i = 1$. In the fallback procedure, in the i-th step ($i = 1, \cdots, k-1$), test $H_i$ at $\alpha_i = \alpha_{i-1} + \alpha w_i$ if $H_{i-1}$ is rejected and at $\alpha_i = \alpha w_i$ if $H_{i-1}$ is retained. If $p_i < \alpha_i$, reject $H_i$; otherwise retain it.
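The two fixed-order decision rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not East's implementation: the function names `fixed_sequence` and `fallback` are hypothetical, the p-values are assumed to be supplied already arranged in the pre-specified test order, and the first hypothesis is tested at level $\alpha w_1$.

```python
def fixed_sequence(p_ordered, alpha=0.05):
    """Rejection decisions for p-values given in the pre-specified test order.

    Hypothetical helper: reject while p_i < alpha, then retain everything
    from the first non-significant test onwards.
    """
    decisions = []
    stopped = False
    for p in p_ordered:
        if not stopped and p < alpha:
            decisions.append(True)
        else:
            stopped = True
            decisions.append(False)
    return decisions


def fallback(p_ordered, weights, alpha=0.05):
    """Rejection decisions under the fallback rule.

    Hypothetical helper: H_i is tested at alpha_{i-1} + alpha * w_i when
    H_{i-1} was rejected, and at alpha * w_i when H_{i-1} was retained.
    The weights are assumed to sum to 1.
    """
    decisions = []
    carried = 0.0  # level passed on from the previous step, if it was a rejection
    for p, w in zip(p_ordered, weights):
        level = carried + alpha * w
        rejected = p < level
        decisions.append(rejected)
        carried = level if rejected else 0.0
    return decisions


# Raw p-values for Dose4, Dose3, Dose2, Dose1 (descending dose order)
p_by_order = [0.000, 0.011, 0.008, 0.634]
print(fixed_sequence(p_by_order))                  # [True, True, True, False]
print(fallback(p_by_order, [0.25] * 4))            # [True, True, True, False]

# A case where the two rules differ: the first test fails
print(fixed_sequence([0.2, 0.001]))                # [False, False]
print(fallback([0.2, 0.001], [0.5, 0.5]))          # [False, True]
```

The last two calls show the design difference: the fixed sequence rule stops at the first failure, whereas the fallback rule still tests the second hypothesis at its own share of alpha.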
Unlike the fixed sequence testing approach, the fallback procedure can continue testing even if a non-significant outcome is encountered, by utilizing the fallback strategy. If a hypothesis in the sequence is retained, the next hypothesis in the sequence is tested at the level that would have been used by the weighted Bonferroni procedure. With $w_1 = 1$ and $w_2 = \cdots = w_{k-1} = 0$, the fallback procedure simplifies to the fixed sequence procedure.

Example: Fixed sequence testing procedure

Dataset:

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab choose the variables as shown below.

Upon selection of the Fixed Sequence procedure, a table will appear under the drop-down box. The table has two columns - Arm and Test Sequence. In the column Test Sequence, you have to specify the order in which the hypotheses will be tested. Specify 1 for the arm that will be compared first with Placebo, 2 for the arm that will be compared next and so on. By default East assigns 1 to the first arm, 2 to the second arm and so on. This default order implies that Dose1 will be compared first with Placebo, then Dose2, followed by Dose3 vs. Placebo, and finally Dose 4 will be compared with Placebo. However, if we believe that the efficacy of the drug increases with dose, then the dose groups should be compared in descending order of dose. Therefore, specify 4, 3, 2 and 1 in column Test Sequence for D1, D2, D3 and D4, respectively. This order implies that Dose4 will be compared first with Placebo, then Dose3, followed by Dose2 vs. Placebo, and finally Dose 1 will be compared with Placebo. Click OK to analyze the data.
The output will be displayed in the main window once the analysis is over. The input section of the output displays the test sequence along with the other input values we have provided. The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs Placebo and Dose4 vs. Placebo are 0.634, 0.011, 0.011 and 0.000, respectively. Therefore, after multiplicity adjustment according to the fixed sequence procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

Example: Fallback procedure

Dataset:

Analysis Steps:
1. Open the dataset from Samples folder.
2. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
3. In the ensuing dialog box, under the Main tab choose the variables as shown below. Upon selection of the Fallback procedure, a table will appear under the drop-down box. The table has three columns - Arm, Proportion of Alpha and Test Sequence. Specify 4, 3, 2 and 1 in column Test Sequence for D1, D2, D3 and D4, respectively. For this example, keep the equal proportion of alpha for each test in the column Proportion of Alpha.
4. Click OK to analyze the data. The output will be displayed in the main window once the analysis is over. The input section of the output displays the test sequence along with the other input values we have provided.
The adjusted p-values for comparison of Dose1 vs. Placebo, Dose2 vs. Placebo, Dose3 vs Placebo and Dose4 vs. Placebo are 0.634, 0.022, 0.022 and 0.001, respectively. Therefore, after multiplicity adjustment according to the fallback procedure, we can conclude that Dose2, Dose3 and Dose4 are significantly different from Placebo, but Dose1 is not significantly different from Placebo at the 5% level of significance.

97.6 Example: Raw p-values as input

Suppose we don’t have the dataset containing all the observations; instead we have the raw p-values, and we want to adjust these using the Bonferroni procedure. Here we will consider the 4 raw p-values returned by East using the example STAMPEDE data in all the above output. These p-values are 0.634, 0.008, 0.011 and 0.000. We will use these raw p-values to obtain adjusted p-values. In order to do this, we first need to create a dataset containing these p-values.

Dataset: New Dataset to be created.

Analysis Steps:
1. In the Home tab, choose New > Case Data. This will open an empty dataset in the main window. Now right click on the column header and click Create Variable as shown below.
2. This will bring up the following Variable Type Setting dialog box.
3. Type in Arm for Name and choose the type of variable as String.
4. Click OK and this will add a column with name Arm in the dataset. Similarly, create a numeric column with label pvalue.
Now, enter the values in the table as follows:
5. East assigns a default name CaseData1 to this dataset.
6. Choose the menu item: Analysis > (Continuous) Many Samples > (Multiple Comparisons) Pairwise Comparisons to Controls - Difference of Means
7. This will display several input fields associated with the multiple comparison test in the main window. In the Main tab, select the radio-button corresponding to raw p-values. In the ensuing two boxes, select Arm as Treatment variable and select pvalue for Select raw p-values. Choose Bonferroni from the drop-down list in Select MCP.
8. Click OK. The output will be displayed in the main window.

The adjusted p-values for D1, D2, D3 and D4 are 1, 0.032, 0.044 and 0.000, respectively. Note that these adjusted p-values are very close to what we have obtained with the Bonferroni procedure using the dataset Hypertension-trial.cyd. Ideally, both sets of p-values should match exactly. The difference in p-values is only due to rounding error.
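The raw p-value adjustment in this example can also be checked by hand. The following Python sketch applies the adjusted p-value formulas of Section 97.2, together with the commonly used adjusted p-value forms of the Holm and Hochberg procedures, to the four rounded raw p-values quoted above. The function names are hypothetical, the number of hypotheses is taken to be the number of p-values supplied (k − 1 in the chapter's notation), and the code is a sketch rather than a description of how East computes its output.

```python
def bonferroni(p):
    """Adjusted p-values min(1, m * p_i), where m is the number of hypotheses."""
    m = len(p)
    return [min(1.0, m * x) for x in p]


def sidak(p):
    """Adjusted p-values 1 - (1 - p_i)^m."""
    m = len(p)
    return [1.0 - (1.0 - x) ** m for x in p]


def weighted_bonferroni(p, weights):
    """Adjusted p-values min(1, p_i / w_i); weights are normalized to sum to 1.

    Assumes every weight is strictly positive.
    """
    total = sum(weights)
    return [min(1.0, x / (w / total)) for x, w in zip(p, weights)]


def holm(p):
    """Step-down adjustment: running maximum of (m - rank) * p over ascending p."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, idx in enumerate(order):
        running = max(running, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running)
    return adjusted


def hochberg(p):
    """Step-up adjustment: running minimum of (rank + 1) * p over descending p."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i], reverse=True)
    adjusted = [0.0] * m
    running = 1.0
    for rank, idx in enumerate(order):
        running = min(running, (rank + 1) * p[idx])
        adjusted[idx] = running
    return adjusted


raw = [0.634, 0.008, 0.011, 0.000]  # D1, D2, D3, D4 as entered in CaseData1
print([round(x, 3) for x in bonferroni(raw)])  # [1.0, 0.032, 0.044, 0.0]
print([round(x, 3) for x in holm(raw)])        # [0.634, 0.024, 0.024, 0.0]
print([round(x, 3) for x in hochberg(raw)])    # [0.634, 0.022, 0.022, 0.0]
```

Because the inputs here are rounded to three decimals, small differences from adjusted p-values computed on the full dataset are to be expected, in line with the rounding remark above.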
Volume 10 Appendices

A Introduction to Volume 10
B Group Sequential Design in East 6
C Interim Monitoring in East 6
D Computing the Expected Number of Events
E Generating Survival Simulations in EastSurv
F Spending Functions Derived from Power Boundaries
G The Recursive Integration Algorithm
H Theory - Multiple Comparison Procedures
I Theory - Multiple Endpoint Procedures
J Theory-Multi-arm Multi-stage Group Sequential Design
K Theory - MultiArm Two Stage Designs Combining p-values
L Technical Details - Predicted Interval Plots
M Enrollment/Events Prediction - Theory
N Dose Escalation - Theory
O R Functions
P East 5.x to East 6.4 Import Utility
Q Technical Reference and Formulas: Single Look Designs
R Technical Reference and Formulas: Analysis
S Theory - Design - Binomial One-Sample Exact Test
T Theory - Design - Binomial Paired-Sample Exact Test
U Theory - Design - Simon’s Two-Stage Design
V Theory-Design - Binomial Two-Sample Exact Tests
W Classification Table
X Glossary
Y On validating the East Software
Z List of East Beta Testers

A Introduction to Volume 10

This volume contains all the Appendices for the East 6 manual.

Appendix B provides the technical details of the design phase.

Appendix C deals with the technical explanation of the interim monitoring phase.

Appendix D deals with the formulas used for the expected number of events in one treatment arm in various situations.
The situations we consider vary from simple ones where the hazard rate is constant, the accrual rate is constant, there are no dropouts and each patient is followed until the end of the study, to complex ones where the survival curve is modeled as a piecewise exponential function with K pieces of variable hazard rates, variable accrual rates, constant non-zero dropout rates and where patients are followed for a fixed duration. Appendix E gives the details of the powerful simulation tools available in East for trials with time-to-event endpoints. The simulations may be used to actually design for non-standard problems where power and sample size calculations are analytically intractable. For instance, East allows the user to simulate trials in which the hazard rates for each treatment arm are non-proportional. By trial and error, running simulations under various parameter choices, the user may find an appropriate design for this kind of trial. Appendix F discusses the technical aspects involved in using spending functions boundaries and Wang-Tsiatis or Pampallona-Tsiatis family boundaries in design and monitoring of trials. Appendix G explains the efficiency achieved by employing the Recursive Integration Algorithm in the computations for the various procedures in East. Appendix H lays out the theory behind multiple comparison procedures like Step-up and Step-down Dunnett’s test and other p-value based procedures like Bonferroni, Sidak and some more. Appendix I lays out the theory behind multiple endpoint procedures like Serial Gatekeeping and Parallel Gatekeeping. Appendix S lays out the theory behind East’s power and sample size computations in the case of the exact fixed sample test and the exact group sequential test of a proportion π being equal to a constant π0 . 
Appendix T lays out the theory behind East’s power and sample size computations in the case of the exact McNemar’s test for the difference of proportions arising from paired binomial populations.

Appendix U lays out the theory behind the two-stage optimal design for phase 2 clinical trials developed by Simon (1989).

Appendix V lays out the theory behind exact power and sample size computations for comparing two independent binomials.

Appendix N lays out the theory behind the dose escalation procedures like 3+3, CRM, mTPI and BLRM introduced in East 6.3.

Appendix M lays out the theory behind the subject enrollment and event prediction introduced in East 6.3.

Appendix Q lays out the theory behind the designs in East and the formulas used for calculations. For each test we provide its null hypothesis, its test statistic, and the distribution of the test statistic under the null hypothesis.

Appendix R lays out the theory and formulas used in East for analyzing data under the Analysis menu.

Appendix W lists the formulas used in computing classification errors.

Appendix O discusses the R Integration feature in the simulation module, which gives the user the opportunity to perform various tasks using R. In this appendix, we list all tasks for which R functions can be used, and we provide the syntax and suggested format for the various functions.

Appendix X provides a glossary of terms and quantities used in East6.

Appendix Y describes the extensive validation procedures carried out on all the features incorporated in East 6 and some earlier versions of East.

Appendix Z lists all the beta testers of East who gave their valuable input while this software was being developed.
B Group Sequential Design in East 6

East provides the software support for a repeated significance testing strategy whereby the accumulating data in a phase-III randomized clinical trial are monitored, and the trial is terminated with early rejection of either the null or the alternative hypothesis if a given test statistic crosses a given stopping boundary. This strategy is executed in two phases: the design phase and the interim monitoring phase. Appendix B provides the technical details of the design phase. Appendix C deals with the interim monitoring phase. A thorough coverage of group-sequential methods for clinical trials is offered at an expository level in the textbook by Jennison and Turnbull (2000). This textbook is an excellent complement to the methods discussed in these appendix chapters and implemented in the East software.

At the design phase the user specifies the statistical process generating the data, the null and alternative hypotheses being tested, the desired type-I error, the power of the sequential testing procedure, the shape parameters for the spending functions or stopping boundaries, the planned number of interim looks, and the timing of the interim looks. East uses these input parameters to generate the appropriate stopping boundaries and to compute the maximum statistical information that would be needed to achieve the desired operating characteristics of the sequential testing procedure. Depending on the end point of the clinical trial, the maximum statistical information might be expressed in terms of the patient accrual, the number of events such as failures or deaths, or an abstract dimensionless quantity termed Fisher information.

We lay the groundwork for designing group sequential studies in Section B.1, where we define the test statistic to be monitored and specify its distributional properties.
This distribution theory is presented first in terms of a general framework, which is then applied to studies with normal, binomial, time to failure and general end points. In Section B.2 we derive the stopping boundaries for various group sequential designs. In Section B.3 we introduce the notion of an inflation factor and show how it can be applied in the General and Information Based designs available in East. In Section B.4 we compute the expected sample size and expected study duration for these group sequential designs.

Although the methodology in this appendix has been developed with reference to two-arm clinical trials, it applies with obvious modifications to the one-sample setting as well. For multi-arm trials in which two or more treatment arms are compared to a common control arm, the two-arm approach can still be applied if supplemented by multiple testing procedures such as Bonferroni or Hochberg. More general situations are handled as special cases of the regression problem discussed in Section B.1.4. In effect one unified approach is adopted for all the group sequential procedures in East. However, since the various cases considered utilize different test statistics for interim monitoring, we have provided the formula for each test statistic in Appendix Q.

B.1 Distribution Theory

B.1.1 Normal Data
B.1.2 Binomial Data
B.1.3 Time to Event Data
B.1.4 General Regression Models

Consider a two arm randomized clinical trial comparing an experimental treatment with a control treatment. Let the treatment difference of primary interest be denoted by a single scalar parameter $\delta$. The choice of parameter $\delta$ will depend on the model generating the patient response. For normal response, $\delta$ might represent the difference of means. For binomial response, $\delta$ might represent a difference of proportions, a ratio of proportions, an odds ratio, or a log odds ratio.
For time-to-event response, $\delta$ might represent a difference of medians, a difference of survival rates at a given time-point, a hazard ratio, or a log hazard ratio. More generally, $\delta$ might be the coefficient of the treatment effect in a regression model.

Suppose we intend to monitor the accumulating data sequentially up to a maximum of $K$ times, thereby gathering, in succession, $I_1, I_2, \ldots, I_K$ units of statistical information about $\delta$. In a parametric model $I$ is called the Fisher information. In a semiparametric model, it is called the semiparametric information bound. Since $I_K$ represents the maximum information we could obtain, we will also denote it by $I_{\max}$. It is convenient to define the information fraction $t_j = I_j / I_{\max}$. For trials with normal or binomial response, $I_j$ is proportional to $n_j$, the total sample size attained by the jth monitoring time-point, and $t_j = n_j / n_{\max}$. For trials with time-to-event response, $I_j$ is approximately proportional to $d_j$, the total number of events observed by the jth monitoring time-point. In that case $t_j = d_j / d_{\max}$. One may regard the information fraction $t \in [0, 1]$ as the internal time of the clinical trial.

We assume that at each interim monitoring time-point, $t_j$, we can obtain an efficient estimate, $\hat{\delta}(t_j)$, for $\delta$ and a consistent estimate, $\mathrm{var}[\hat{\delta}(t_j)]$, for the variance of $\hat{\delta}(t_j)$, and that the sample size (or number of events) is large enough that $I_j^{-1} \approx \mathrm{var}[\hat{\delta}(t_j)]$. Formally, an estimate is efficient if it achieves the Cramer-Rao lower bound for parametric models and the information bound as defined by Bickel et al. (1993) for semiparametric models. In particular, maximum likelihood estimates are efficient. Most estimates produced by standard statistical packages like SAS or S-plus for parametric or semiparametric models are efficient.
Scharfstein, Tsiatis and Robins (1997) have shown that, under the above conditions, the joint distribution of the Wald statistics

$$Z(t_j) = \frac{\hat{\delta}(t_j) - \delta_0}{\sqrt{\mathrm{var}[\hat{\delta}(t_j)]}} \qquad \text{(B.1)}$$

for testing

$$H_0: \delta = \delta_0 , \qquad \text{(B.2)}$$

computed sequentially at information fractions $t_1, t_2, \ldots, t_K$, is asymptotically multivariate normal with

$$E[Z(t_j)] = \eta\sqrt{t_j} , \qquad \text{(B.3)}$$

$$\mathrm{var}[Z(t_j)] = 1 , \qquad \text{(B.4)}$$

and, for any $t_{j_1} < t_{j_2}$,

$$\mathrm{covar}[Z(t_{j_1}), Z(t_{j_2})] = \sqrt{\frac{I_{j_1}}{I_{j_2}}} , \qquad \text{(B.5)}$$

where

$$\eta = (\delta - \delta_0)\sqrt{I_{\max}} \qquad \text{(B.6)}$$

is known as the drift parameter. Usually, $\delta_0 = 0$ for superiority trials and $\delta_0 > 0$ for non-inferiority trials.

An alternative way to express this result is in terms of a process of independent increments. Define

$$W(t_j) = \sqrt{t_j}\, Z(t_j) . \qquad \text{(B.7)}$$

Then the joint distribution of $\{W(t_1), W(t_2), \ldots, W(t_K)\}$ is asymptotically multivariate normal with

$$E[W(t_j)] = \eta t_j , \qquad \text{(B.8)}$$

$$\mathrm{var}[W(t_j)] = t_j \qquad \text{(B.9)}$$

and

$$\mathrm{covar}[W(t_{j_1}), W(t_{j_2})] = t_{j_1} . \qquad \text{(B.10)}$$

From this it follows that, for any $t_{j_2} > t_{j_1}$, the random variables $W(t_{j_1})$ and $W(t_{j_2}) - W(t_{j_1})$ are independent. A parallel result has been obtained by Jennison and Turnbull (1997).

This important result has three implications.

1. Most clinical trials, including trials with normal, binomial and survival endpoints, utilize test statistics of the form (B.1). Therefore, by the above theorem, the distributional structure of these test statistics, after applying the transformation (B.7), is asymptotically the same as that of the $W(t_j)$’s. Thus one may construct group sequential stopping boundaries for the $W(t_j)$ stochastic process, having the property that under $H_0: \eta = 0$ the probability of crossing a boundary is limited to $\alpha$, the desired type-1 error.
These same boundaries will then be applicable to the test statistics developed to monitor trials with normal, binomial or survival endpoints, or even more general endpoints like those available through the information based design module of East. Thereby we can construct a common set of boundaries that are applicable to any type of trial, provided the test statistics used for monitoring the trial have the same asymptotic distributional structure as the $W(t_j)$ stochastic process. The details of boundary construction are provided in Section B.2.

2. Having generated the appropriate boundaries, one may compute boundary crossing probabilities for the stochastic process $W(t_j)$ under alternative hypotheses of the form $H_1: \eta = \eta_1$. One can thereby search for the value of $\eta_1$ at which the boundary crossing probability equals the desired power, $1 - \beta$. By substituting this value of $\eta$ into equation (B.6) one can estimate $I_{\max}$, the maximum information needed to attain the desired power $1 - \beta$, at any pre-specified clinically meaningful treatment difference $\delta = \delta_1$. The details of these computations are provided in Section B.2.

3. Because of the independent increments structure of the $W(t_j)$’s, it is possible to perform the actual computations that lead to these group sequential stopping boundaries and their crossing probabilities very efficiently by the recursive integration techniques of Armitage, McPherson and Rowe (1969).

The distribution theory developed above is applicable to data generated from any arbitrary probability model in which a single scalar parameter $\delta$ characterizes the relationship under investigation. In the remainder of Section B.1 we demonstrate that many different statistical models for generating the data provide us with a test statistic whose distributional structure is asymptotically the same as that of the $W(t_j)$ stochastic process.
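The covariance structure in (B.5) and (B.10), and the independent increments property that follows from it, can be checked numerically. The sketch below is illustrative only; the function names are hypothetical and the information fractions are assumed to be strictly increasing and positive. It builds the covariance matrices implied by a set of information fractions and verifies that the successive increments of the $W$ process are uncorrelated.

```python
import math


def z_correlation(t):
    """Matrix of covar[Z(t_a), Z(t_b)] = sqrt(t_a / t_b) for t_a <= t_b, as in (B.5)."""
    return [[math.sqrt(min(a, b) / max(a, b)) for b in t] for a in t]


def w_covariance(t):
    """Matrix of covar[W(t_a), W(t_b)] = min(t_a, t_b), as in (B.9) and (B.10)."""
    return [[min(a, b) for b in t] for a in t]


def increment_covariance(t):
    """Covariance matrix of W(t_1), W(t_2) - W(t_1), ..., W(t_K) - W(t_{K-1})."""
    k = len(t)
    cov = w_covariance(t)
    # Row j of d expresses the j-th increment as a linear combination of the W's.
    d = [[0.0] * k for _ in range(k)]
    for j in range(k):
        d[j][j] = 1.0
        if j > 0:
            d[j][j - 1] = -1.0
    # Compute d * cov * d^T with plain nested loops.
    dc = [[sum(d[i][m] * cov[m][n] for m in range(k)) for n in range(k)]
          for i in range(k)]
    return [[sum(dc[i][n] * d[j][n] for n in range(k)) for j in range(k)]
            for i in range(k)]


looks = [0.25, 0.5, 1.0]  # hypothetical information fractions for three looks
for row in increment_covariance(looks):
    print(row)
```

For these three looks the off-diagonal entries are zero and the diagonal entries are the gaps between successive information fractions, which is the property that makes recursive integration possible.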
We first consider two-arm randomized clinical trials with normal, binomial and survival endpoints. We then show how the approach may be generalized to any data generating process in which inference is required for a single scalar parameter estimated by an efficient estimator.

B.1.1 Normal Data

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a normally distributed outcome variable, $X$, with means $\mu_t$ and $\mu_c$, respectively, and with a common variance $\sigma^2$. We intend to monitor the data up to $K$ times after accruing $n_1, n_2, \ldots, n_K \equiv n_{\max}$ patients. The information fraction at the $j$th look is given by $t_j = n_j / n_{\max}$. Let $r$ denote the fraction randomized to treatment T.

Efficacy Trials

Define the treatment difference to be $\delta = \mu_t - \mu_c$. The null hypothesis of interest is $H_0: \delta = 0$. We wish to construct a $K$-look group sequential level-$\alpha$ test of $H_0$ having $1 - \beta$ power at the alternative hypothesis $H_1: \delta = \delta_1$. Let $\bar X_t(t_j)$ and $\bar X_c(t_j)$ be the mean responses of the experimental and control groups, respectively, at time $t_j$. Then

$$\hat\delta(t_j) = \bar X_t(t_j) - \bar X_c(t_j) \tag{B.11}$$

and

$$\mathrm{var}[\hat\delta(t_j)] = \frac{\sigma^2}{n_j\, r(1 - r)}. \tag{B.12}$$

Therefore, by the Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997) theorem, the stochastic process

$$W(t_j) = \sqrt{t_j}\; \frac{\bar X_t(t_j) - \bar X_c(t_j)}{\sqrt{\dfrac{\sigma^2}{n_j\, r(1 - r)}}}, \quad j = 1, 2, \ldots, K, \tag{B.13}$$

is $N(\eta t_j, t_j)$ with independent increments, where $\eta = 0$ under $H_0$ and $\eta = \delta_1\sqrt{I_{\max}}$ under $H_1$. We refer to $\eta$ as the drift parameter.

Non-Inferiority Trials

Define the treatment difference to be $\delta = \mu_t - \mu_c$. Let $\delta_0$ be the non-inferiority margin. The null hypothesis of interest is $H_0: \delta = \delta_0$. We wish to construct a $K$-look group sequential level-$\alpha$ test of $H_0$ having $1 - \beta$ power at the alternative hypothesis $H_1: \delta = \delta_1$.
Then, by the Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997) theorem, the stochastic process

$$W(t_j) = \sqrt{t_j}\; \frac{\bar X_t(t_j) - \bar X_c(t_j) - \delta_0}{\sqrt{\dfrac{\sigma^2}{n_j\, r(1 - r)}}}, \quad j = 1, 2, \ldots, K, \tag{B.14}$$

is $N(\eta t_j, t_j)$ with independent increments, where $\eta = 0$ under $H_0$ and $\eta = (\delta_1 - \delta_0)\sqrt{I_{\max}}$ under $H_1$. We refer to $\eta$ as the drift parameter.

Note that equation (B.12) implies that

$$I_{\max} = \left[\frac{\sigma^2}{n_{\max}\, r(1 - r)}\right]^{-1} \tag{B.15}$$

for both the efficacy and non-inferiority trials. We shall show in Section B.2 how to estimate the value of $I_{\max}$ needed in order to achieve a desired amount of power. Equation (B.15) is required for converting maximum information, an abstract dimensionless quantity, into maximum sample size, a physical resource that one usually has to specify at the planning stages of the clinical trial. The equation shows that in order to make the translation from $I_{\max}$ to $n_{\max}$ one must know the value of $\sigma^2$, a nuisance parameter.

Test Statistics Used for the Interim Monitoring

The test statistics (B.13) and (B.14) both contain $\sigma^2$, a nuisance parameter whose value is typically unknown. Thus we cannot track the path traced by these statistics in the course of a clinical trial, and cannot know for sure if they have crossed a stopping boundary. In practice, therefore, we replace $\sigma^2$ by its estimate $s^2(t_j)$ at each interim monitoring time-point $t_j$ when monitoring a clinical trial with normal endpoints. The modified statistics also have the same large sample behavior and independent increment structure as the $W(t_j)$'s. Therefore the operating characteristics of hypothesis tests and confidence intervals derived by tracking the modified statistics will resemble those that would have been obtained by tracking the original statistics.
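The translation in (B.15) and the statistics (B.13)-(B.14) amount to a few lines of arithmetic. The following is a minimal sketch, not East's implementation: the function names and the numerical inputs are hypothetical, and $\sigma$ is treated as known.

```python
import math

def normal_max_sample_size(i_max, sigma, r=0.5):
    """Invert (B.15): n_max = I_max * sigma^2 / (r * (1 - r))."""
    return i_max * sigma ** 2 / (r * (1.0 - r))

def normal_w_statistic(mean_t, mean_c, n_j, n_max, sigma, r=0.5, delta0=0.0):
    """W(t_j) from (B.13); with delta0 != 0 this is (B.14)."""
    t_j = n_j / n_max                                   # information fraction
    se = math.sqrt(sigma ** 2 / (n_j * r * (1.0 - r)))  # sqrt of (B.12)
    return math.sqrt(t_j) * (mean_t - mean_c - delta0) / se

# Hypothetical inputs: I_max = 0.42, sigma = 5, balanced randomization.
n_max = normal_max_sample_size(i_max=0.42, sigma=5.0)

# Hypothetical interim look after 50 of 100 patients, observed difference 2.
w = normal_w_statistic(mean_t=12.0, mean_c=10.0, n_j=50, n_max=100, sigma=5.0)
```

With these inputs `n_max` is 42 and `w` is 1.0. In practice $\sigma$ would be replaced by its interim estimate $s(t_j)$, as described above.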
B.1.2 Binomial Data

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a binary response variable, $X$, with response probabilities $\pi_t$ and $\pi_c$ for the experimental and control arms, respectively. We intend to monitor the data up to $K$ times after accruing $n_1, n_2, \ldots, n_K \equiv n_{\max}$ patients. The information fraction at the $j$th look is given by $t_j = n_j / n_{\max}$. Let $r$ denote the fraction randomized to treatment T.

Efficacy Trials

Define the treatment difference to be $\delta = \pi_t - \pi_c$. The null hypothesis of interest is $H_0: \delta = 0$. We wish to construct a $K$-look group sequential level-$\alpha$ test of $H_0$ having $1 - \beta$ power at the alternative hypothesis $H_1: \delta = \delta_1$. Let $\hat\pi_t(t_j)$ and $\hat\pi_c(t_j)$ be the maximum likelihood estimates of $\pi_t$ and $\pi_c$, respectively, at time $t_j$. Then

$$\hat\delta(t_j) = \hat\pi_t(t_j) - \hat\pi_c(t_j) \tag{B.16}$$

and

$$\mathrm{var}[\hat\delta(t_j)] = \frac{\pi_t(1 - \pi_t)}{r n_j} + \frac{\pi_c(1 - \pi_c)}{(1 - r) n_j}. \tag{B.17}$$

Therefore, by the Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997) theorem, the stochastic process

$$W_0(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j)}{\sqrt{\dfrac{\pi_c(1 - \pi_c)}{n_j\, r(1 - r)}}} \tag{B.18}$$

is $N(0, t_j)$ with independent increments, under $H_0$. Under $H_1$, the stochastic process

$$W_1(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j)}{\sqrt{\dfrac{(\pi_c + \delta_1)(1 - \pi_c - \delta_1)}{r n_j} + \dfrac{\pi_c(1 - \pi_c)}{(1 - r) n_j}}} \tag{B.19}$$

is $N(\eta t_j, t_j)$ with independent increments, where $\eta = \delta_1\sqrt{I_{\max}}$ is known as the drift parameter. Note that equation (B.17) and $H_1$ together imply that

$$I_{\max} = \left[\frac{(\pi_c + \delta_1)(1 - \pi_c - \delta_1)}{r n_{\max}} + \frac{\pi_c(1 - \pi_c)}{(1 - r) n_{\max}}\right]^{-1}. \tag{B.20}$$

We shall show in Section B.2 how to estimate the value of $I_{\max}$ needed in order to achieve a desired amount of power.
Equation (B.20) is required for converting maximum information, an abstract dimensionless quantity, into maximum sample size, a physical resource that one usually has to specify at the planning stages of the clinical trial. The equation shows that in order to make the translation from $I_{\max}$ to $n_{\max}$ one must know the value of $\pi_c$, a nuisance parameter.

Non-Inferiority Trials

Define the treatment difference to be $\delta = \pi_t - \pi_c$. Let the non-inferiority margin be $\delta_0$. The null hypothesis of interest is $H_0: \delta = \delta_0$. We wish to construct a $K$-look group sequential level-$\alpha$ test of $H_0$ having $1 - \beta$ power at the alternative hypothesis $H_1: \delta = \delta_1$. Then, by the Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997) theorem, the stochastic process

$$W_0(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j) - \delta_0}{\sqrt{\dfrac{(\pi_c - \delta_0)(1 - \pi_c + \delta_0)}{r n_j} + \dfrac{\pi_c(1 - \pi_c)}{(1 - r) n_j}}} \tag{B.21}$$

is $N(0, t_j)$ with independent increments under $H_0$. Under $H_1$, the stochastic process

$$W_1(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j) - \delta_0}{\sqrt{\dfrac{(\pi_c - \delta_1)(1 - \pi_c + \delta_1)}{r n_j} + \dfrac{\pi_c(1 - \pi_c)}{(1 - r) n_j}}} \tag{B.22}$$

is $N(\eta t_j, t_j)$ with independent increments, where $\eta = (\delta_1 - \delta_0)\sqrt{I_{\max}}$ is known as the drift parameter. Note that equation (B.17) and $H_1$ together imply that

$$I_{\max} = \left[\frac{(\pi_c - \delta_1)(1 - \pi_c + \delta_1)}{r n_{\max}} + \frac{\pi_c(1 - \pi_c)}{(1 - r) n_{\max}}\right]^{-1}. \tag{B.23}$$

We shall show in Section B.2 how to estimate the value of $I_{\max}$ needed in order to achieve a desired amount of power. Equation (B.23) is required for converting maximum information, an abstract dimensionless quantity, into maximum sample size, a physical resource that one usually has to specify at the planning stages of the clinical trial. The equation shows that in order to make the translation from $I_{\max}$ to $n_{\max}$ one must know the value of $\pi_c$, a nuisance parameter.
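The conversion between $I_{\max}$ and $n_{\max}$ for binomial endpoints can be sketched as follows. This is an illustration under stated assumptions rather than East's implementation: the function names are hypothetical, and the treatment-arm response probability under $H_1$ is passed in directly as `pi_t`, so the same code serves whichever expression for that probability is substituted from (B.20) or (B.23).

```python
def binomial_information(n_max, pi_t, pi_c, r=0.5):
    """I_max for a given n_max: the inverse of the variance of the
    estimated difference at the final look, as in (B.17) under H1."""
    variance = (pi_t * (1.0 - pi_t) / (r * n_max)
                + pi_c * (1.0 - pi_c) / ((1.0 - r) * n_max))
    return 1.0 / variance

def binomial_max_sample_size(i_max, pi_t, pi_c, r=0.5):
    """Solve the same relationship for n_max given I_max."""
    per_subject_variance = (pi_t * (1.0 - pi_t) / r
                            + pi_c * (1.0 - pi_c) / (1.0 - r))
    return i_max * per_subject_variance

# Hypothetical design inputs: pi_c = 0.2, pi_t = 0.4, balanced randomization.
n_max = binomial_max_sample_size(i_max=250.0, pi_t=0.4, pi_c=0.2)
i_back = binomial_information(n_max, pi_t=0.4, pi_c=0.2)
```

With these inputs the per-subject variance term is 0.8, so `n_max` is 200 and `i_back` recovers the information of 250. Changing the assumed value of `pi_c` changes `n_max`, which is the dependence on the nuisance parameter noted above.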
Test Statistics Used for the Interim Monitoring

The test statistics (B.18), (B.19), (B.21) and (B.22) all contain $\pi_c$, an unknown nuisance parameter. Therefore, in practice, modified test statistics, whose values can be computed from the interim data, are used to track the progress of the trial and determine if a stopping boundary has been crossed.

1. For superiority trials East provides two options in the choice of test statistic to be used during interim monitoring. The default assumption is that the test statistic

$$\tilde W_s(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j)}{\sqrt{\dfrac{\hat\pi_t(t_j)(1 - \hat\pi_t(t_j))}{n_{tj}} + \dfrac{\hat\pi_c(t_j)(1 - \hat\pi_c(t_j))}{n_{cj}}}} \tag{B.24}$$

will be used for the interim monitoring, where $n_{tj}$ and $n_{cj}$ are the sample sizes on the treatment and control arms, respectively, at monitoring time-point $t_j$. Asymptotically, $\tilde W_s(t_j)$ behaves like (B.18) under $H_0$ and like (B.19) under $H_1$. Thus in either case $\tilde W_s(t_j)$ has the same asymptotic behavior as the $N(\eta t_j, t_j)$ stochastic process with independent increments. Therefore the operating characteristics of hypothesis tests and confidence intervals derived by tracking $\tilde W_s(t_j)$ will resemble those that would have been obtained by tracking (B.18) under $H_0$ and (B.19) under $H_1$.

An alternative choice for the test statistic to be used during the interim monitoring phase is

$$\tilde W_{0s}(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j)}{\sqrt{\dfrac{\hat\pi(t_j)(1 - \hat\pi(t_j))}{n_j\, r(1 - r)}}}, \tag{B.25}$$

where $\hat\pi(t_j)$, the pooled estimate of the binomial response probability at time $t_j$, is given by

$$\hat\pi(t_j) = \frac{n_{tj}\hat\pi_t(t_j) + n_{cj}\hat\pi_c(t_j)}{n_j}. \tag{B.26}$$

The denominator of (B.25) is an estimate of the standard error of $\hat\pi_t(t_j) - \hat\pi_c(t_j)$ under the null hypothesis $H_0: \delta = 0$. Therefore $\tilde W_{0s}(t_j)$ behaves asymptotically like (B.18) under $H_0$. However, unlike $\tilde W_s(t_j)$, it does not behave like (B.19) under $H_1$.
For this reason, as we shall show in Section B.2.5, the maximum information $I_{\max}$ required to attain any given power $1 - \beta$ depends on whether the unpooled statistic $\tilde W_s(t_j)$ or the pooled statistic $\tilde W_{0s}(t_j)$ is used for interim monitoring.

2. For non-inferiority trials we use the test statistic

$$\tilde W_{ni}(t_j) = \sqrt{t_j}\; \frac{\hat\pi_t(t_j) - \hat\pi_c(t_j) - \delta_0}{\sqrt{\dfrac{\hat\pi_t(t_j)(1 - \hat\pi_t(t_j))}{n_{tj}} + \dfrac{\hat\pi_c(t_j)(1 - \hat\pi_c(t_j))}{n_{cj}}}}, \tag{B.27}$$

where $n_{tj}$ and $n_{cj}$ are the sample sizes on the treatment and control arms, respectively, at monitoring time-point $t_j$. Asymptotically, $\tilde W_{ni}(t_j)$ behaves like (B.21) under $H_0$ and like (B.22) under $H_1$. Thus in either case $\tilde W_{ni}(t_j)$ has the same asymptotic behavior as the $N(\eta t_j, t_j)$ stochastic process with independent increments. Therefore the operating characteristics of hypothesis tests and confidence intervals derived by tracking $\tilde W_{ni}(t_j)$ will resemble those that would have been obtained by tracking (B.21) under $H_0$ and (B.22) under $H_1$.

B.1.3 Time to Event Data

Consider a randomized clinical trial comparing two treatments, T and C, on the basis of time to event data. Let the fraction of patients randomized to treatment T be $r$. We intend to monitor the data up to $K$ times at calendar times $l_1, l_2, \ldots, l_K$. At calendar time $l_j$ let there be $q_j$ distinct failures, with corresponding failure times denoted by $\tau_1(l_j), \tau_2(l_j), \ldots, \tau_{q_j}(l_j)$ (on the patient follow-up time-scale, not the calendar time-scale). At the $i$th of these $q_j$ failure times let $d_t(\tau_i(l_j))$ be the number of failures on treatment T, $n_t(\tau_i(l_j))$ be the number of subjects on treatment T at risk of failure, $d_c(\tau_i(l_j))$ be the number of failures on treatment C, and $n_c(\tau_i(l_j))$ be the number of subjects on treatment C at risk of failure.
The data at calendar time $l_j$ may thus be represented as $q_j$ $2 \times 2$ contingency tables, where the $i$th table is of the form

$$\begin{array}{l|ccc}
\text{Status} & \text{Failed} & \text{Not Failed} & \text{Total} \\ \hline
\text{Treatment T} & d_t(\tau_i(l_j)) & n_t(\tau_i(l_j)) - d_t(\tau_i(l_j)) & n_t(\tau_i(l_j)) \\
\text{Treatment C} & d_c(\tau_i(l_j)) & n_c(\tau_i(l_j)) - d_c(\tau_i(l_j)) & n_c(\tau_i(l_j)) \\ \hline
\text{Total} & d(\tau_i(l_j)) & n(\tau_i(l_j)) - d(\tau_i(l_j)) & n(\tau_i(l_j))
\end{array}$$

Efficacy Trials

The logrank score statistic $S(l_j)$, at calendar time $l_j$, is obtained by summing the observed minus the expected values in cell $(1, 1)$ of the above collection of $q_j$ $2 \times 2$ tables:

$$S(l_j) = -\sum_{i=1}^{q_j}\left\{ d_t(\tau_i(l_j)) - \frac{n_t(\tau_i(l_j)) \times d(\tau_i(l_j))}{n(\tau_i(l_j))} \right\}. \tag{B.28}$$

If treatments T and C have the same underlying distribution, it is well known (see, for example, Mantel, 1966) that the marginal distribution of $S(l_j)$ is asymptotically normal with a mean of zero and with variance equal to the sum of hypergeometric variances across all the tables:

$$\mathrm{var}[S(l_j)] = \sum_{i=1}^{q_j} \frac{n_t(\tau_i(l_j)) \times n_c(\tau_i(l_j)) \times d(\tau_i(l_j)) \times [n(\tau_i(l_j)) - d(\tau_i(l_j))]}{[n(\tau_i(l_j))]^2\, [n(\tau_i(l_j)) - 1]}. \tag{B.29}$$

The above variance cannot be used for designing a time-to-event trial, since it depends on quantities that cannot be estimated a priori. However, the asymptotic distribution of $S(l_j)$ under proportional hazard alternatives was reduced to a simpler form suitable for designing time-to-event trials by Schoenfeld (1981). Specifically, let $\lambda_t(\tau)$ and $\lambda_c(\tau)$ be the hazard functions for treatment T and treatment C, respectively. Assume that the ratio of hazard functions is constant for all values of $\tau$ and define the treatment difference as

$$\delta = \ln\left\{\frac{\lambda_t(\tau)}{\lambda_c(\tau)}\right\}.$$

Let the total number of failures observed by calendar time $l_j$ be

$$D(l_j) = \sum_{i=1}^{q_j} d(\tau_i(l_j)),$$

and let $r$ denote the proportion randomized to treatment T. Then, for $j = 1, 2, \ldots,$
$K$, $S(l_j)$ is asymptotically normal with

$$E[S(l_j)] = \delta D(l_j)\, r(1 - r), \tag{B.30}$$

$$\mathrm{var}[S(l_j)] = D(l_j)\, r(1 - r). \tag{B.31}$$

Tsiatis (1981) proved that the sequentially computed logrank score statistics $S(l_1), S(l_2), \ldots, S(l_K)$ have independent increments. That is, for any $j_2 > j_1$, $S(l_{j_1})$ and $S(l_{j_2}) - S(l_{j_1})$ are independent. The independent increments property and the asymptotic normality of $S(l_j)$ make it possible to design group sequential trials by the same methods as are used to design group sequential trials with normal endpoints, as we now show.

We wish to test the null hypothesis $H_0: \delta = 0$ versus the alternative hypothesis $H_1: \delta = \delta_1$. In performing this hypothesis test it is useful to transform the stochastic process $S(l_j)$, $j = 1, 2, \ldots, K$, from a process defined on the calendar time $l_j$ to a process defined on the information fraction

$$t_j = \frac{D(l_j)}{D(l_K)} \equiv \frac{D(l_j)}{D_{\max}}.$$

Thus we define

$$W(t_j) = \frac{S(l_j)}{\sqrt{r(1 - r) D_{\max}}}. \tag{B.32}$$

Now, since the variance of $S(l_j)$ is also the Fisher information for $\delta$ at the monitoring time $l_j$ (Jennison and Turnbull, 2000, page 78), it follows that the Fisher information at the final monitoring time $l_K$ is given by

$$I_{\max} = \mathrm{var}[S(l_K)] = r(1 - r) D_{\max}. \tag{B.33}$$

Therefore $W(t_j) \sim N(\eta t_j, t_j)$ with independent increments, where $\eta = 0$ under $H_0$ and $\eta = \delta_1\sqrt{I_{\max}}$ under $H_1$. We refer to $\eta$ as the drift parameter. We shall show in Section B.2 how to estimate the value of $I_{\max}$ needed in order to achieve a desired amount of power. Equation (B.33) establishes the relationship between maximum information, an abstract dimensionless quantity, and the maximum number of events, a physical resource that one usually has to specify at the planning stages of the clinical trial. Notice that $D_{\max}$ plays the same role in a time-to-event trial that $n_{\max}$ plays in a normal endpoint trial.
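Equations (B.32) and (B.33) can be illustrated with a short calculation. The sketch below is an assumption-laden example, not East's implementation: the function names are hypothetical, and the drift is taken from the single-look special case ($K = 1$), where $\eta_1 = z_\alpha + z_\beta$; for $K > 1$ the drift comes from the boundary computations of Section B.2.

```python
import math
from statistics import NormalDist

def max_events(i_max, r=0.5):
    """Invert (B.33): D_max = I_max / (r * (1 - r))."""
    return i_max / (r * (1.0 - r))

def logrank_w(score, d_max, r=0.5):
    """W(t_j) from the logrank score statistic S(l_j), per (B.32)."""
    return score / math.sqrt(r * (1.0 - r) * d_max)

# Single-look illustration (K = 1): one-sided alpha = 0.025, power = 0.90.
std = NormalDist()
alpha, beta = 0.025, 0.10
eta1 = std.inv_cdf(1.0 - alpha) + std.inv_cdf(1.0 - beta)

delta1 = math.log(0.7)          # hypothetical hazard ratio of 0.7
i_max = (eta1 / delta1) ** 2    # from (B.6) with delta0 = 0; only magnitudes matter
d_max = max_events(i_max)       # balanced randomization, r = 0.5
```

With these inputs `d_max` is a little over 330 events. Unequal randomization increases the requirement, because $r(1 - r)$ is largest at $r = 0.5$.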
As an alternative to computing $W(t_j)$ by equation (B.32) one may compute

$$Z(t_j) = \frac{\hat\delta(t_j)}{\sqrt{\mathrm{var}(\hat\delta(t_j))}} \tag{B.34}$$

where $\hat\delta(t_j)$ and its standard error are obtained by fitting a Cox proportional hazards model to the data. Then $\sqrt{t_j}\, Z(t_j)$ has the same asymptotic distribution as $W(t_j)$.

Non-Inferiority Trials

For non-inferiority trials we again define the treatment difference as

$$\delta = \ln\left\{\frac{\lambda_t(\tau)}{\lambda_c(\tau)}\right\}.$$

Now, however, we are interested in testing the null hypothesis

$$H_0: \delta = \delta_0 \tag{B.35}$$

against the alternative hypothesis $H_1: \delta = \delta_1$, where $\delta_0$ is the non-inferiority margin. Accordingly we derive the logrank statistic $S(l_j)$ from the score equations evaluated at $\delta = \delta_0$ (see, for example, Collett, 1994, page 105) so that

$$S(l_j) = \sum_{i=1}^{q_j}\left\{ d_t(\tau_i(l_j)) - \frac{d(\tau_i(l_j)) \times n_t(\tau_i(l_j)) \exp(-\delta)}{n_t(\tau_i(l_j)) \exp(-\delta) + n_c(\tau_i(l_j))} \right\} \tag{B.36}$$

and the variance of this statistic is

$$\mathrm{var}[S(l_j)] = \sum_{i=1}^{q_j} \frac{d(\tau_i(l_j)) \times n_t(\tau_i(l_j)) \exp(-\delta) \times n_c(\tau_i(l_j))}{\left(n_t(\tau_i(l_j)) \exp(-\delta) + n_c(\tau_i(l_j))\right)^2}. \tag{B.37}$$

By extending Schoenfeld's (1981) results to this setting we can show that $S(l_j)$ is asymptotically normal with

$$E[S(l_j)] = (\delta - \delta_0) D(l_j)\, r(1 - r), \tag{B.38}$$

$$\mathrm{var}[S(l_j)] = D(l_j)\, r(1 - r). \tag{B.39}$$

Also, it can be shown by application of martingale results derived from counting processes (L. J. Wei, 2005; personal communication) that the sequentially computed non-central logrank score statistics $S(l_1), S(l_2), \ldots, S(l_K)$ have independent increments. Define

$$W(t_j) = \frac{S(l_j)}{\sqrt{\mathrm{var}[S(l_K)]}}. \tag{B.40}$$

Then, asymptotically, $W(t_j) \sim N(\eta t_j, t_j)$ with independent increments, where $\eta = 0$ under $H_0$, $\eta = (\delta_1 - \delta_0)\sqrt{I_{\max}}$ under $H_1$, and

$$I_{\max} = \mathrm{var}[S(l_K)] = r(1 - r) D_{\max}. \tag{B.41}$$

We refer to $\eta$ as the drift parameter. We shall show in Section B.2 how to estimate the value of $I_{\max}$ needed in order to achieve a desired amount of power.
Equation (B.41) is required for converting maximum information, an abstract dimensionless quantity, into the maximum number of events, a physical resource that one usually has to specify at the planning stages of the clinical trial.

As an alternative to computing $W(t_j)$ by equation (B.40) one may compute

$$Z(t_j) = \frac{\hat\delta(t_j) - \delta_0}{\sqrt{\mathrm{var}(\hat\delta(t_j))}} \tag{B.42}$$

where $\hat\delta(t_j)$ and its standard error are obtained by fitting a Cox proportional hazards model to the data. Then $\sqrt{t_j}\, Z(t_j)$ has the same asymptotic distribution as $W(t_j)$.

B.1.4 General Regression Models

Consider any general regression model including, for example, the linear regression model, the Cox proportional hazards model, the logistic regression model, and the random effects model for longitudinal data. Let $\delta$ be the single scalar coefficient of this model that characterizes the treatment effect of interest. (The case where $\delta$ is a vector is not considered in this development.) Let $\tau_1, \tau_2, \ldots, \tau_K$ denote $K$ monitoring time-points of calendar time. Let $\hat\delta(\tau_j)$ be an efficient estimator of $\delta$, $\mathrm{se}(\hat\delta(\tau_j))$ be its standard error and

$$Z(\tau_j) = \frac{\hat\delta(\tau_j) - \delta_0}{\mathrm{se}(\hat\delta(\tau_j))}$$

be the Wald test statistic, based on all the data available at time $\tau_j$. Let $I(\tau_j)$ be the statistical (or Fisher) information about $\delta$ available at time $\tau_j$. The quantity $I(\tau_j)$ is estimated by $[\mathrm{se}(\hat\delta(\tau_j))]^{-2}$. At any time $\tau_j$ we define the information fraction

$$t_j = \frac{I(\tau_j)}{I(\tau_K)} \approx \frac{[\mathrm{se}(\hat\delta(\tau_j))]^{-2}}{[\mathrm{se}(\hat\delta(\tau_K))]^{-2}}$$

and compute the test statistic

$$W(t_j) = \sqrt{t_j}\, Z(\tau_j).$$

Then, using results derived by Scharfstein, Tsiatis and Robins (1997), Jennison and Turnbull (1997), we can show that

$$W(t_j) \sim N(\eta t_j, t_j), \tag{B.43}$$

where

$$\eta = (\delta - \delta_0)\sqrt{I(\tau_K)}, \tag{B.44}$$

and, for any $t_{j'} > t_j$,

$$\mathrm{covar}\{W(t_j), W(t_{j'})\} = t_j. \tag{B.45}$$

This general result encompasses all situations in which group sequential inference is desired for a single scalar parameter $\delta$ and where an efficient estimator for $\delta$ exists. East provides the option to design and monitor studies within this general framework through its information based approach discussed in Chapter 59.

B.2 Stopping Boundaries and Maximum Information

Suppose we plan to monitor the study a maximum of $K$ times at information fractions $t_1, t_2, \ldots, t_K$. Let the desired type-1 error be $\alpha$. In this section we show how to compute the following quantities:

1. One- and two-sided boundaries for early stopping to reject the null hypothesis $H_0: \delta = \delta_0$;
2. One- and two-sided boundaries for early stopping to reject either the null hypothesis $H_0: \delta = \delta_0$ or the alternative hypothesis $H_1: \delta = \delta_1$;
3. One- and two-sided boundaries for early stopping to reject only the alternative hypothesis $H_1: \delta = \delta_1$ (also known as futility boundaries);
4. $I_{\max}$, the maximum information needed to achieve a power of $1 - \beta$ at the alternative hypothesis $H_1: \delta = \delta_1$.

All computations will be performed for the process of independent increments $W(t_j) \sim N(\eta t_j, t_j)$. We have seen in Section B.1 that a very large class of group sequential tests, including all those available in East, are represented by this stochastic process. Hypotheses about $\delta$, the primary parameter of interest, can be converted into corresponding hypotheses about $\eta$ by the relationship (B.6).
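For the general setting of Section B.1.4, the inputs to these computations can be assembled from nothing more than an estimate and its standard error at each look. The following sketch is illustrative: the function names and the numerical inputs are hypothetical, and, following the approximation above, the final-look standard error is used as the reference for the information fraction (during an ongoing trial a planned $I_{\max}$ would have to stand in for it).

```python
import math

def wald_monitoring(estimates, std_errors, delta0=0.0):
    """Return a list of (t_j, Z_j, W_j) per look, estimating I(tau_j) by
    se^-2, with t_j = I(tau_j) / I(tau_K) and W_j = sqrt(t_j) * Z_j."""
    info = [se ** -2 for se in std_errors]
    i_final = info[-1]
    looks = []
    for est, se, i_j in zip(estimates, std_errors, info):
        t_j = i_j / i_final
        z_j = (est - delta0) / se
        looks.append((t_j, z_j, math.sqrt(t_j) * z_j))
    return looks

def drift(delta, delta0, i_max):
    """Drift parameter eta from (B.6) or (B.44)."""
    return (delta - delta0) * math.sqrt(i_max)

# Hypothetical two-look trial: estimates 0.5 and 0.4, standard errors 0.2 and 0.1.
looks = wald_monitoring([0.5, 0.4], [0.2, 0.1])
eta = drift(delta=0.4, delta0=0.0, i_max=100.0)
```

Here the first look has information fraction 0.25, $Z = 2.5$ and $W = 1.25$; a treatment effect of 0.4 with $I_{\max} = 100$ corresponds to a drift of 4.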
Once stopping boundaries have been obtained for the $W(t_j)$ statistic they can readily be transformed into corresponding stopping boundaries for the Wald statistic $Z(t_j)$ because of the relationship $W(t_j) = \sqrt{t_j}\, Z(t_j)$ given by equation (B.7). Boundaries in East are represented primarily in terms of the Wald statistic.

Three classes of stopping boundaries are available in East: p-value boundaries, also known as Haybittle-Peto boundaries; power boundaries, also known as Wang-Tsiatis or Pampallona-Tsiatis boundaries; and spending function boundaries. Each class is discussed separately below.

B.2.1 P-Value or Haybittle-Peto Boundaries

P-value or Haybittle-Peto boundaries are available for early rejection of the null hypothesis. As first proposed by Haybittle (1971), these boundaries are derived by pre-specifying a small p-value, $p_1$ say, as the stopping criterion for the first $K - 1$ interim analyses and then computing a final p-value, $p_2$ say, for declaring statistical significance at the last look in such a way that the overall type-1 error is $\alpha$. Let $z_p$ denote the upper $p$th quantile of the standard normal distribution; i.e., $1 - \Phi(z_p) = p$. The trial stops at the first interim look at which the p-value is less than or equal to $p_1$. If this event does not occur, the trial proceeds to the $K$th look and statistical significance is declared if the final p-value is less than or equal to $p_2$. For one-sided tests, the value of $p_2$ needed to preserve the type-1 error is obtained by solving the equation

$$1 - P_0\left(W(t_1) < \sqrt{t_1}\, z_{p_1}, \ldots, W(t_{K-1}) < \sqrt{t_{K-1}}\, z_{p_1}, W(t_K) < z_{p_2}\right) = \alpha, \tag{B.46}$$

where $P_0(\cdot)$ denotes probability under the assumption that $\eta = 0$. The solution is obtained by numerical search using the recursive integration method of Armitage, McPherson and Rowe (1969) (the AMR algorithm) discussed in Appendix G.
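The search for $p_2$ in (B.46) can be sketched in pure Python. This is not the AMR algorithm as implemented in East or described in Appendix G: it is a simplified stand-in that propagates the sub-density of $W(t_j)$ across looks with the trapezoid rule and finds the final boundary by bisection. The function names, grid size, lower truncation point and example design are all assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def _pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def no_cross_prob(t, bounds, eta=0.0, n_grid=200):
    """P_eta(W(t_1) < b_1, ..., W(t_K) < b_K) for W(t_j) ~ N(eta*t_j, t_j)
    with independent increments, by recursive trapezoid integration."""
    k = len(t)
    if k == 1:
        return NormalDist(eta * t[0], math.sqrt(t[0])).cdf(bounds[0])
    grid, mass = [], []
    for j in range(k - 1):
        # Continuation region at look j, truncated 8 sd below the mean.
        lo = eta * t[j] - 8.0 * math.sqrt(t[j])
        h = (bounds[j] - lo) / n_grid
        new_grid = [lo + i * h for i in range(n_grid + 1)]
        if j == 0:
            dens = [_pdf(x, eta * t[0], math.sqrt(t[0])) for x in new_grid]
        else:
            dt = t[j] - t[j - 1]
            dens = [sum(m * _pdf(x - v, eta * dt, math.sqrt(dt))
                        for v, m in zip(grid, mass)) for x in new_grid]
        # Fold the trapezoid weights into the sub-density values.
        mass = [d * h for d in dens]
        mass[0] *= 0.5
        mass[-1] *= 0.5
        grid = new_grid
    # Final look: chance that the last increment keeps W below b_K.
    dt = t[-1] - t[-2]
    sd = math.sqrt(dt)
    return sum(m * STD.cdf((bounds[-1] - v - eta * dt) / sd)
               for v, m in zip(grid, mass))

def haybittle_peto_final_p(t, p1, alpha, n_grid=200):
    """Solve (B.46) for the final nominal p-value p2 by bisection on z_{p2}."""
    z1 = STD.inv_cdf(1.0 - p1)
    interim = [math.sqrt(tj) * z1 for tj in t[:-1]]
    lo, hi = 0.0, 8.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        final = math.sqrt(t[-1]) * mid
        crossing = 1.0 - no_cross_prob(t, interim + [final], 0.0, n_grid)
        if crossing > alpha:
            lo = mid   # final boundary too low: type-1 error exceeds alpha
        else:
            hi = mid
    return 1.0 - STD.cdf(0.5 * (lo + hi))

# Hypothetical design: three equally spaced looks, p1 = 0.001, alpha = 0.025.
t = [1.0 / 3.0, 2.0 / 3.0, 1.0]
p2 = haybittle_peto_final_p(t, p1=0.001, alpha=0.025)
```

The resulting `p2` lies slightly below $\alpha$, reflecting the small amount of type-1 error used at the two interim looks. The same `no_cross_prob` function, evaluated at $\eta \neq 0$, gives the type-2 error probability needed for a maximum information search.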
Once the value of $p_2$ has been determined, the maximum information is obtained by invoking the AMR algorithm repeatedly and searching for the value of $\eta$ at which

$$P_\eta\left(W(t_1) < \sqrt{t_1}\, z_{p_1}, \ldots, W(t_{K-1}) < \sqrt{t_{K-1}}\, z_{p_1}, W(t_K) < z_{p_2}\right) = \beta. \tag{B.47}$$

Denote the solution by $\eta_1$. Then, by (B.6), the desired maximum information $I_{\max}$ is

$$I_{\max} = \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2.$$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints. To obtain Haybittle-Peto stopping boundaries for two-sided tests, replace $W(t_j)$ by $|W(t_j)|$ throughout equations (B.46) and (B.47).

In East 6, we have generalized the Haybittle-Peto stopping boundaries to accommodate unequal p-values at each look. Consider a $K$-look design where we pre-specify small p-values $p_1, \ldots, p_{K-1}$ as stopping criteria for each of the first $K - 1$ interim looks at the data. We would now like to compute a final p-value $p_K$ for declaring statistical significance in such a way as to preserve the overall type-1 error $\alpha$. This is achieved by solving the equation

$$1 - P_0\left(W(t_1) < \sqrt{t_1}\, z_{p_1}, \ldots, W(t_{K-1}) < \sqrt{t_{K-1}}\, z_{p_{K-1}}, W(t_K) < z_{p_K}\right) = \alpha, \tag{B.48}$$

where $P_0(\cdot)$ denotes probability under the assumption that $\eta = 0$. The solution is obtained by numerical search using the recursive integration method of Armitage, McPherson and Rowe (1969) (the AMR algorithm) discussed in Appendix G. Once the value of $p_K$ has been determined, the maximum information is obtained by invoking the AMR algorithm repeatedly and searching for the value of $\eta$ at which

$$P_\eta\left(W(t_1) < \sqrt{t_1}\, z_{p_1}, \ldots, W(t_{K-1}) < \sqrt{t_{K-1}}\, z_{p_{K-1}}, W(t_K) < z_{p_K}\right) = \beta. \tag{B.49}$$

Denote the solution by $\eta_1$.
Then, by (B.6), the desired maximum information $I_{\max}$ is

$$I_{\max} = \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2.$$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints.

B.2.2 Wang-Tsiatis Power Boundaries

The power boundaries of Wang and Tsiatis (1987) are available for early rejection of the null hypothesis. These boundaries are of the form

$$c(t_j) = C(\Delta, \alpha, K)\, t_j^{\Delta} \tag{B.50}$$

for $j = 1, 2, \ldots, K$, where $\Delta$ is a shape parameter that characterizes the boundary shape and $C(\Delta, \alpha, K)$ is a positive constant indexed by $\Delta$, $\alpha$ and $K$. The choice $\Delta = 0$ yields the O'Brien-Fleming (1979) stopping boundaries while $\Delta = 0.5$ yields the Pocock stopping boundaries. More generally, Wang and Tsiatis (1987) explore this family to find the value of $\Delta$ that minimizes the expected sample size for various design specifications. For one-sided tests, the trial stops at the first interim look at which $W(t_j) \ge c(t_j)$. Therefore, in order to preserve the type-1 error the boundaries must satisfy

$$1 - P_0\left\{\bigcap_{j=1}^{K} W(t_j) < c(t_j)\right\} = \alpha. \tag{B.51}$$

Since, by equation (B.50), the boundary values $c(t_1), c(t_2), \ldots, c(t_K)$ are completely specified by $C(\Delta, \alpha, K)$, this constant can be evaluated by numerical search for any choice of $\Delta$, $\alpha$ and $K$ using the AMR algorithm. Once the boundaries have been determined the maximum information is obtained by again invoking the AMR algorithm, this time to find the value of $\eta$ that satisfies the type-2 error equation

$$P_\eta\left\{\bigcap_{j=1}^{K} W(t_j) < c(t_j)\right\} = \beta. \tag{B.52}$$

Denote the solution by $\eta_1$. Then, by (B.6), the desired maximum information $I_{\max}$ is

$$I_{\max} = \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2.$$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints. To obtain Wang-Tsiatis stopping boundaries for two-sided tests, replace $W(t_j)$ by $|W(t_j)|$ throughout equations (B.51) and (B.52).

B.2.3 Pampallona-Tsiatis Power Boundaries

The power boundaries of Pampallona and Tsiatis (1994) are available for early rejection of either $H_0$ or $H_1$. It is convenient to discuss the one-sided and two-sided tests separately for these boundaries.

One-Sided Tests

There are two stopping boundaries for these designs: an "upper" stopping boundary for early rejection of $H_0$ and a "lower" stopping boundary for early rejection of $H_1$. We reject $H_0$ in favor of $H_1$ the first time we encounter an information fraction $t_j$ such that

$$W(t_j) \ge C_1(\Delta_1, \alpha, \beta, K)\, t_j^{\Delta_1} \quad \text{(upper boundary)},$$

and reject $H_1$ in favor of $H_0$ the first time we encounter an information fraction $t_j$ such that

$$W(t_j) < \eta t_j - C_2(\Delta_2, \alpha, \beta, K)\, t_j^{\Delta_2} \quad \text{(lower boundary)},$$

where $C_1(\Delta_1, \alpha, \beta, K)$ and $C_2(\Delta_2, \alpha, \beta, K)$ are positive and indexed by shape parameters, $\Delta_1$ and $\Delta_2$, that might take different values. We impose the additional constraint

$$C_1(\Delta_1, \alpha, \beta, K) = \eta - C_2(\Delta_2, \alpha, \beta, K)$$

so as to force the boundaries to meet at the last look, thereby ensuring that a decision to reject either of the two hypotheses will indeed be made. The upper and lower stopping boundaries thus form a triangular continuation region.
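The shape of the one-sided Pampallona-Tsiatis continuation region can be seen by evaluating the two boundary formulas directly. In the sketch below the constants `c1` and `c2` are arbitrary illustrative values, not solutions of the type-1 and type-2 error equations, and the function name is hypothetical; the point is only that the constraint $C_1 = \eta - C_2$ makes the boundaries meet at $t_K = 1$.

```python
import math

def pt_one_sided_boundaries(t, c1, c2, shape1, shape2):
    """Upper and lower boundaries on the W scale for given constants.
    The drift is set by the meeting constraint C1 = eta - C2."""
    eta = c1 + c2
    upper = [c1 * tj ** shape1 for tj in t]
    lower = [eta * tj - c2 * tj ** shape2 for tj in t]
    return eta, upper, lower

t = [0.25, 0.5, 0.75, 1.0]   # hypothetical information fractions
eta, upper, lower = pt_one_sided_boundaries(t, c1=2.0, c2=1.0,
                                            shape1=0.0, shape2=0.0)

# Equivalent boundaries on the Wald (Z) scale, using W = sqrt(t) * Z.
upper_z = [u / math.sqrt(tj) for u, tj in zip(upper, t)]
lower_z = [l / math.sqrt(tj) for l, tj in zip(lower, t)]
```

With these illustrative constants the upper boundary is flat at 2 on the W scale, the lower boundary rises from -0.25 to 2, and the two coincide at the final look, giving the triangular continuation region described above.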
We wish these stopping boundaries to have the property that at the null hypothesis, $\delta = 0$, we will cross the upper stopping boundary with probability $\alpha$, but at the specific alternative hypothesis of interest, say $\delta = \delta_1$, we will cross the upper stopping boundary with probability $1 - \beta$ and the lower stopping boundary with probability $\beta$. The coefficients $C_1(\Delta_1, \alpha, \beta, K)$ and $C_2(\Delta_2, \alpha, \beta, K)$ are found using a two-dimensional search to simultaneously solve the two equations corresponding to the desired type-1 and type-2 errors of the test:

$$P_0(W(t_1) \ge u_1) + P_0(l_1 < W(t_1) < u_1, W(t_2) \ge u_2) + \cdots + P_0(l_1 < W(t_1) < u_1, \ldots, l_{K-1} < W(t_{K-1}) < u_{K-1}, W(t_K) \ge u_K) = \alpha$$

and

$$P_\eta(W(t_1) \le l_1) + P_\eta(l_1 < W(t_1) < u_1, W(t_2) \le l_2) + \cdots + P_\eta(l_1 < W(t_1) < u_1, \ldots, l_{K-1} < W(t_{K-1}) < u_{K-1}, W(t_K) \le l_K) = \beta,$$

where

$$u_j = C_1(\Delta_1, \alpha, \beta, K)\, t_j^{\Delta_1}$$

and

$$l_j = \eta t_j - C_2(\Delta_2, \alpha, \beta, K)\, t_j^{\Delta_2}$$

for $j = 1, 2, \ldots, K$. The parameter $\eta$ is determined simultaneously along with $C_1(\Delta_1, \alpha, \beta, K)$ and $C_2(\Delta_2, \alpha, \beta, K)$ through the relationship

$$\eta = C_1(\Delta_1, \alpha, \beta, K) + C_2(\Delta_2, \alpha, \beta, K). \tag{B.53}$$

Denote the solution by $\eta_1$. Then, by (B.6), the desired maximum information $I_{\max}$ is

$$I_{\max} = \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2.$$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints.

Two-Sided Tests

The two-sided test is based on a pair of outer boundaries for early rejection of $H_0$ plus an inner wedge for early rejection of $H_1$.
These tests reject $H_0$ in favor of $H_1^+: \delta > 0$ if

$$W(t_j) \ge C_1(\Delta_1, \alpha, \beta, K)\, t_j^{\Delta_1} \quad \text{(top outer boundary)},$$

reject $H_0$ in favor of $H_1^-: \delta < 0$ if

$$W(t_j) \le -C_1(\Delta_1, \alpha, \beta, K)\, t_j^{\Delta_1} \quad \text{(bottom outer boundary)},$$

and reject $H_1: \delta \ne 0$ if

$$C_2(\Delta_2, \alpha, \beta, K)\, t_j^{\Delta_2} - \eta t_j \le W(t_j) \le \eta t_j - C_2(\Delta_2, \alpha, \beta, K)\, t_j^{\Delta_2} \quad \text{(inner wedge)}.$$

The boundaries for these tests jointly form two symmetric triangular continuation regions with outer regions for stopping to reject $H_0$ and an inner wedge for stopping to reject $H_1$. The boundaries are required to have the property that, under $H_0: \delta = 0$, the overall probability of crossing either of the two outer boundaries is $\alpha$, while for the specific alternative of interest, $\delta = \delta_1$ say, the overall probability of crossing either outer boundary is $1 - \beta$ and the probability of entering the inner wedge is $\beta$. Again we will impose the constraint

$$C_1(\Delta_1, \alpha, \beta, K) = \eta - C_2(\Delta_2, \alpha, \beta, K)$$

so that in the end a decision to reject one of the two hypotheses is reached. Notice that the inner wedge is undefined at information fractions $t_j$ such that

$$C_2(\Delta_2, \alpha, \beta, K)\, t_j^{\Delta_2} - \eta t_j > \eta t_j - C_2(\Delta_2, \alpha, \beta, K)\, t_j^{\Delta_2}.$$

Therefore it will not be possible to stop the trial with rejection of $H_1$ at the $j$th information fraction unless the trial has progressed sufficiently far so that

$$t_j \ge \left[\frac{C_2(\Delta_2, \alpha, \beta, K)}{\eta}\right]^{\frac{1}{1 - \Delta_2}}. \tag{B.54}$$

With this in mind we will find it convenient to set $l_j = -\infty$ whenever $t_j$ fails to satisfy the condition (B.54).

Computing Maximum Information

The above computations show that the Wang-Tsiatis and Pampallona-Tsiatis boundaries are generated simultaneously with the drift parameter $\eta$ needed to achieve $1 - \beta$ power. Denote the solution by $\eta_1$.
Then, by (B.6), the desired maximum information $I_{\max}$ is

$$I_{\max} = \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2.$$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints.

B.2.4 Spending Function Boundaries

The most general way to generate stopping boundaries is through $\alpha$ and $\beta$ spending functions, where $\alpha$ and $\beta$ are, respectively, the type-1 and type-2 errors pre-specified for the trial. An $\alpha$ spending function is any monotone function defined on the unit interval with $\alpha(0) = 0$ and $\alpha(1) = \alpha$. Similarly, a $\beta$ spending function is any monotone function defined on the unit interval with $\beta(0) = 0$ and $\beta(1) = \beta$. The idea of using an $\alpha$ spending function to derive stopping boundaries for early rejection of $H_0$ was first introduced by Lan and DeMets (1983). Subsequently Pampallona, Tsiatis and Kim (1995, 2001) developed the notion of a $\beta$ spending function to derive stopping boundaries for early rejection of $H_1$. One may either use the $\alpha$- and $\beta$-spending functions singly, or combine both $\alpha$- and $\beta$-spending in a single trial, with one-sided or two-sided, symmetric or asymmetric boundaries.

Below we list and briefly describe all the $\alpha$- and $\beta$-spending functions available in East. We give all the functional forms in terms of $\alpha(t)$ but it is understood that these functional forms also apply to $\beta(t)$.

LD(OF) Lan-DeMets spending function with O'Brien-Fleming flavor. Published by Lan and DeMets (Biometrika, 1983). Functional form:

$$\alpha(t) = \begin{cases} 2 - 2\Phi\left(\dfrac{z_{\alpha/2}}{\sqrt{t}}\right) & \text{for one-sided tests} \\[2ex] 4 - 4\Phi\left(\dfrac{z_{\alpha/4}}{\sqrt{t}}\right) & \text{for two-sided tests} \end{cases}$$

This function generates stopping boundaries that closely resemble the O'Brien-Fleming (1979) stopping boundaries.
LD(PK) Lan-DeMets spending function with Pocock flavor. Published by Lan and DeMets (Biometrika, 1983). Functional form:

$$ \alpha(t) = \alpha \ln\{1 + (e - 1)t\} . $$

This function generates stopping boundaries that closely resemble the Pocock (1977) stopping boundaries.

Gm($\gamma$) Gamma spending function. Published by Hwang, Shih and DeCani (Statistics in Medicine, 1990). Functional form:

$$ \alpha(t) = \begin{cases} \alpha\, \dfrac{1 - e^{-\gamma t}}{1 - e^{-\gamma}} & \text{if } \gamma \ne 0 , \\ \alpha t & \text{if } \gamma = 0 . \end{cases} $$

Negative values of $\gamma$ yield convex spending functions that increase in conservatism as $\gamma$ decreases, while positive values of $\gamma$ yield concave spending functions that increase in aggressiveness as $\gamma$ increases. The choice $\gamma = 0$ spends the error linearly. The choice $\gamma = -4$ produces stopping boundaries that resemble the O'Brien-Fleming boundaries. The choice $\gamma = 1$ produces stopping boundaries that resemble the Pocock boundaries.

Rho($\rho$) Rho spending function. First published by Kim and DeMets (Biometrika, 1987) and generalized by Jennison and Turnbull (2000). Functional form:

$$ \alpha(t) = \alpha t^{\rho} , \quad \rho > 0 . $$

When $\rho = 1$, the corresponding stopping boundaries resemble the Pocock stopping boundaries. When $\rho = 3$, the boundaries resemble the O'Brien-Fleming boundaries. Larger values of $\rho$ yield increasingly conservative boundaries.

Power Documented in the East 3 User Manual, Appendix B and C (Cytel Software Corporation, 2000). Obtained by inverting 10-look Wang-Tsiatis (1987) stopping boundaries at ten equally spaced intervals and fitting a smooth curve through the ten points.

In the following paragraphs, we provide the technical details for generating a stopping boundary from a spending function. We assume throughout that the study is designed for a total of $K$ looks at the information fractions $t_1, t_2, \ldots, t_K$. A one-sided test is assumed for simplicity.
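The closed-form spending functions listed above can be sketched in a few lines of Python. This is only an illustration of the functional forms, not East's implementation. The function names are hypothetical, and the Power family is omitted because it has no closed form in the text.

```python
from math import e, exp, log, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def ld_of(t, alpha, two_sided=False):
    """LD(OF): 2 - 2*Phi(z_{alpha/2}/sqrt(t)), or 4 - 4*Phi(z_{alpha/4}/sqrt(t))."""
    if t <= 0.0:
        return 0.0
    k = 4.0 if two_sided else 2.0
    z = _STD.inv_cdf(1.0 - alpha / k)        # upper alpha/k quantile
    return k - k * _STD.cdf(z / sqrt(t))


def ld_pk(t, alpha):
    """LD(PK): alpha * ln{1 + (e - 1) t}."""
    return alpha * log(1.0 + (e - 1.0) * t)


def gamma_family(t, alpha, gamma):
    """Gm(gamma): alpha (1 - e^{-gamma t}) / (1 - e^{-gamma}); linear if gamma = 0."""
    if gamma == 0:
        return alpha * t
    return alpha * (1.0 - exp(-gamma * t)) / (1.0 - exp(-gamma))


def rho_family(t, alpha, rho):
    """Rho(rho): alpha * t^rho, with rho > 0."""
    return alpha * t ** rho


# Cumulative error spent at the halfway point of a one-sided 0.025-level design.
for name, value in [("LD(OF)", ld_of(0.5, 0.025)),
                    ("LD(PK)", ld_pk(0.5, 0.025)),
                    ("Gm(-4)", gamma_family(0.5, 0.025, -4)),
                    ("Rho(3)", rho_family(0.5, 0.025, 3))]:
    print(f"{name}: {value:.5f}")
```

Each function returns 0 at $t = 0$ and the full error $\alpha$ at $t = 1$, as the definition of a spending function requires.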
The extension to two-sided tests follows readily by replacing $W(t_j)$ throughout by $|W(t_j)|$.

Boundaries and Maximum Information for Early Rejection of H0 Only

The boundaries are computed recursively, with $c(t_j)$ being based on the values of $c(t_l)$, $l = 1, 2, \ldots, j - 1$. For the first look, at information fraction $t_1$, find the upper boundary $c(t_1)$ such that

$$ P_0(W(t_1) \ge c(t_1)) = \alpha(t_1) . \tag{B.55} $$

For subsequent looks $j = 2, 3, \ldots, K$, having already computed the upper boundaries $c(t_1), c(t_2), \ldots, c(t_{j-1})$, find the upper boundary $c(t_j)$ such that

$$ \alpha(t_{j-1}) + P_0(W(t_1) < c(t_1), \ldots, W(t_{j-1}) < c(t_{j-1}), W(t_j) \ge c(t_j)) = \alpha(t_j) . \tag{B.56} $$

These computations are performed by the AMR algorithm. Once the boundaries have been determined, the maximum information is obtained by again invoking the AMR algorithm, this time to find the value of $\eta$ that satisfies the type-2 error equation

$$ P_\eta\Big\{ \bigcap_{j=1}^{K} W(t_j) < c(t_j) \Big\} = \beta . \tag{B.57} $$

Denote the solution by $\eta_1$. Then, by (B.6), the desired maximum information $I_{\max}$ is

$$ I_{\max} = \left[ \frac{\eta_1}{\delta_1 - \delta_0} \right]^2 . $$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints.

To obtain spending function boundaries for symmetric two-sided tests, replace $W(t_j)$ by $|W(t_j)|$ throughout equations (B.56) and (B.57).

Two-Sided Asymmetric Boundaries for Early Rejection of H0 Only

Suppose one wishes to split the total type-1 error, $\alpha$, of a two-sided test into two components $\alpha_l$ and $\alpha_u$, with $\alpha_l + \alpha_u = \alpha$, in such a way that the probability, under the null hypothesis, of crossing the upper (lower) boundary is $\alpha_u$ ($\alpha_l$).
Denote the critical values of the two-sided boundary at interim monitoring time $t_j$ by $(a(t_j), b(t_j))$, $j = 1, 2, \ldots, K$. These boundary values are obtained by inverting the corresponding spending function values $(\alpha_l(t_j), \alpha_u(t_j))$, $j = 1, 2, \ldots, K$, as follows. For the first look, at information fraction $t_1$, find the lower boundary $a(t_1)$ such that

$$ P_0(W(t_1) \le a(t_1)) = \alpha_l(t_1) , \tag{B.58} $$

and the upper boundary $b(t_1)$ such that

$$ P_0(W(t_1) \ge b(t_1)) = \alpha_u(t_1) . \tag{B.59} $$

For subsequent looks $j = 2, 3, \ldots, K$, having already computed the boundaries $\{(a(t_1), b(t_1)), (a(t_2), b(t_2)), \ldots, (a(t_{j-1}), b(t_{j-1}))\}$, compute $(a(t_j), b(t_j))$ such that

$$ \alpha_l(t_{j-1}) + P_0(a(t_1) < W(t_1) < b(t_1), \ldots, a(t_{j-1}) < W(t_{j-1}) < b(t_{j-1}), W(t_j) \le a(t_j)) = \alpha_l(t_j) \tag{B.60} $$

and

$$ \alpha_u(t_{j-1}) + P_0(a(t_1) < W(t_1) < b(t_1), \ldots, a(t_{j-1}) < W(t_{j-1}) < b(t_{j-1}), W(t_j) \ge b(t_j)) = \alpha_u(t_j) . \tag{B.61} $$

We wish to point out that the spending functions used to obtain the upper and lower boundaries in the above procedure can belong to different families if desired.

Boundaries and Maximum Information for Early Rejection of either H0 or H1

There are two stopping boundaries for these designs: an "upper" stopping boundary for early rejection of $H_0$ and a "lower" stopping boundary for early rejection of $H_1$. We reject $H_0$ in favor of $H_1$ the first time we encounter an information fraction $t_j$ such that a boundary is crossed and it is an upper boundary. We reject $H_1$ in favor of $H_0$ the first time we encounter an information fraction $t_j$ such that a boundary is crossed and it is a lower boundary. We impose the constraint that the upper and lower boundaries must meet at $t_K$, thereby ensuring that a decision to reject either of the two hypotheses will indeed be made. The upper and lower stopping boundaries thus form a triangular continuation region.
We wish these stopping boundaries to have the property that at the null hypothesis, $\delta = 0$, we will cross the upper stopping boundary with probability $\alpha$, but at the specific alternative hypothesis of interest, say $\delta = \delta_1$, we will cross the upper stopping boundary with probability $1 - \beta$ and the lower stopping boundary with probability $\beta$. The upper boundaries $u_j$ and the lower boundaries $l_j$, $j = 1, 2, \ldots, K$, are found using a two-dimensional search to simultaneously solve two equations corresponding to the desired type-1 and type-2 errors of the test. The drift parameter $\eta$ is determined simultaneously along with the boundaries. The procedure is specified below:

1. Set the drift parameter $\eta$ to some arbitrary initial value $\eta = \eta_1$.

2. At the first look, at information fraction $t_1$, search for the upper boundary $u_1$ such that

$$ P_0(W(t_1) \ge u_1) = \alpha(t_1) , \tag{B.62} $$

and for the lower boundary $l_1$ such that

$$ P_\eta(W(t_1) \le l_1) = \beta(t_1) . \tag{B.63} $$

3. For subsequent looks $j = 2, 3, \ldots, K - 1$, having already computed the pairs of boundaries up to and including $(l_{j-1}, u_{j-1})$, find the upper boundary $u_j$ such that

$$ \begin{aligned} & P_0(W(t_1) \ge u_1) + P_0(l_1 < W(t_1) < u_1, W(t_2) \ge u_2) + \cdots \\ & \quad + P_0(l_1 < W(t_1) < u_1, \ldots, l_{j-1} < W(t_{j-1}) < u_{j-1}, W(t_j) \ge u_j) = \alpha(t_j) \end{aligned} \tag{B.64} $$

and find the lower boundary $l_j$ such that

$$ \begin{aligned} & P_\eta(W(t_1) \le l_1) + P_\eta(l_1 < W(t_1) < u_1, W(t_2) \le l_2) + \cdots \\ & \quad + P_\eta(l_1 < W(t_1) < u_1, \ldots, l_{j-1} < W(t_{j-1}) < u_{j-1}, W(t_j) \le l_j) = \beta(t_j) . \end{aligned} \tag{B.65} $$

4. At the $K$th and final look the upper boundary $u_K$ satisfies

$$ \begin{aligned} & P_0(W(t_1) \ge u_1) + P_0(l_1 < W(t_1) < u_1, W(t_2) \ge u_2) + \cdots \\ & \quad + P_0(l_1 < W(t_1) < u_1, \ldots, l_{K-1} < W(t_{K-1}) < u_{K-1}, W(t_K) \ge u_K) = \alpha . \end{aligned} \tag{B.66} $$

Since we want to reach a decision at the last look (either in favor of the null or the alternative), we have to set the lower boundary $l_K$ equal to the upper boundary $u_K$. Thus set $l_K = u_K$ and find the value of $\beta^*$ by calculating

$$ \begin{aligned} \beta^* = {} & P_\eta(W(t_1) \le l_1) + P_\eta(l_1 < W(t_1) < u_1, W(t_2) \le l_2) + \cdots \\ & + P_\eta(l_1 < W(t_1) < u_1, \ldots, l_{K-1} < W(t_{K-1}) < u_{K-1}, W(t_K) \le l_K) . \end{aligned} \tag{B.67} $$

(a) If $\beta^* = \beta$, then the set of boundaries just computed satisfies the required operating characteristics at the alternative $\eta = \eta_1$.
(b) If $\beta^* > \beta$, the power is too low at the current drift, so select a new value $\eta_{new} > \eta_1$. Set $\eta_1 = \eta_{new}$ and repeat the steps from Step 2 onward.
(c) If $\beta^* < \beta$, select a new value $\eta_{new} < \eta_1$. Set $\eta_1 = \eta_{new}$ and repeat the steps from Step 2 onward.

(For a single look, $\beta^* = \Phi(z_\alpha - \eta_1)$, which decreases as $\eta_1$ increases; this fixes the direction of the search in (b) and (c).)

The above iterative procedure ends with simultaneous computation of the final stopping boundaries and the final drift parameter $\eta_1$. Then, by (B.6), the desired maximum information $I_{\max}$ is

$$ I_{\max} = \left[ \frac{\eta_1}{\delta_1 - \delta_0} \right]^2 . $$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints.

Two-sided boundaries for early rejection of $H_0$ or $H_1$ are obtained by replacing $W(t_j)$ with $|W(t_j)|$. The boundary for early rejection of $H_0$ in the one-sided case is now replaced by two boundaries for early rejection of $H_0$, symmetrically placed on either side of the X-axis. Similarly, the boundary for early rejection of $H_1$ is now replaced by two boundaries for early rejection of $H_1$, symmetrically placed on either side of the X-axis, and constructed so as to meet their corresponding $H_0$-rejection boundaries at the final look. This results in two triangular continuation regions and an inner wedge.
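To make the recursive construction concrete, the following Python sketch computes one-sided efficacy boundaries for early rejection of $H_0$ only, following equations (B.55) and (B.56). It is not the AMR algorithm as implemented in East. It assumes a simple midpoint-rule integration of the sub-density of $W(t_j)$ over the continuation region, truncated at $-8\sqrt{t_j}$. It uses the one-sided LD(OF) spending function, and all function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal


def ld_of(t, alpha):
    """One-sided LD(OF) spending function."""
    return 2.0 - 2.0 * STD.cdf(STD.inv_cdf(1.0 - alpha / 2.0) / sqrt(t))


def efficacy_boundaries(ts, alpha, spend=ld_of, n_grid=400):
    """Upper boundaries c(t_j) on the W scale, following (B.55)-(B.56) under H0."""
    cs = []
    xs, dens, width = [], [], 0.0   # midpoint grid and sub-density at the previous look
    t_prev, spent = 0.0, 0.0
    for k, t in enumerate(ts):
        target = spend(t, alpha) - spent    # alpha(t_j) - alpha(t_{j-1})
        sd = sqrt(t - t_prev)               # sd of the increment W(t_j) - W(t_{j-1})

        def cross_prob(c):
            # P0(no crossing at earlier looks, W(t_j) >= c)
            if not xs:                      # first look: W(t_1) ~ N(0, t_1)
                return 1.0 - STD.cdf(c / sd)
            return width * sum(g * (1.0 - STD.cdf((c - x) / sd))
                               for x, g in zip(xs, dens))

        lo, hi = -10.0, 10.0                # bisection: cross_prob decreases in c
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if cross_prob(mid) > target:
                lo = mid
            else:
                hi = mid
        c = 0.5 * (lo + hi)
        cs.append(c)
        if k == len(ts) - 1:
            break                           # no further look to prepare for

        # Sub-density of W(t_j) restricted to the continuation region (-inf, c).
        left = -8.0 * sqrt(t)               # truncation point; mass below is negligible
        new_width = (c - left) / n_grid
        new_xs = [left + (i + 0.5) * new_width for i in range(n_grid)]
        if not xs:
            new_dens = [STD.pdf(y / sd) / sd for y in new_xs]
        else:
            new_dens = [width * sum(g * STD.pdf((y - x) / sd)
                                    for x, g in zip(xs, dens)) / sd
                        for y in new_xs]
        xs, dens, width = new_xs, new_dens, new_width
        t_prev, spent = t, spend(t, alpha)
    return cs


# Two equally spaced looks, one-sided alpha = 0.025.
looks = [0.5, 1.0]
boundaries = efficacy_boundaries(looks, alpha=0.025)
z_scale = [c / sqrt(t) for c, t in zip(boundaries, looks)]  # W(t)/sqrt(t)
```

The same recursion, evaluated under a drift $\eta$ (so that the increments have mean $\eta (t_j - t_{j-1})$), would give the left-hand side of (B.57). A root search over $\eta$ then yields $\eta_1$.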
Making the H0–H1 Boundaries Non-Binding

In the discussion that follows we will refer to the boundary for early rejection of $H_0$ as the "efficacy" boundary and the boundary for early rejection of $H_1$ as the "futility" boundary. Equations (B.62) through (B.67) were used to generate the efficacy and futility boundaries simultaneously. One practical drawback of this simultaneous computation is that the futility boundary cannot be overruled. In other words, if the test statistic crosses the futility boundary the trial must be terminated, or else the type-1 error might be inflated. This is so because the interaction between the two boundaries during their construction causes the efficacy boundary to be shifted relative to the position it would have occupied if there were no futility boundary. It could happen, for example, that the presence of the futility boundary "pulls down" the efficacy boundary, making it easier to cross under the null hypothesis if the futility boundary can be arbitrarily overruled. If the efficacy boundary is disturbed in this manner, the only way to prevent the possibility of inflating the type-1 error is to make the futility boundary strictly binding. This is usually not acceptable to the sponsor of a clinical trial or to the data monitoring committee assigned to the trial. This is the primary motivation for constructing non-binding futility boundaries.

We now show how to simultaneously compute the efficacy and futility boundaries in such a way that the early rejection criteria of the efficacy boundary remain the same as the corresponding criteria in an $H_0$-only design. In that case there is no danger of inflating the type-1 error even if the futility boundary is overruled. The only cost of this added flexibility is an increase in the maximum information. For ease of exposition we will only describe the one-sided $H_0$–$H_1$ case.

1. Generate the one-sided level-$\alpha$ efficacy boundary as specified by equations (B.55) and (B.56).
Denote this boundary by $\{u_1, u_2, \ldots, u_K\}$.

2. For this boundary find the value of $\eta$ that will satisfy the type-2 error equation (B.57).

3. Keeping this value of $\eta$ and the previously obtained efficacy boundary values $\{u_1, u_2, \ldots, u_K\}$ fixed, compute the futility boundary $\{l_1, l_2, \ldots, l_K\}$ as follows:

$$ P_\eta(W(t_1) \le l_1) = \beta(t_1) \tag{B.68} $$

and for $j = 2, 3, \ldots, K - 1$,

$$ \begin{aligned} & P_\eta(W(t_1) \le l_1) + P_\eta(l_1 < W(t_1) < u_1, W(t_2) \le l_2) + \cdots \\ & \quad + P_\eta(l_1 < W(t_1) < u_1, \ldots, l_{j-1} < W(t_{j-1}) < u_{j-1}, W(t_j) \le l_j) = \beta(t_j) . \end{aligned} \tag{B.69} $$

Since the efficacy and futility boundaries are required to meet at look $K$, simply set $l_K = u_K$.

4. Compute the power of a $K$-look design utilizing these boundaries with drift parameter $\eta$ by evaluating

$$ \begin{aligned} & P_\eta(W(t_1) \ge u_1) + P_\eta(l_1 < W(t_1) < u_1, W(t_2) \ge u_2) + \cdots \\ & \quad + P_\eta(l_1 < W(t_1) < u_1, \ldots, l_{K-1} < W(t_{K-1}) < u_{K-1}, W(t_K) \ge u_K) . \end{aligned} \tag{B.70} $$

Denote this power by $1 - \beta^*$.

5. Repeat Step 4 with progressively increasing values of $\eta$ until $1 - \beta^*$ is equal to the desired power $1 - \beta$. At that point denote the final drift parameter by $\eta_1$. Then, by (B.6), the desired maximum information $I_{\max}$ is

$$ I_{\max} = \left[ \frac{\eta_1}{\delta_1 - \delta_0} \right]^2 . $$

We can convert maximum information into maximum sample size or maximum events, depending on the model being used, by selecting the appropriate translation equation from (B.15) for normal endpoints, (B.20) or (B.23) for binomial endpoints, and (B.33) or (B.41) for time to event endpoints.

The above iterative procedure produces efficacy and futility boundaries having the property that the probability of crossing the efficacy boundary under the alternative hypothesis $\delta = \delta_1$ is $1 - \beta$. Thus the desired power is obtained.
Next, since the efficacy boundary was computed at Step 1 in the absence of a futility boundary, and was never altered in any subsequent step, the probability of crossing it under the null hypothesis is at most $\alpha$. This probability is exactly equal to $\alpha$ if the futility boundary is always overruled and can only decrease if the futility boundary is respected at one or more looks. Thus, in either case the type-1 error cannot exceed $\alpha$. This shows that boundaries constructed as described above produce the desired power and preserve the type-1 error, with the added flexibility that the futility boundary can be overruled.

Boundaries and Maximum Information for Early Rejection of H1 Only

Boundaries for early rejection of $H_1$ only are also known as futility boundaries. They are obtained by only spending the $\beta$ error at the interim looks according to a $\beta$-spending function. The $\alpha$ error is spent in its entirety at the last look. These boundaries and the associated maximum information can therefore be obtained by setting $\alpha(t_j) = 0$ for $j = 1, 2, \ldots, K - 1$ in equations (B.62) and (B.64). Equations (B.63), (B.65), (B.66) and (B.67) are unchanged. The computations then proceed as before.

B.2.5 Special Considerations for Binomial Designs

Maximum information and maximum sample size computations for binomial designs are complicated by the dependence of the variance of a binomial random variable on its mean. Therefore, even if we keep all other design parameters the same, the required maximum sample size for a binomial trial may differ, depending on how we intend to estimate the variance of the treatment difference at the interim monitoring stage.
Although this special consideration applies to superiority trials as well as to non-inferiority trials, the present discussion will be restricted to superiority trials only, where East provides two options at the design stage. The issue is, how will the observed treatment difference

$$ \hat\delta(t_j) = \hat\pi_t(t_j) - \hat\pi_c(t_j) $$

be standardized at the interim monitoring stage of the trial? The standardization method one intends to use at the interim monitoring stage must be reflected in the computation of sample size at the design stage. In East we offer two options.

Unpooled Estimate the variance without pooling the data from the two treatment arms. Thus

$$ \mathrm{var}[\hat\delta(t_j)] = \frac{\hat\pi_t(t_j)(1 - \hat\pi_t(t_j))}{n_{tj}} + \frac{\hat\pi_c(t_j)(1 - \hat\pi_c(t_j))}{n_{cj}} , \tag{B.71} $$

which implies that the statistic $\tilde W_s(t_j)$ given by equation (B.24) will be used to monitor the data. We have already seen in Section B.1.2 that this statistic is asymptotically $N(0, t_j)$ under the null hypothesis, asymptotically $N(\eta t_j, t_j)$ under the alternative hypothesis, and has independent increments. Therefore all the computations discussed in Section B.2 for obtaining stopping boundaries, estimating the maximum information $I_{\max}$, and converting maximum information into maximum sample size $N_{\max}$, remain valid without any modifications. In the East software, the unpooled estimate of variance is the default for the design of binomial endpoint trials.

Pooled Estimate the variance after pooling the data from the two treatments. Thus

$$ \mathrm{var}[\hat\delta(t_j)] = \frac{\hat\pi(t_j)(1 - \hat\pi(t_j))}{n_j\, r (1 - r)} \tag{B.72} $$

where $\hat\pi(t_j)$, the pooled estimate of the binomial response probability at time $t_j$, is given by equation (B.26). This implies that the statistic $\tilde W_{0s}(t_j)$ given by equation (B.25) will be used to monitor the data. As already stated in Section B.1.2, $\tilde W_{0s}(t_j)$ is $N(0, t_j)$ with independent increments under $H_0: \delta = 0$. However, since the variance term (B.72) is based on a pooled estimate of response, the distribution of $\tilde W_{0s}(t_j)$ is no longer $N(\eta t_j, t_j)$ under the alternative hypothesis. Therefore, if we intend to use the pooled estimate of variance at the interim monitoring stage, the computation of $I_{\max}$ and $N_{\max}$ must be modified for $H_0$-only boundaries, and the computation of stopping boundaries, $I_{\max}$ and $N_{\max}$ must be altered for $H_0$–$H_1$ boundaries. These modifications are described below.

First consider the case of $H_0$-only boundaries. For expository purposes we will only consider boundaries derived from $\alpha$-spending functions for one-sided tests. The same approach also works for the Haybittle-Peto and Wang-Tsiatis boundaries, and for two-sided tests. Since $\tilde W_{0s}(t_j) \sim N(0, t_j)$ with independent increments, the boundaries $\{c(t_1), c(t_2), \ldots, c(t_K)\}$ generated by equation (B.56) will preserve the type-1 error without any modification. These boundaries cannot, however, be directly utilized by equation (B.57) because $\tilde W_{0s}(t_j)$ is not $N(\eta t_j, t_j)$ under the alternative hypothesis. It is easy to show, however, that asymptotically

$$ P_\eta\Big\{ \bigcap_{j=1}^{K} \tilde W_{0s}(t_j) < c(t_j) \Big\} \approx P_\eta\Big\{ \bigcap_{j=1}^{K} \tilde W_s(t_j) < h\, c(t_j) \Big\} \tag{B.73} $$

where

$$ h = \sqrt{ \frac{ \bar\pi (1 - \bar\pi) \left( r^{-1} + (1 - r)^{-1} \right) }{ \pi_e (1 - \pi_e) r^{-1} + \pi_c (1 - \pi_c)(1 - r)^{-1} } } \tag{B.74} $$

and

$$ \bar\pi = r \pi_e + (1 - r) \pi_c . \tag{B.75} $$

Since $\tilde W_s(t_j)$ is asymptotically distributed as $N(\eta t_j, t_j)$ with independent increments, we can estimate the maximum information, $I_{\max}$, by finding the value of $\eta$ that satisfies the following modification of equation (B.57):

$$ P_\eta\Big\{ \bigcap_{j=1}^{K} W(t_j) < h\, c(t_j) \Big\} = \beta . \tag{B.76} $$

For $H_0$–$H_1$ boundaries the modification is slightly more complex, since in this case the stopping boundaries are not computed independently of $I_{\max}$. The procedure is identical to the four-step procedure given earlier for early rejection of either $H_0$ or $H_1$ (equations (B.62) through (B.67)), with the following modification: for any equation involving $\beta(t_j)$ on the right hand side, replace $(l_j, u_j)$ by $(h l_j, h u_j)$.
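As a small numerical illustration of equations (B.74) and (B.75), the sketch below computes the scale factor $h$ for given response rates and allocation fraction. The function name and the example values are hypothetical and are chosen only to show how $h$ moves away from 1 as the allocation becomes unbalanced.

```python
from math import sqrt


def pooled_scale_factor(pi_e, pi_c, r):
    """Scale factor h of (B.74); r is the fraction allocated to the experimental arm."""
    pi_bar = r * pi_e + (1.0 - r) * pi_c                               # (B.75)
    pooled = pi_bar * (1.0 - pi_bar) * (1.0 / r + 1.0 / (1.0 - r))
    unpooled = pi_e * (1.0 - pi_e) / r + pi_c * (1.0 - pi_c) / (1.0 - r)
    return sqrt(pooled / unpooled)


# Response rates 0.4 (experimental) and 0.2 (control) under three allocations.
for r in (0.2, 0.5, 0.8):
    print(f"r = {r}: h = {pooled_scale_factor(0.4, 0.2, r):.4f}")
```

With these inputs $h$ is below 1 for $r = 0.2$, slightly above 1 for $r = 0.5$, and well above 1 for $r = 0.8$. The boundary $c(t_j)$ in (B.76) is rescaled by this factor.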
The test statistic $W_{0s}(t_j)$ based on the pooled variance can be transformed into

$$ X^2_{0s}(t_j) = \left[ \frac{W_{0s}(t_j)}{\sqrt{t_j}} \right]^2 \tag{B.77} $$

which reduces to the familiar Pearson chi-square statistic. The option to base the design on the pooled estimate of variance is being offered because the chi-square test is a popular method for comparing two binomial populations. For a fixed sample study ($K = 1$), the sample size obtained by the pooled approach specializes to the following formula given by Lachin (1981):

$$ N = \left[ \frac{ z_\alpha \sqrt{ \bar\pi (1 - \bar\pi)\left( r^{-1} + (1 - r)^{-1} \right) } + z_\beta \sqrt{ \pi_e (1 - \pi_e) r^{-1} + \pi_c (1 - \pi_c)(1 - r)^{-1} } }{ \delta_1 } \right]^2 . \tag{B.78} $$

In contrast, the sample size for a fixed sample design based on the unpooled estimate of variance is

$$ N = \left[ \pi_e (1 - \pi_e) r^{-1} + \pi_c (1 - \pi_c)(1 - r)^{-1} \right] \times \left[ \frac{ z_\alpha + z_\beta }{ \delta_1 } \right]^2 . \tag{B.79} $$

We shall show in the next section that when $K > 1$ the above sample sizes are multiplied by an appropriate inflation factor that takes into account the number of looks, $K$, as well as the type of stopping boundary. For balanced designs ($r \approx 0.5$) the maximum sample sizes for the pooled and unpooled methods are very similar. If, however, the design is severely unbalanced, there can be substantial differences in the maximum sample sizes required to attain the desired power. It follows from equations (B.73) and (B.74) that if $h < 1$, the pooled variance will produce a more powerful test than the unpooled variance, whereas if $h > 1$, the unpooled variance will produce a more powerful test than the pooled variance. We have illustrated these points with examples in Chapter 23.

B.3 The Inflation Factor

It should be clear from the manner in which the drift parameter was computed in the previous section that its value depends on $K$, $\alpha$, $\beta$ and the stopping boundary or spending function selected for the design.
Therefore, in this section we will recognize explicitly that the drift parameter is a function of these items by denoting it as $\eta(\alpha, \beta, K, \text{boundaries})$. The relationship

$$ I_{\max} = \left[ \frac{ \eta_1(\alpha, \beta, K, \text{boundaries}) }{ \delta_1 - \delta_0 } \right]^2 $$

implied by equation (B.6) is equivalent to

$$ I_{\max} = \left[ \frac{ z_\alpha + z_\beta }{ \delta_1 - \delta_0 } \right]^2 \times \left[ \frac{ \eta_1(\alpha, \beta, K, \text{boundaries}) }{ z_\alpha + z_\beta } \right]^2 . \tag{B.80} $$

Observe that the first term in equation (B.80) is the information needed to achieve $1 - \beta$ power at an effect size of $\delta_1$ for a single-look, one-sided, level-$\alpha$ study of the null hypothesis $\delta = \delta_0$. We denote this term by

$$ I_1 = \left[ \frac{ z_\alpha + z_\beta }{ \delta_1 - \delta_0 } \right]^2 . \tag{B.81} $$

The second term is a multiplier for inflating the information required by the single-look study so that it will preserve the desired power of $1 - \beta$ if $K > 1$ looks are taken. We refer to the second term as the inflation factor and denote it by

$$ \mathrm{IF}(\alpha, \beta, K, \text{boundaries}) = \left[ \frac{ \eta_1(\alpha, \beta, K, \text{boundaries}) }{ z_\alpha + z_\beta } \right]^2 . \tag{B.82} $$

If we denote $I_{\max}$ by $I_K$ for a $K$-look group sequential study, we have the relationship

$$ I_K = I_1 \times \mathrm{IF}(\alpha, \beta, K, \text{boundaries}) . \tag{B.83} $$

In Table B.1 we have tabulated the inflation factors for some common choices of $\alpha$, $\beta$, $K$ and the shape parameter $\Delta$ for the Wang-Tsiatis boundaries.
Table B.1: Inflation Factors for Pocock ($\Delta = 0.5$) and O'Brien-Fleming ($\Delta = 0$) Stopping Boundaries

                      α = 0.05 (two-sided)       α = 0.01 (two-sided)
                      Power (1 − β)              Power (1 − β)
K   Stopping Boundary 0.80    0.90    0.95       0.80    0.90    0.95
2   Pocock            1.11    1.10    1.09       1.09    1.08    1.08
2   O-F               1.01    1.01    1.01       1.00    1.00    1.00
3   Pocock            1.17    1.15    1.14       1.14    1.12    1.12
3   O-F               1.02    1.02    1.02       1.01    1.01    1.01
4   Pocock            1.20    1.18    1.17       1.17    1.15    1.14
4   O-F               1.02    1.02    1.02       1.01    1.01    1.01
5   Pocock            1.23    1.21    1.19       1.19    1.17    1.16
5   O-F               1.03    1.03    1.02       1.02    1.01    1.01

B.3.1 Role of Inflation Factors in General Designs

The inflation factor is a convenient device for converting fixed sample studies into corresponding group sequential studies. This is the basis of the General design module. In this module East accepts the sample size (or information) required for a single-look study with a given power and type-1 error. East then uses the appropriate inflation factor to convert the single-look study into a $K$-look group sequential study. This is useful when we are required to design a group sequential study to evaluate some end point that is not currently available directly in East. (For example, the end point might be the comparison of two Poisson rates, or it might be a covariate in a logistic regression model.) The first step is to obtain the sample size or statistical information that would be required if this were a fixed-sample study. This can be done with the help of any convenient non-sequential design package. The sample size so obtained is then inflated by the appropriate inflation factor based on the desired number of looks, significance level, power and stopping boundary desired for the group sequential trial. See Chapter 60 for examples where East designs and monitors general studies of this type.
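The arithmetic behind equations (B.81) and (B.83) can be sketched as follows. The function names are hypothetical. The fixed sample size of 200 is an invented input standing in for the output of a non-sequential design package. The inflation factor 1.03 is read from Table B.1 (five looks, O'Brien-Fleming boundary, 90% power, two-sided $\alpha = 0.05$). Treating the two-sided 0.05 test as a one-sided 0.025 test in (B.81) is an assumption made here for illustration.

```python
from statistics import NormalDist


def fixed_sample_information(alpha, beta, delta1, delta0=0.0):
    """Single-look information I_1 of (B.81) for a one-sided level-alpha test."""
    z = NormalDist().inv_cdf
    return ((z(1.0 - alpha) + z(1.0 - beta)) / (delta1 - delta0)) ** 2


def inflate(single_look_value, inflation_factor):
    """Apply (B.83): works for information or for a fixed sample size."""
    return single_look_value * inflation_factor


# Information-scale example: detect delta1 - delta0 = 0.5 with 90% power.
i_1 = fixed_sample_information(alpha=0.025, beta=0.10, delta1=0.5)
i_5 = inflate(i_1, 1.03)          # K = 5, O-F, from Table B.1

# General-design example: a fixed-sample study needing 200 subjects.
n_5 = inflate(200, 1.03)
print(f"I_1 = {i_1:.2f}, I_5 = {i_5:.2f}, N_5 = {n_5:.0f}")
```

The same multiplication applies to any tabulated inflation factor. The Pocock entries in Table B.1 lead to noticeably larger maximum sample sizes than the O'Brien-Fleming entries for the same $K$.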
B.3.2 Role of Inflation Factor in Information Based Designs

Suppose one wishes to test $H_0: \delta = \delta_0$ versus $H_1: \delta = \delta_1$, where $\delta$ is a scalar parameter of interest in some mathematical model of the data generating process. In the Information Based Design module of East one specifies $\delta_1 - \delta_0$. East then computes the required fixed sample information through equation (B.81) and inflates it appropriately for a $K$-look group sequential study through equation (B.82). The information is expressed in the dimensionless units of $[\mathrm{se}(\hat\delta(\tau))]^{-2}$. The study is then monitored on this information scale.

Designing a study so that the information will be in the dimensionless units of $[\mathrm{se}(\hat\delta(\tau))]^{-2}$ has both advantages and disadvantages. The disadvantage is that, prior to activating the study, one needs to interpret the desired information in terms of a physical resource like sample size or number of failures. The formula for making this conversion depends on the mathematical model of the data generating process. Sometimes a closed-form formula exists, but for more complex models one must resort to simulation. (See, for example, Scharfstein, Tsiatis and Robins, 1997, or Scharfstein and Tsiatis, 1998.) Additionally, the conversion usually depends on initial estimates of nuisance parameters like the baseline response rate or the other covariates in the mathematical model of the data generating process. If we estimate the values of these nuisance parameters incorrectly, the sample size (or other physical resource) too will be incorrect and the study will not have the operating characteristics it was intended to have. For this reason it is often preferable to design the study and implement the interim monitoring on the dimensionless information scale, where we do not require any knowledge about the nuisance parameters.
Provided we continue to monitor the data until either full information $I_K$ is achieved (in terms of the desired $[\mathrm{se}(\hat\delta(\tau))]^{-2}$), or else a stopping boundary is crossed, we are assured of preserving the operating characteristics of the design. We might of course wish to update the conversion of the desired statistical information into a physical resource like sample size at each interim monitoring time point, as revised estimates of the nuisance parameters become available. Illustrative examples in which the value of $I_K$ remains constant on the dimensionless information scale but changes on the sample size scale, as more accurate estimates of nuisance parameters are obtained, are given in Chapter 59.

B.3.3 Selecting the General versus the Information Based Option

The General (G) and the Information Based (I) modules in East both rely on the same general distribution theory developed by Scharfstein, Tsiatis and Robins (1997) and by Jennison and Turnbull (1997). In both cases an inflation factor is applied to a corresponding fixed sample design, as discussed at the beginning of this section. The question then arises as to which module to adopt for a given problem. Here are some guidelines.

1. If software to design the corresponding single-look study is available, the G module is easier to use than the I module since the information is measured in terms of a physical resource like sample size or number of events.

2. If software to design the corresponding single-look study is not available, the I module can still be used since it only requires one to input the size of the treatment effect $\delta_1$ under the alternative hypothesis. The I module, however, specifies the maximum required information in terms of a dimensionless quantity representing the inverse square of the standard error of the parameter being tested.
It is usually necessary to translate this dimensionless information into a physical resource, either through simulation or analytically.

3. The I module is preferable to the G module in situations where the model for generating the data contains unknown nuisance parameters like the variance, the baseline response, or the coefficients of covariates in a regression model. To use the G module one would have to make assumptions about these unknown nuisance parameters. But the I module only requires you to specify the magnitude of the treatment effect you are interested in detecting.

4. The I module facilitates sample-size re-estimation, since the maximum information is specified in dimensionless units that remain constant, while the translation of maximum information into the corresponding sample size can be made more accurate at each interim look as increasingly accurate estimates of nuisance parameters become available.

B.4 Computation of Expected Information

In Section B.1 we defined the maximum information, $I_{\max} \equiv I_K$, to be committed up-front for a $K$-look group sequential clinical trial, and in Section B.2 we showed how to compute this quantity for various stopping boundaries. In practice, of course, a group sequential study might be terminated earlier than the $K$th look because of the sequential monitoring. Thus the actual information is a random variable. In this section we show how to compute the probability of crossing the stopping boundaries at each interim look. We then derive from these exit probabilities the expected value of the information that will be obtained in a group sequential clinical trial. For normal and binomial end points, information will be represented by sample size. For time to failure end points, information will be represented by the number of failures.
B.4.1 Boundary Crossing Probability at Each Look

Let $u_1, u_2, \ldots, u_K$ be the upper stopping boundaries for a one-sided group sequential test with possible early rejection of $H_0$. The probability of boundary crossing for the first time at look $j$ is

$$ P_{bc,j} = P_\eta[ W(t_1) < u_1, W(t_2) < u_2, \ldots, W(t_{j-1}) < u_{j-1}, W(t_j) > u_j ] . $$

East computes and displays these boundary crossing probabilities under $H_0$, $H_1$ and $H_{1/2}$ (half way between the null and alternative hypotheses) for all $j = 1, 2, \ldots, K$. Similarly, for one-sided tests allowing early rejection of either $H_0$ or $H_1$, East computes and displays

$$ \begin{aligned} P_{bc,j} = {} & P_\eta[ l_1 < W(t_1) < u_1, \ldots, l_{j-1} < W(t_{j-1}) < u_{j-1}, W(t_j) > u_j ] \\ & + P_\eta[ l_1 < W(t_1) < u_1, \ldots, l_{j-1} < W(t_{j-1}) < u_{j-1}, W(t_j) < l_j ] \end{aligned} $$

under $H_0$, $H_1$ and $H_{1/2}$. Similar displays are also available for the two-sided tests. For two-sided tests we simply replace $W(t_j)$ with $|W(t_j)|$ in the above boundary crossing probability equations.

B.4.2 Expected Sample Sizes for Normal and Binomial Studies

In general, for a study with $K$ analyses performed at information fractions $t_1, t_2, \ldots, t_K$, the expected stopping time, $E_t$, can be computed under various hypotheses on the basis of the boundary crossing probabilities as follows:

$$ E_t = \sum_{j=1}^{K-1} t_j \times P_{bc,j} + \Big( 1 - \sum_{j=1}^{K-1} P_{bc,j} \Big) . $$

The second term is the probability of reaching the final look, at which the information fraction is $t_K = 1$. The expected sample size is then computed as

$$ E_N = N_{\max} E_t , \tag{B.84} $$

where $N_{\max}$ is the projected maximum sample size, evaluated as discussed at the end of each subsection of Section B.2. The expected value $E_N$ is referred to as ASN, or average sample number, in some of the East charts.

B.4.3 Expected Number of Events for Survival Studies

We have shown in each subsection of Section B.2 that the value of $I_{\max}$ needed to detect $\delta = \delta_1$ with $1 - \beta$ power is given by

$$ I_{\max} = \left[ \frac{\eta_1}{\delta_1 - \delta_0} \right]^2 . \tag{B.85} $$

Furthermore, equations (B.33) and (B.41) show that $I_{\max}$ is directly proportional to $D_{\max}$, the maximum number of events. It follows that

$$ D_{\max} = \frac{1}{r (1 - r)} \left[ \frac{\eta_1}{\delta_1 - \delta_0} \right]^2 , \tag{B.86} $$

where $\delta_0$ and $\delta_1$ are specified by the null and alternative hypotheses, $r$ is the proportion randomized to treatment T, and $\eta_1$ is computed along with the stopping boundaries as discussed in Section B.2. The expected number of events is thus

$$ E_D = D_{\max} E_t . \tag{B.87} $$

B.5 Sample Size and Expected Study Duration for Survival Studies

Equation (B.86) shows that the power of a time-to-event trial is determined by the maximum number of events, $D_{\max}$, rather than the maximum sample size, $N_{\max}$. However, the total time one must wait for the $D_{\max}$ events to arrive can be controlled through sample size. The larger the sample size, the faster the required number of events is expected to arrive. A typical time-to-event trial is characterized by an accrual phase during which new subjects are enrolled, and a follow-up phase during which there is no further enrollment but subjects continue to be followed until the required number of events has been observed. A longer accrual phase implies a shorter follow-up phase, and usually also implies a shorter total study duration. Kim and Tsiatis (1990) analyzed this trade-off for the simplest possible case, in which subjects enroll at a constant rate $a$ for a fixed period $S_a$, there are no drop-outs, the survival distributions for the two treatment arms are exponential, and all subjects who have not yet experienced the event are followed until the trial is terminated.
For this special case they calculated that the expected number of events at calendar time $l$, given a constant hazard rate of $\lambda$, is

$$E(l \mid a, S_a, \lambda) = \begin{cases} a\left[l - \dfrac{1 - e^{-\lambda l}}{\lambda}\right] & \text{if } l \le S_a \\[2ex] a\left[S_a - \dfrac{e^{-\lambda l}}{\lambda}\left(e^{\lambda S_a} - 1\right)\right] & \text{if } l > S_a \end{cases} \tag{B.88}$$

In Appendix D we have generalized (B.88) to handle variable enrollment rates, drop-outs, and piece-wise exponential survival, for both the variable follow-up setting, where each subject still on-study is followed until the trial ends, and the fixed follow-up setting, where each subject still on-study is followed for a fixed amount of time, $m$. The generalized expression is denoted here by

$E(l \mid a, S_a, \lambda, \gamma, m)$ = expected number of events at calendar time $l$ given

$a$: a vector of enrollment rates for different intervals in the enrollment phase;
$S_a$: a vector of enrollment durations corresponding to the components of $a$;
$\lambda$: a vector of hazard rates for piece-wise exponential survival;
$\gamma$: a drop-out rate for subjects lost to follow-up;
$m$: a fixed follow-up time for each subject ($m = \infty$ denotes variable follow-up).

Thus, if the fraction randomized to the treatment arm is $r$, the expected number of events from both arms together at calendar time $l$ is

$$E^{(1)}(l \mid a, S_a, \lambda, \gamma, m) = r E(l \mid a, S_a, \lambda_T, \gamma_T, m) + (1 - r) E(l \mid a, S_a, \lambda_C, \gamma_C, m), \tag{B.89}$$

where $\lambda = (\lambda_T, \lambda_C)$ and $\gamma = (\gamma_T, \gamma_C)$. A chart displaying $E(l \mid a, S_a, \lambda_C, \gamma_C, m)$, $E(l \mid a, S_a, \lambda_T, \gamma_T, m)$ and $E^{(1)}(l \mid a, S_a, \lambda, \gamma, m)$ versus calendar time $l$ can be displayed by clicking on the icon in the Plots menu of East's Library.

B.5.1 Estimating Maximum Expected Study Duration

In a K-look group sequential trial we are committed to keeping the trial open until $D_{\max}$ events are observed, unless a stopping boundary is crossed earlier.
Although the actual calendar time at which these $D_{\max}$ events will occur is a random variable, it is nevertheless useful, for design purposes, to compute the calendar time, $l_{\max}$ say, at which we would expect to observe $D_{\max}$ events under various assumptions about accrual, drop-outs and survival distributions. Therefore we solve for $l_{\max}$ from the equation

$$E^{(1)}(l_{\max} \mid a, S_a, \lambda, \gamma, m) = D_{\max}. \tag{B.90}$$

The solution $l_{\max}$ of the above equation is obtained by iteration and represents the maximum length of time that we would expect the study to remain open if no early stopping boundary were crossed.

B.5.2 Trading-Off Maximum Study Duration Against Sample Size

We now establish a trade-off between the maximum expected study duration $l_{\max}$ and sample size. In order to present the essential features of this trade-off, we will only discuss the special case where enrollment is constant at a rate $a$ per unit time and the duration of the enrollment phase is $S_a$. In Appendix D we show that East does indeed handle the more general case in which there are Q distinct enrollment rates $a = (a_1, a_2, \ldots, a_Q)$ with corresponding enrollment durations denoted by $S_a = (S_{a_1}, S_{a_2}, \ldots, S_{a_Q})$. However, a detailed discussion of the general case in the present section would be a distraction. It involves more complex notation and would needlessly prolong the discussion without providing any additional insight about the trade-off involved.

Case (i): Variable Follow-Up Design ($m = \infty$)

In this design subjects are enrolled for $S_a$ units of time. All subjects are followed until the trial ends, unless they drop out or achieve the endpoint before trial termination. Thus the first subject enrolled could potentially be followed for $l_{\max}$ units of time while the last subject enrolled could potentially be followed for $l_{\max} - S_a$ units of time.
We may express the maximum expected study duration in the form

$$l_{\max} = S_a + S_f, \tag{B.91}$$

where $S_f$ is the duration of the follow-up phase of the trial. Then, for a fixed value of $S_a$, East determines the value of $S_f$ such that

$$E^{(1)}(S_a + S_f \mid a, S_a, \lambda, \gamma, \infty) = D_{\max}. \tag{B.92}$$

(Observe that the symbol $m = \infty$ has been used in the above expression for $E^{(1)}(\cdot \mid a, S_a, \lambda, \gamma, m)$, thereby indicating that this is a variable follow-up design.) By entering different enrollment durations, $S_a$, into equation (B.92) one obtains corresponding follow-up durations, $S_f$, and hence also obtains corresponding maximum study durations, $l_{\max} = S_a + S_f$. Graphs of $l_{\max}$ versus enrollment duration, $S_a$, and $l_{\max}$ versus sample size, $aS_a$, are obtained by clicking on the icon in the Plots menu of East's Library. A typical pair of graphs is shown below:

The red line on the graph displays the maximum expected study duration, $l_{\max}$, versus enrollment duration $S_a$. The calculation of $l_{\max}$ is done under the alternative hypothesis and does not take into consideration a possible shortening of the study duration caused by early stopping. The blue line on the graph displays the expected study duration under the alternative hypothesis, accounting for the possibility of early stopping. Similar graphs can be obtained for $l_{\max}$ (or its expectation under $H_1$) versus the sample size $aS_a$. All these relationships are monotone decreasing, highlighting that the greater the duration of the enrollment phase, or the number of patients enrolled, the shorter the follow-up phase and hence the shorter the total expected study duration. We can establish a range of acceptable enrollment durations, $(S_{a,\min}, S_{a,\max})$, as well as a range of corresponding acceptable sample sizes $(aS_{a,\min}, aS_{a,\max})$ within which it is reasonable to make a selection.
To determine $S_{a,\max}$ we argue that it is not necessary to prolong the enrollment phase beyond the time required to obtain $D_{\max}$ events. Thus we search iteratively for the value of $l$ at which

$$E^{(1)}(l \mid a, S_a = l, \lambda, \gamma, \infty) = D_{\max}. \tag{B.93}$$

$S_{a,\max}$ is the value of $l$ that solves equation (B.93). To determine $S_{a,\min}$ we start with the smallest possible enrollment duration $S_a^* = D_{\max}/a$ and see if it is feasible. To determine feasibility we progressively increase the follow-up time, starting with $S_f = 0$, and compute $E^{(1)}(S_a^* + S_f \mid a, S_a^*, \lambda, \gamma, \infty)$. If it should turn out that no matter how large we make $S_f$ we always have $E^{(1)}(S_a^* + S_f \mid a, S_a^*, \lambda, \gamma, \infty) < D_{\max}$, then the current value of $S_a^*$ is not feasible. In that case we increase the enrollment duration by a small amount $\epsilon$. After setting $S_a^* \leftarrow S_a^* + \epsilon$, we once again test for feasibility by computing $E^{(1)}(S_a^* + S_f \mid a, S_a^*, \lambda, \gamma, \infty)$ with progressively increasing values of $S_f$. We iterate in this manner until we finally obtain the smallest possible $S_a^*$, denoted by $S_{a,\min}$, such that there exists a value of $S_f$ at which

$$E^{(1)}(S_a^* + S_f \mid a, S_a^*, \lambda, \gamma, \infty) = D_{\max}. \tag{B.94}$$

The solution, $S_{a,\min}$, is the smallest that we can make the enrollment period and still hope to obtain $D_{\max}$ events. If there are no drop-outs, $S_{a,\min} = D_{\max}/a$, but in the presence of drop-outs, $S_{a,\min} > D_{\max}/a$. East displays the enrollment duration range $(S_{a,\min}, S_{a,\max})$ (as well as the corresponding sample size range $(aS_{a,\min}, aS_{a,\max})$), and selects the mid-point of this range as the default. The user can change this default value and thereby trade off sample size against total study duration.

Case (ii): Designs with Fixed Follow-Up $m$

In many trials the clinical endpoint is of interest only if it is obtained within a fixed time period $m$.
For example, in trials involving acute coronary syndrome, the primary question is whether the clinical endpoint (e.g., death, MI or refractory ischemia) has occurred within $m = 30$ days of entry into the study. In such trials each subject is only followed for a maximum of $m$ units of time, after which the subject goes off-study. Therefore the maximum study duration is actually fixed at $m$ units after the last subject has been enrolled; i.e., at time $S_a + m$. The design question is to determine the value of $S_a$ that will ensure that

$$E^{(1)}(S_a + m \mid a, S_a, \lambda, \gamma, m) = D_{\max}. \tag{B.95}$$

Here $m$ is fixed and we iterate on $S_a$ until equation (B.95) is satisfied. Denote this solution by $S_{a,\min}$. In this case, if we enroll $aS_{a,\min}$ subjects and follow each subject for a maximum of $m$ units of time, we expect to obtain the desired $D_{\max}$ events at time $l_{\max} = S_{a,\min} + m$. If we enroll for longer than $S_{a,\min}$ units of time then the desired $D_{\max}$ events are expected to arrive before $S_{a,\min} + m$. In particular, if the duration of enrollment extends up to $S_{a,\max}$ units of time, where

$$E^{(1)}(S_{a,\max} \mid a, S_{a,\max}, \lambda, \gamma, m) = D_{\max}, \tag{B.96}$$

then the desired $D_{\max}$ events will have arrived by the end of the enrollment phase itself. Therefore East specifies that $(S_{a,\min}, S_{a,\max})$ is an acceptable range within which to select the enrollment duration and $(aS_{a,\min}, aS_{a,\max})$ is an acceptable range within which to select the corresponding sample size. Unlike the variable follow-up case, where the mid-point of the range is selected as the default, East selects $S_{a,\min}$ ($aS_{a,\min}$) as the default choice for the enrollment duration (sample size). With this choice, the trial is expected to be fully powered precisely when the last subject enrolled has been followed for $m$ units of time.
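The trade-off for the variable follow-up design of Case (i) can be illustrated numerically. The sketch below is not East's implementation: it assumes a single constant enrollment rate, exponential survival and no drop-outs, evaluates the Kim and Tsiatis expression (B.88) for each arm, pools the arms as in (B.89), and solves (B.92) for the follow-up duration by bisection. The function names and all numerical inputs are hypothetical, chosen only for illustration.

```python
import math

def expected_events(l, a, s_a, lam):
    """Equation (B.88): expected events by calendar time l for constant
    enrollment rate a over [0, s_a] and constant hazard lam (no drop-outs)."""
    if l <= 0:
        return 0.0
    if l <= s_a:
        return a * (l - (1.0 - math.exp(-lam * l)) / lam)
    return a * (s_a - math.exp(-lam * l) * (math.exp(lam * s_a) - 1.0) / lam)

def pooled_events(l, a, s_a, lam_t, lam_c, r):
    """Equation (B.89) without drop-outs: events from both arms together."""
    return (r * expected_events(l, a, s_a, lam_t)
            + (1.0 - r) * expected_events(l, a, s_a, lam_c))

def follow_up_duration(d_max, a, s_a, lam_t, lam_c, r):
    """Solve (B.92) for S_f by bisection; returns None when the enrollment
    duration s_a is infeasible (d_max events can never be reached)."""
    if a * s_a <= d_max:          # pooled_events tends to a * s_a as l grows
        return None

    def f(l):
        return pooled_events(l, a, s_a, lam_t, lam_c, r) - d_max

    if f(s_a) >= 0.0:             # events already in hand when enrollment ends
        return 0.0
    lo, hi = s_a, s_a + 1.0
    while f(hi) < 0.0:            # expand the bracket until it contains the root
        hi = s_a + 2.0 * (hi - s_a)
    for _ in range(200):          # fixed number of bisection steps
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - s_a

# Hypothetical inputs: 10 subjects per month for 24 months, 100 events required.
s_f = follow_up_duration(d_max=100, a=10, s_a=24, lam_t=0.03, lam_c=0.05, r=0.5)
l_max = 24 + s_f                  # equation (B.91)
```

Calling `follow_up_duration` over a range of enrollment durations traces out the kind of monotone decreasing relationship between $l_{\max}$ and $S_a$ described above; a return value of `None` marks enrollment durations below the feasible minimum.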
B.5.3 Choice of Variance for Survival Studies

The maximum amount of Fisher information needed to achieve the desired power is shown in Section B.2 to be

$$I_{\max} = \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2.$$

Equation (B.33) relates the required Fisher information, $I_{\max}$, to the required number of events, $D_{\max}$, by noting that

$$I_{\max} \approx \mathrm{var}[S(l_K)] = r(1 - r) D_{\max}. \tag{B.97}$$

In the above expression $\mathrm{var}[S(l_K)]$ has been evaluated under the null hypothesis, leading to the result

$$D_{\max} = \frac{1}{r(1 - r)} \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2. \tag{B.98}$$

An alternative approach would be to estimate the variance of $S(l_K)$ under the alternative hypothesis. In that case

$$I_{\max} \approx \mathrm{var}[S(l_K)] = p_K (1 - p_K) D_{\max}, \tag{B.99}$$

where $p_K$ is the proportion of the $D_{\max}$ events that occur in the experimental group. This leads to the result

$$D_{\max} = \frac{1}{p_K (1 - p_K)} \left[\frac{\eta_1}{\delta_1 - \delta_0}\right]^2. \tag{B.100}$$

Since under local alternatives $p_K$ converges to $r$, the above two expressions for $\mathrm{var}[S(l_K)]$, and hence for $D_{\max}$, are asymptotically equivalent. In small samples, however, the two ways of computing $D_{\max}$ can lead to different results, especially if the randomization fraction is not equal to 0.5. East provides the user with the option to use either the null variance or the alternative variance for evaluating $D_{\max}$ on the Design Parameters tab. The evaluation of $p_K$ for use in equation (B.100) is an iterative process. For any given $a$ and $S_a$, we proceed through the following steps:

1. Initialize $p_K = r$.
2. Compute $D_{\max} = \dfrac{1}{p_K (1 - p_K)} \left[\dfrac{\eta_1}{\delta_1 - \delta_0}\right]^2$.
3. Find the value of $l_K$ such that $E^{(1)}(l_K \mid a, S_a, \lambda, \gamma, m) = D_{\max}$.
4. With this value of $l_K$ compute $E(l_K \mid a, S_a, \lambda_T, \gamma_T, m)$, $E(l_K \mid a, S_a, \lambda_C, \gamma_C, m)$ and $D_{\max}(\text{new}) = E(l_K \mid a, S_a, \lambda_T, \gamma_T, m) + E(l_K \mid a, S_a, \lambda_C, \gamma_C, m)$.
5. Compute $p_K = \dfrac{E(l_K \mid a, S_a, \lambda_T, \gamma_T, m)}{D_{\max}(\text{new})}$.
6. Return to step 2.

We iterate steps 2 through 5 until the value of $D_{\max}$ stabilizes.

Note: This refinement for computing $D_{\max}$ is only available for superiority trials. For non-inferiority trials $p_K = r$ under the alternative hypothesis, and equation (B.98) may be used with no modification.

C Interim Monitoring in East 6

The primary characteristic of the interim monitoring phase in East is flexibility. At the design phase East obtains the stopping boundaries and the maximum statistical information by assuming that the study will be monitored a total of K times, after pre-specified increments of information. Provided the study adheres strictly to its planned schedule of interim monitoring, it is assured of maintaining the desired type-I error and power. It is, however, administratively inconvenient to fix in advance the number and timing of the interim looks. For instance, it might be necessary to set the dates of the interim monitoring looks so as to accommodate the schedule of a data and safety monitoring board (DSMB). Typically, the DSMB would plan to meet after equal increments of calendar time, which would not necessarily coincide with the information fractions specified at the design stage. Again, it might be necessary to alter K, the planned number of looks at the data, either for safety reasons, because the accrual assumptions were not met, or for some other administrative reason. These alterations to the original plan could change the operating characteristics of the study unless suitable adjustments were made in the interim monitoring phase.
East makes the necessary adjustments by implementing the error spending function methodology first proposed by Lan and DeMets (1983) for studies that stop early to reject $H_0$, and subsequently generalized by Pampallona, Tsiatis and Kim (1995), (2000), to studies that stop early to reject either $H_0$ or $H_1$. This appendix chapter covers all the key components of the interim monitoring module in East. The following topics are discussed:

• Flexibility to alter the number and timing of the interim monitoring time-points through the error spending function methodology while preserving the type-1 and type-2 errors (Section C.1).
• Measuring the impact that deviations from the number and timing of the interim monitoring time-points specified at the design phase have on the post-hoc power of the study (Section C.2).
• Conditional power calculations aimed at assisting in the decision to stop early due to futility (Section C.3).
• Repeated confidence intervals that provide the desired coverage for the primary parameter of interest despite the multiple looks (Section C.5).
• Inference at the end of a group sequential trial (Section C.6).
• Sequential monitoring from any general data generating process, not necessarily the normal, binomial or time to failure models that are supported directly by East (Section C.7).
• The ability to monitor on a dimensionless information scale and thereby facilitate sample size recalculation (Section C.8).

C.1 Flexible Interim Monitoring

The boundary and maximum information computations at the design phase were performed under the assumption that the number and spacing of the interim looks are known in advance. In practice this assumption is unrealistic.
A major goal of a practical interim monitoring strategy is to give the user flexibility to monitor the data at arbitrary time points at the interim monitoring stage, possibly perform one or more unplanned analyses, possibly drop one or more planned analyses, and still preserve the type-1 error of the study design. This flexibility is achieved through the spending function approach as originally introduced by Lan and DeMets (1983). If the boundaries at the design stage were themselves derived from spending functions (as discussed in Section B.2.4), one simply uses the same spending functions to re-compute the boundaries at any arbitrary interim monitoring time point. If, however, the boundaries constructed at the design stage belong to the Wang-Tsiatis family (Section B.2.2) or the Pampallona-Tsiatis family (Section B.2.3), they are re-computed by inverting special ten-look error spending functions that capture the spirit of these boundaries. (The construction of these ten-look error spending functions is described in detail in Appendix F.)

C.1.1 Monitoring with α-Spending Functions

Suppose the clinical trial was designed for early stopping to reject $H_0$. Let $\alpha(t)$ denote its α-spending function. Suppose that the study was originally planned for up to K looks at the accumulating data, at the interim monitoring fractions $t_1, t_2, \ldots, t_K$. Stopping boundaries $c_1, c_2, \ldots, c_K$ have already been generated on this basis using the methods discussed in Section B.2.4. If the study is monitored strictly according to plan, these same stopping boundaries may be used to make early stopping decisions. If, however, one deviates from the plan, the original stopping boundaries are no longer valid and new boundaries have to be computed on the fly at each interim monitoring time point to reflect the amount of type-1 error that has actually been spent. Suppose, for example, that the first time we monitor the data, the information fraction is $t'_1 \ne t_1$.
We then re-compute the first boundary value $c'_1$ such that, under the null hypothesis of no treatment difference ($H_0$),

$$P_0(W(t'_1) \ge c'_1) = \alpha(t'_1).$$

If we do not stop the study at the first interim test, then the data are monitored a second time. Suppose the second monitoring takes place at information fraction $t'_2 \ne t_2$. At this stage, we are allowed to use up a total of $\alpha(t'_2)$ of the significance level. Since we already used $\alpha(t'_1)$ at the first look, we then compute the next boundary value $c'_2$ so that

$$\alpha(t'_1) + P_0(W(t'_1) < c'_1, W(t'_2) \ge c'_2) = \alpha(t'_2).$$

This guarantees that the probability of stopping and rejecting at the first or second monitoring, under $H_0$, will be $\alpha(t'_2)$. In general we compute the boundary $c'_j$ at information fraction $t'_j \le 1$ by solving the equation

$$\alpha(t'_{j-1}) + P_0(W(t'_1) < c'_1, \cdots, W(t'_{j-1}) < c'_{j-1}, W(t'_j) \ge c'_j) = \alpha(t'_j). \tag{C.1}$$

If it should happen that $I_j$, the information accrued by look $j$, exceeds $I_{\max}$, the maximum information stipulated at the design stage, so that $t'_j = I_j / I_{\max} > 1$, East will set $\alpha(t'_j) = \alpha$ and force the $j$th look to be the last one. Thus, since $\alpha(t'_j) \le \alpha$ for any $j$ and any information fraction $t'_j$, this procedure guarantees that the probability under $H_0$ of ever crossing the upper boundary can never exceed $\alpha$. Therefore this flexible interim monitoring procedure always preserves the overall type-1 error.

We should note that the α-spending procedure does not guarantee that the type-2 error will be preserved. However, Lan and DeMets (1983) have shown that these procedures, even with few monitoring times, will yield statistical properties similar to those expected with continual monitoring. The operating characteristics and early stopping properties of sequential tests would not be very different whether you monitored the data 5 times, 10 times, or continually.
For this reason, once the spending function is specified, we are free to either monitor the accumulating data continually, monitor after equal increments of calendar time, monitor after equal information fractions, or monitor sporadically, without any significant change in the type-2 error. The post-hoc power chart displayed by East (see Section C.2) shows that as long as a study reaches its accrual goals its power is affected minimally even if the interim monitoring schedule differs from what was planned at the design stage.

C.1.2 Monitoring with α- and β-Spending Functions

The spending function approach of Lan and DeMets (1983) was developed in the context of designs that do not allow for early stopping with rejection of the alternative hypothesis. Rejection of the alternative hypothesis could only occur at the last look. Thus, in the initial approach of Lan and DeMets (1983), whereas the type-1 error was spent in accordance with a spending function $\alpha(t)$, the type-2 error had probability exactly equal to zero at all looks except the last, where it had the desired probability, $\beta$. However, when sequential designs are constructed in terms of an upper boundary for early rejection of the null hypothesis and a lower boundary for early rejection of the alternative hypothesis, then the total probability of the type-2 error, $\beta$, can also be distributed over successive looks. The rate at which the error probability is to be spent can be described by an appropriate strictly increasing function of the information time. Let $\beta(t)$ denote this function such that $\beta(0) = 0$ and $\beta(1) = \beta$. The design of trials that spend $\alpha$ and $\beta$ simultaneously and stop early to reject either $H_0$ or $H_1$ has already been described in Section B.2.4. Suppose we have designed such a trial for up to K monitoring time points at the information fractions $t_1, t_2, \ldots, t_K$. For the one-sided test, let $l_j$ and $u_j$ be the values of the lower and upper boundaries, respectively, at the $j$th look, $j = 1, 2, \ldots, K$.
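As an aside on the α-spending recomputation of Section C.1.1, equation (C.1) can be solved numerically look by look. The sketch below is only an illustration and not East's algorithm. It assumes that, under $H_0$, $W(t)$ behaves as a Brownian motion with $W(t) \sim N(0, t)$ and independent increments, it uses the O'Brien-Fleming-type spending function of Lan and DeMets as an example choice of $\alpha(t)$, and it replaces exact recursive integration by a simple trapezoid rule on a truncated grid. All function names and the monitoring schedule are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def spend_ldof(t, alpha):
    """O'Brien-Fleming-type spending function of Lan and DeMets (assumed
    choice); alpha(t) is set to alpha once t reaches or exceeds 1."""
    if t >= 1.0:
        return alpha
    return 2.0 * STD.cdf(STD.inv_cdf(alpha / 2.0) / math.sqrt(t))

def recompute_boundaries(times, alpha, n_grid=200):
    """Upper boundaries c'_j on the W scale that satisfy equation (C.1) at the
    information fractions actually observed (assumes W(t) ~ N(0, t) under H0
    with independent increments)."""
    nodes = [(0.0, 1.0)]          # (w, probability mass): W(0) = 0 with mass 1
    bounds, spent, t_prev = [], 0.0, 0.0
    for t in times:
        sd = math.sqrt(t - t_prev)                 # s.d. of the increment
        target = spend_ldof(t, alpha) - spent      # error to spend at this look

        def exit_prob(c):
            # P0(no earlier crossing, W(t) >= c), integrating over W(t_prev)
            return sum(m * STD.cdf((w - c) / sd) for w, m in nodes)

        lo, hi = -10.0 * math.sqrt(t), 10.0 * math.sqrt(t)
        for _ in range(60):                        # bisection: exit_prob decreases in c
            mid = 0.5 * (lo + hi)
            if exit_prob(mid) > target:
                lo = mid
            else:
                hi = mid
        c = 0.5 * (lo + hi)
        bounds.append(c)
        spent += target

        # Sub-density of W(t) on the continuation region (-inf, c), stored as
        # trapezoid-rule masses on a grid truncated at -8 standard deviations.
        a = -8.0 * math.sqrt(t)
        h = (c - a) / n_grid
        new_nodes = []
        for i in range(n_grid + 1):
            x = a + i * h
            dens = sum(m * STD.pdf((x - w) / sd) for w, m in nodes) / sd
            new_nodes.append((x, dens * (h if 0 < i < n_grid else 0.5 * h)))
        nodes, t_prev = new_nodes, t
    return bounds

# Hypothetical monitoring schedule that departs from five equally spaced looks.
looks = [0.2, 0.45, 0.7, 1.0]
c = recompute_boundaries(looks, alpha=0.025)
z = [cj / math.sqrt(tj) for cj, tj in zip(c, looks)]   # same boundaries on the Z scale
```

Because each boundary is chosen to spend exactly $\alpha(t'_j) - \alpha(t'_{j-1})$, the cumulative type-1 error after the final look equals $\alpha$ whatever schedule is passed in, which is the property the text attributes to the procedure. A finer grid (larger `n_grid`) trades run time for accuracy.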
Now suppose we are about to monitor the trial and no longer wish to adhere to either the number or timing of the interim looks specified at the design stage. Pampallona, Tsiatis and Kim (1995) have suggested the following adaptation of the Lan and DeMets (1983) procedure for flexible interim monitoring while simultaneously preserving both the type-1 and type-2 errors of the study. Suppose that we monitor the data for the first time at information fraction $t'_1 \ne t_1$. Then we would compute the first pair of boundary values, $(l'_1, u'_1)$, so as to satisfy

$$P_0(W(t'_1) \ge u'_1) = \alpha(t'_1)$$

and

$$P_\eta(W(t'_1) \le l'_1) = \beta(t'_1),$$

where $\eta$, the drift parameter, has been computed at the design stage along with the upper and lower stopping boundaries as described in Section B.2.4. Similarly the boundary values, $(l'_j, u'_j)$, at subsequent looks, $j \ge 2$, will have to satisfy

$$\alpha(t'_{j-1}) + P_0(l'_1 < W(t'_1) < u'_1, \cdots, l'_{j-1} < W(t'_{j-1}) < u'_{j-1}, W(t'_j) \ge u'_j) = \alpha(t'_j) \tag{C.2}$$

and

$$\beta(t'_{j-1}) + P_\eta(l'_1 < W(t'_1) < u'_1, \cdots, l'_{j-1} < W(t'_{j-1}) < u'_{j-1}, W(t'_j) \le l'_j) = \beta(t'_j). \tag{C.3}$$

If it should happen at some look, $j^*$ say, that $I_{j^*} > I_{\max}$, so that $t'_{j^*} = I_{j^*} / I_{\max} > 1$, East will set $\alpha(t'_{j^*}) = \alpha$ and force the $j^*$th look to be the last one. The upper boundary, $u'_{j^*}$, will then be computed as the solution to

$$\alpha(t'_{j^*-1}) + P_0(l'_1 < W(t'_1) < u'_1, \cdots, l'_{j^*-1} < W(t'_{j^*-1}) < u'_{j^*-1}, W(t'_{j^*}) \ge u'_{j^*}) = \alpha. \tag{C.4}$$

Since we require the stopping boundaries to meet at the last look, it will not be necessary to compute $l'_{j^*}$, the lower boundary at the last look. Instead we will simply set $l'_{j^*} = u'_{j^*}$. In that case the probability of crossing the lower boundary at the last look or earlier is evaluated by computing

$$\beta^* = \beta(t'_{j^*-1}) + P_\eta(l'_1 < W(t'_1) < u'_1, \cdots, l'_{j^*-1} < W(t'_{j^*-1}) < u'_{j^*-1}, W(t'_{j^*}) \le u'_{j^*}). \tag{C.5}$$

Since the right hand sides of equations (C.2) and (C.4) can never exceed $\alpha$, this procedure guarantees that the probability under $H_0$ of ever crossing the upper boundary can never exceed $\alpha$. Therefore this flexible interim monitoring procedure always preserves the overall type-1 error. In its present form, however, this procedure is not guaranteed to preserve the type-2 error because $\beta^*$, evaluated by equation (C.5), could in principle exceed $\beta$. In order to ensure that the type-2 error is always preserved we need to position the last look in such a way that $\beta^* \le \beta$. The optimal positioning of the last look, to be discussed in Section C.2.3, will ensure this.

C.2 Post-Hoc Power and Preservation of Error

In Section C.1 we developed the error spending function methodology for preserving the type-1 error, despite deviations from the number and timing of the interim looks specified at the design phase of the study. While the type-1 error is indeed preserved by this methodology, it is possible for the alterations in the interim monitoring schedule to affect the type-2 error (hence the power) of the study. Thus it is helpful to compute the post-hoc power at the end of the study, taking into account the actual number and timing of the interim looks. For instance, we would not be too concerned about the impact of the alterations in the interim monitoring schedule if the study was designed for 90% power and the post-hoc power turned out to be 89.5%. This section shows how such post-hoc calculations can be performed. As a by-product we generate a power chart in which, under the assumption that the next look will be the last one, the relationship of post-hoc power to the final statistical information is plotted.
The optimal placement of the last look (on the statistical information scale) so as to achieve the power specified at the design phase is thus obtained. This provides us with a strategy for preserving power by altering the information horizon. Although all the calculations in this section are derived for one-sided tests, they can be readily extended to the two-sided setting by replacing $W(t_j)$ with $|W(t_j)|$. Note that the post-hoc power calculations in this section differ from the conditional power calculations in Section C.3. Post-hoc power calculations utilize the placement on the information scale of the interim looks already taken, while conditional power calculations utilize, in addition, the current value of the test statistic. Also, the post-hoc power chart is plotted as a function of statistical information whereas the conditional power chart is plotted as a function of the standardized treatment difference.

C.2.1 Boundary Derivation if the Next Look is the Last

Suppose a study has been active for a while, accruing information without the test statistic crossing the stopping boundary at any of the interim monitoring time-points. Eventually, however, the decision must be taken to make the next analysis the last one regardless of the value of the test statistic. As a practical matter it is very unlikely that this last analysis can be performed at the precise time-point that the planned maximum information is attained. In some cases the actual information will exceed the planned maximum and in other cases it will fall short. Some studies may even need to be closed prematurely for administrative reasons, like poor accrual or withdrawal of the drugs under investigation. In all such cases the information fraction $t_L \ne 1$, where $L$ indexes the last analysis. This situation brings up two issues:

1. The boundary for the last look should be computed by spending the balance of the type-I error probability, namely $\alpha - \alpha(t_{L-1})$, in order for the group sequential test to have the desired size $\alpha$.
2. The power of the adopted sequential procedure usually won't equal the desired power, $1 - \beta$, due to the probable departure of the sequence of analyses actually performed from the analyses assumed at the design stage.

For designs allowing for early stopping only to reject the null hypothesis we compute $u_L$, the boundary for the $L$-th look, by satisfying the following equation (here given for one-sided tests):

$$\alpha(t_{L-1}) + P_0(W(t_1) < u_1, W(t_2) < u_2, \cdots, W(t_{L-1}) < u_{L-1}, W(t_L) \ge u_L) = \alpha. \tag{C.6}$$

For designs allowing for early stopping in favor of either the null or the alternative, the last-look upper stopping boundary, $u_L$ (which must equal the last-look lower stopping boundary, $l_L$), is obtained by satisfying the following equation (here given for one-sided tests):

$$\alpha(t_{L-1}) + P_0(l_1 < W(t_1) < u_1, \cdots, l_{L-1} < W(t_{L-1}) < u_{L-1}, W(t_L) \ge u_L) = \alpha. \tag{C.7}$$

In either case, however, the achieved overall power of the procedure probably won't be what was specified at design time because of deviations from the planned number and timing of the interim analyses. Therefore East computes "post-hoc power" to quantify the power actually achieved by the adopted analysis strategy. This is discussed next.

C.2.2 Calculating Post-Hoc Power

As stated previously, it is highly unlikely that the actual number and timings of the interim analyses will match the K equally spaced analyses assumed at the design stage, and this discrepancy affects the power of the sequential testing procedure. It might be of interest to know what the real power of the study was, based on the actual interim monitoring time-points rather than the assumed ones, even though we can only perform this power calculation post-hoc.
If the post-hoc power is reasonably close to the planned power despite deviations in the interim monitoring schedule, one can feel satisfied that the study preserved its original operating characteristics. If the study is designed for a one-sided test with early stopping to only reject $H_0$, East computes the post-hoc power (PHP) from the following equation:

$$\text{PHP} = 1 - P_\eta[W(t_1) < u_1, W(t_2) < u_2, \cdots, W(t_{L-1}) < u_{L-1}, W(t_L) < u_L], \tag{C.8}$$

where $u_L$ is the boundary used at the last look to satisfy the type-I error probability, as specified by equation (C.6). Similarly, in the case of a one-sided test allowing for early stopping to reject either $H_0$ or $H_1$, the post-hoc power becomes

$$\text{PHP} = 1 - \Big\{ P_\eta[W(t_1) < l_1] + P_\eta[l_1 < W(t_1) < u_1, W(t_2) < l_2] + \cdots + P_\eta[l_1 < W(t_1) < u_1, \cdots, l_{L-1} < W(t_{L-1}) < u_{L-1}, W(t_L) < u_L] \Big\}, \tag{C.9}$$

where $u_L = l_L$ is the boundary used at the last look to satisfy the type-I error probability, as computed by equation (C.7).

C.2.3 Optimal Placement of Last Look

Suppose that, in the course of the interim monitoring, it was decided to make the next look the last one, regardless of the current interim monitoring time-point. Where should that last look be positioned? To answer this question consider that we designed the sequential test for a type-1 error of $\alpha$ and a power of $1 - \beta$. The discussion in Section C.2.1 ensures that the overall type-1 error will indeed be $\alpha$ no matter where we position the last look. The deviations from the planned schedule of interim monitoring imply, however, that if we take the last look at the time point specified in the original design, the power of the test may no longer be $1 - \beta$. Pampallona, Tsiatis and Kim (1995) have proposed the following strategy in order to match as closely as possible the desired power, $1 - \beta$.
Suppose we have completed look $j$ at information fraction $t_j < 1$ and have not yet crossed a stopping boundary. Let the next look be the last one and suppose that it will be taken at information fraction $t_L^*$, selected in such a way that the power of the test will be $1 - \beta$. For a one-sided test allowing for early stopping to either reject $H_0$ or $H_1$, we jointly solve the following equations for $u_L^* = l_L^*$ and $t_L^* > t_j$, the latter being referred to as the optimal last look position:

$$\alpha(t_j) + P_0\left[l_1 < W(t_1) < u_1,\ \cdots,\ l_j < W(t_j) < u_j,\ W(t_L^*) \ge u_L^*\right] = \alpha$$

$$\beta(t_j) + P_\eta\left[l_1 < W(t_1) < u_1,\ \cdots,\ l_j < W(t_j) < u_j,\ W(t_L^*) \le u_L^*\right] = \beta .$$

For a one-sided test allowing for early stopping to only reject $H_0$, the entire type-2 error can only be spent at the last look. In that case we jointly solve the following equations for $u_L^* = l_L^*$ and $t_L^* > t_j$:

$$\alpha(t_j) + P_0\left[W(t_1) < u_1,\ W(t_2) < u_2,\ \cdots,\ W(t_j) < u_j,\ W(t_L^*) \ge u_L^*\right] = \alpha$$

$$P_\eta\left[W(t_L^*) < u_L^*\right] = \beta .$$

These equations provide the information fraction, $t_L^*$, and the boundary value, $u_L^*$, for the next analysis, assuming it to be the last, such that the type-I error and the power are both preserved under the adopted schedule of analyses. Let $t_L$ be the actual information fraction at which the next and last analysis occurs. Then the position of $t_L^*$ is optimal in the sense that $t_L < t_L^*$ would entail a loss of power and $t_L > t_L^*$ would make the study unnecessarily overpowered; only $t_L = t_L^*$ would match the desired power exactly. East computes $t_L^*$ before and after every interim analysis, converts it into units of relevance to the current application (for example, total number of patients or total number of events) and displays this quantity in the box labeled “Ideal Next Look Position”.
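A minimal sketch of the simplest instance of these equations may help fix ideas. It assumes that no look has yet been taken, so that the two conditions reduce to $P_0[W(t) \ge u] = \alpha$ and $P_\eta[W(t) < u] = \beta$ with $W(t) \sim N(\eta t, t)$; in that case $t^* = [(z_{1-\alpha} + z_{1-\beta})/\eta]^2$ and $u^* = z_{1-\alpha}\sqrt{t^*}$. The function name and the numerical inputs are hypothetical, and the sketch is not East's algorithm: once earlier looks exist, the probabilities involve the whole boundary sequence and have to be evaluated numerically.

```python
from math import sqrt
from statistics import NormalDist


def last_look_without_prior_looks(eta, alpha, beta):
    """Solve P0[W(t) >= u] = alpha and P_eta[W(t) < u] = beta for (t, u).

    Assumes W(t) ~ N(eta * t, t) and that no earlier look has been taken
    (illustrative helper, not an East function).
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha)
    z_beta = nd.inv_cdf(1 - beta)
    t_star = ((z_alpha + z_beta) / eta) ** 2   # optimal last look position
    u_star = z_alpha * sqrt(t_star)            # boundary on the W scale
    return t_star, u_star


# Hypothetical inputs: drift 3.3, one-sided alpha 0.025, power 0.90.
eta = 3.3
t_star, u_star = last_look_without_prior_looks(eta, alpha=0.025, beta=0.10)

# Check both error constraints at the solution.
nd = NormalDist()
type1 = 1 - nd.cdf(u_star / sqrt(t_star))
power = 1 - nd.cdf((u_star - eta * t_star) / sqrt(t_star))
print(f"t* = {t_star:.4f}, u* = {u_star:.4f}, size = {type1:.4f}, power = {power:.4f}")
```

With this assumed drift the solution falls below 1, that is, a single analysis would need somewhat less than the maximum information of the sequential design.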
In the course of the study, this information can guide the investigator to position the next look optimally. It should be noted that $t_1^*$ corresponds to the information fraction required for a study without interim monitoring (fixed sample size study) relative to the group sequential study under consideration. That is, given that no analyses have yet been performed, the optimum position of the last (and in this case also the first) look would be the one corresponding to the fixed sample size. East displays this value when the Interim Monitoring module is entered for the first time. If the actual first analysis is performed at $t_1 < t_1^*$ and the stopping boundary has not been crossed then clearly $t_2^* > t_1^*$ and the process continues. In this context it should be pointed out that since the error spending functions are defined only for $t_j \le 1$, any analysis performed at $t_j > 1$ must necessarily be the last. East is capable of detecting this situation, and will accordingly compute the boundaries, spend the balance of the type-I error, and display the post-hoc power.

C.2.4 Post-Hoc Power Chart

We have seen in Section C.2.3 that East is able to adjust the maximum information through the optimal last look methodology so as to satisfy the desired power and significance level of the design, despite departures from the chosen number of equally spaced analyses specified at the design stage. It may however be of interest to know what loss or gain in power would derive from the last analysis being performed at a time point different from the suggested optimal last look position. The post-hoc power chart answers this question by providing a graph of the post-hoc power (on the Y-axis) versus the total information accumulated by the last look (on the X-axis).
The point on the X-axis that matches the optimal last look position will correspond to full power on the Y-axis. Information is expressed in the post-hoc power chart in terms of units of relevance to the outcome being considered (e.g., patient accrual for normally distributed data or events for time to failure data). Towards the end of the study the post-hoc power chart tends to flatten out so that relatively small increases in power occur for relatively large increases in information. The post-hoc power chart is updated after each look and allows the user to decide whether the adjustment to the maximum information suggested by East is worth accepting, should the next look be the last. The chart is not displayed after a stopping boundary is crossed.

C.3 Conditional Power at Ideal Next Look Position (East 5.4)

The concept of conditional power at ideal next look position is borrowed from the setting of fixed sample size studies. It was first proposed in this setting by Lan and Wittes (1988). If the test statistic is computed when only part of the required total information has been collected, then the conditional power quantifies the probability of rejecting the null hypothesis should the total information be eventually available, conditional on the current information. Such a probability, when computed over a range of alternatives, can be of guidance in deciding whether to continue the study given the available evidence. In East this idea is extended to group sequential studies. Let us initially consider a one-sided group sequential test of size $\alpha$, designed for early rejection of $H_0$. Suppose at the $j$th analysis the information fraction is $t_j$ and the test statistic, $W(t)$, has value $w(t_j)$. In the notation of Section C.2.3 let $t_L^*$ be the optimal placement of the next look, assuming it to be the last one, with corresponding boundary value $u_L^*$.
In East we define the conditional power at ideal next look position, CP at INLP, as the following probability:

$$\text{CP at INLP} = P_\eta\left[W(t_L^*) \ge u_L^* \mid w(t_j)\right] \tag{C.10}$$

Recall from Section B.1 that the statistic $W(t_j)$ is defined as a sum of independent increments. This implies that the decomposition $W(t_L^*) = W(t_j) + [W(t_L^*) - W(t_j)]$ has the following three properties:
1. The random variables $W(t_j)$ and $W(t_L^*) - W(t_j)$ are normal and independent.
2. The means of these random variables are $E[W(t_j)] = \eta t_j$ and $E[W(t_L^*) - W(t_j)] = \eta(t_L^* - t_j)$.
3. The variances of these random variables are $\mathrm{Var}[W(t_j)] = t_j$ and $\mathrm{Var}[W(t_L^*) - W(t_j)] = t_L^* - t_j$.

Once we have reached the information fraction $t_j$ we know that the random variable $W(t_j)$ has assumed the value $w(t_j)$. Therefore

$$\begin{aligned}
\text{CP at INLP} &= P_\eta\left[W(t_L^*) - w(t_j) \ge u_L^* - w(t_j)\right] \\
&= P_\eta\left[\frac{W(t_L^*) - w(t_j) - \eta(t_L^* - t_j)}{\sqrt{t_L^* - t_j}} \ge \frac{u_L^* - w(t_j) - \eta(t_L^* - t_j)}{\sqrt{t_L^* - t_j}}\right] \\
&= 1 - \Phi\left(\frac{u_L^* - w(t_j) - \eta(t_L^* - t_j)}{\sqrt{t_L^* - t_j}}\right) ,
\end{aligned} \tag{C.11}$$

where $\Phi(x)$ is the cumulative distribution function for a standard normal random variable. For two-sided tests the conditional power is expressed as follows:

$$\begin{aligned}
\text{CP at INLP} &= P_\eta\left[|W(t_L^*)| \ge u_L^* \mid w(t_j)\right] \\
&= 1 - \Phi\left(\frac{u_L^* - w(t_j) - \eta(t_L^* - t_j)}{\sqrt{t_L^* - t_j}}\right) + \Phi\left(\frac{-u_L^* - w(t_j) - \eta(t_L^* - t_j)}{\sqrt{t_L^* - t_j}}\right) .
\end{aligned} \tag{C.12}$$

Analogous expressions can be derived for designs with boundaries for early rejection of either $H_0$ or $H_1$. In East the conditional power is presented as a graph plotted against a wide range of alternatives for $\delta$, including the one specified at the design stage. Now equations (C.11) and (C.12) express conditional power as a function of the drift parameter $\eta$ rather than as a function of $\delta$. At the design stage, the relationship between $\eta$ and $\delta$ is captured by the equation

$$\eta = (\delta - \delta_0)\sqrt{I_{\max}} \tag{C.13}$$

introduced in Section B.1 of Appendix B.
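The closed forms (C.11) and (C.12) are straightforward to evaluate. The sketch below is illustrative only: the function names, argument names and numerical inputs are assumptions rather than East's interface, and the lower-tail term of the two-sided case is written as $\Phi\{[-u_L^* - w(t_j) - \eta(t_L^* - t_j)]/\sqrt{t_L^* - t_j}\}$, which is what the independent-increments decomposition gives for $P_\eta[W(t_L^*) \le -u_L^* \mid w(t_j)]$.

```python
from math import sqrt
from statistics import NormalDist


def drift(delta, delta0, i_max):
    """Equation (C.13): eta = (delta - delta0) * sqrt(I_max)."""
    return (delta - delta0) * sqrt(i_max)


def cp_at_inlp(w_j, t_j, t_star, u_star, eta, two_sided=False):
    """Conditional power at the ideal next look position.

    Uses W(t*) - w(t_j) ~ N(eta * (t* - t_j), t* - t_j), as in (C.11)/(C.12).
    """
    cdf = NormalDist().cdf
    dt = t_star - t_j
    upper = 1 - cdf((u_star - w_j - eta * dt) / sqrt(dt))
    if not two_sided:
        return upper
    lower = cdf((-u_star - w_j - eta * dt) / sqrt(dt))
    return upper + lower


# Hypothetical interim state: w(t_j) = 1.2 at t_j = 0.5, last look at t* = 1.0
# with boundary u* = 1.96, and a hypothetical maximum information of 100.
for delta in (0.0, 0.1, 0.2, 0.3):
    eta = drift(delta, delta0=0.0, i_max=100.0)
    cp = cp_at_inlp(w_j=1.2, t_j=0.5, t_star=1.0, u_star=1.96, eta=eta)
    print(f"delta = {delta:.1f}  eta = {eta:.1f}  CP at INLP = {cp:.3f}")
```

Evaluating the function over a grid of $\delta$ values, as in the loop, gives the kind of conditional power curve described in the text.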
Finally, it should be noted that, given the described approach, the conditional power curve computed before the very first look is the usual power curve. In particular, at that stage the optimal placement of the next and last look corresponds to a fixed sample size study, so that under the alternative specified at design the conditional power is actually equivalent to the a priori unconditional power.

C.4 Conditional and predictive power (East 6)

The concept of conditional power is borrowed from the setting of fixed sample size studies. It was first proposed in this setting by Lan and Wittes (1988). If the test statistic is computed when only part of the required total information has been collected, then the conditional power quantifies the probability of rejecting the null hypothesis should the total information be eventually available, conditional on the current information. Such a probability, when computed over a range of alternatives, can be of guidance in deciding whether to continue the study given the available evidence. In East this idea is extended to group sequential studies. Suppose at the $j$th analysis the information fraction is $t_j$ and the test statistic, $W(t)$, has value $w(t_j)$. We define the conditional power at look $j$ as the probability of attaining statistical significance in the direction of the alternative hypothesis at any future look, given $w(t_j)$. Thus, if we are testing the null hypothesis that $\delta = \delta_0$ against the alternative that $\delta > \delta_0$, the conditional power is defined as

$$CP_\eta(w(t_j)) = \Pr\nolimits_\eta\left\{ \bigcup_{k=j+1}^{K} \left[W(t_k) > u_k\right] \,\Big|\, w(t_j) \right\} .$$

Here $\eta = (\delta - \delta_0)\sqrt{I_{\max}}$ is the trend parameter under the alternative hypothesis.
If the alternative hypothesis is that $\delta < \delta_0$, then the conditional power is defined as

$$CP_\eta(w(t_j)) = \Pr\nolimits_\eta\left\{ \bigcup_{k=j+1}^{K} \left[W(t_k) < l_k\right] \,\Big|\, w(t_j) \right\} .$$

Analogous expressions can be written for designs with boundaries for early rejection of either $H_0$ or $H_1$ and for two-sided designs with early rejection. The corresponding probabilities are obtained by recursive integration. The reference values of the conditional power are often based on the design value or the estimated value of the trend parameter $\eta$:

$$\eta_d = (\delta_d - \delta_0)\sqrt{I_{\max}} , \qquad \hat{\eta}_j = \frac{w(t_j)}{t_j} .$$

The predictive power $PP(w(t_j))$ provides a weighted average of the conditional power values for a range of values of $\eta$:

$$PP(w(t_j)) = \int CP_\eta(w(t_j))\, f(\eta)\, d\eta .$$

We follow the suggestion of Lan, Hu and Proschan (2009) and use the weighting function

$$f(\eta) = \phi\left(\mu = \hat{\eta}_j,\ \sigma^2 = \frac{1}{t_j}\right)$$

where $\phi$ denotes the probability density function of the normal distribution with the indicated mean and variance.

C.5 Repeated Confidence Intervals

In this section we discuss the computation of repeated confidence intervals (RCI’s), each interval being computed as part of an interim analysis. These RCI’s were first proposed by Jennison and Turnbull (1989) and are discussed in detail in Chapter 9 of their text book (Jennison and Turnbull, 2000). The naive confidence interval one would ordinarily compute from the data gathered at the end of a clinical trial is inappropriate if the confidence interval is computed repeatedly in a group sequential setting. In this setting the naive confidence interval will fail to provide the desired coverage for the parameter of interest due to the problem of multiplicity. In contrast the RCI’s provide simultaneous coverage for the parameter of interest at any desired confidence level despite the multiple looks at the data.
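The loss of coverage from repeating the naive interval can be illustrated with a small simulation. The sketch below assumes equally spaced looks with independent, unit-variance information increments, so that the centred Wald statistic at look $k$ behaves like $S_k/\sqrt{k}$ for a running sum $S_k$ of standard normal variates; the naive intervals $\hat\delta \pm 1.96\, se$ all cover $\delta$ exactly when every centred statistic stays inside $\pm 1.96$. The function name, number of looks, seed and simulation size are illustrative choices, not East settings.

```python
import random
from math import sqrt


def naive_simultaneous_coverage(n_looks=5, z=1.96, n_sims=20000, seed=1):
    """Monte Carlo estimate of the probability that the naive interval
    covers the true parameter at every one of n_looks equally spaced looks."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        s = 0.0                      # running sum of centred increments
        all_cover = True
        for k in range(1, n_looks + 1):
            s += rng.gauss(0.0, 1.0)
            if abs(s / sqrt(k)) >= z:    # centred Wald statistic outside +/- z
                all_cover = False
                break
        covered += all_cover
    return covered / n_sims


print("one look  :", naive_simultaneous_coverage(n_looks=1))
print("five looks:", naive_simultaneous_coverage(n_looks=5))
```

With a single look the estimate stays near the nominal 95%, whereas with five looks the simultaneous coverage of the naive intervals drops well below it, which is the multiplicity problem the RCI’s are designed to remove.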
C.5.1 RCI’s Derived from Boundaries that Reject H0

For ease of exposition let us consider RCI’s for two-sided group sequential trials of efficacy endpoints. The extension to one-sided efficacy or non-inferiority trials is straightforward. Let the primary parameter of interest be $\delta$ and suppose we perform $K$ interim analyses at the information fractions $t_1, t_2, \ldots, t_K$. At information fraction $t_j$ we compute the Wald statistic

$$Z(t_j) = \frac{\hat{\delta}(t_j)}{se(\hat{\delta}(t_j))} . \tag{C.14}$$

Recognizing that in large samples $[se(\hat{\delta}(t_j))]^{-2} \approx I_j$, where $I_j$ is the information about $\delta$ at time $t_j$, we may also write the Wald statistic as

$$Z(t_j) = \hat{\delta}(t_j)\sqrt{I_j} . \tag{C.15}$$

By the Scharfstein, Tsiatis and Robins (1997) theorem introduced in Section B.1 of Appendix B, $Z(t_j) \sim N(\delta\sqrt{I_j}, 1)$ and $\mathrm{cov}[Z(t_{j_1}), Z(t_{j_2})] = \sqrt{I_{j_1}/I_{j_2}}$. Let $b_1, b_2, \ldots, b_K$ be any two-sided level-$\alpha$ stopping boundaries for the Wald statistic for testing the null hypothesis that $\delta = 0$. That is,

$$P_0\left\{ \bigcap_{j=1}^{K} |Z(t_j)| < b_j \right\} = 1 - \alpha .$$

Now observe that $(Z(t_j) - \delta\sqrt{I_j}) \sim N(0, 1)$ and has the same covariance structure as $Z(t_j)$. Therefore

$$P_\delta\left\{ \bigcap_{j=1}^{K} |Z(t_j) - \delta\sqrt{I_j}| < b_j \right\} = 1 - \alpha \tag{C.16}$$

for any value of $\delta$. Let $H_1, H_2, \ldots, H_K$ denote $K$ two-sided RCI’s that maintain simultaneous coverage for $\delta$ at level $1 - \alpha$. Therefore we require these confidence intervals to satisfy the probability condition

$$P_\delta\left\{ \bigcap_{j=1}^{K} \delta \in H_j \right\} = 1 - \alpha . \tag{C.17}$$

We can show that the sequence of intervals

$$H_j = \hat{\delta}(t_j) \pm se(\hat{\delta}(t_j))\, b_j \quad \text{for } j = 1, 2, \ldots, K, \tag{C.18}$$

satisfies the simultaneous coverage requirement (C.17). To prove this assertion observe that

$$\begin{aligned}
P_\delta\left\{ \bigcap_{j=1}^{K} \delta \in H_j \right\}
&= P_\delta\left\{ \bigcap_{j=1}^{K} \hat{\delta}(t_j) - se(\hat{\delta}(t_j))\, b_j < \delta < \hat{\delta}(t_j) + se(\hat{\delta}(t_j))\, b_j \right\} \\
&= P_\delta\left\{ \bigcap_{j=1}^{K} \hat{\delta}(t_j)\sqrt{I_j} - b_j < \delta\sqrt{I_j} < \hat{\delta}(t_j)\sqrt{I_j} + b_j \right\} \\
&= P_\delta\left\{ \bigcap_{j=1}^{K} |\hat{\delta}(t_j) - \delta|\sqrt{I_j} < b_j \right\} \\
&= P_\delta\left\{ \bigcap_{j=1}^{K} |Z(t_j) - \delta\sqrt{I_j}| < b_j \right\} \\
&= 1 - \alpha \quad \text{(by equation (C.16))} .
\end{aligned}$$
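Equation (C.18) amounts to widening each interim interval by the stopping boundary in place of the usual normal quantile. A minimal sketch follows; the function name and all numerical inputs (estimates, standard errors and boundary values) are hypothetical and serve only to show the arithmetic.

```python
def repeated_confidence_intervals(estimates, std_errors, boundaries):
    """Return H_j = delta_hat_j +/- se_j * b_j for each look, as in (C.18)."""
    return [
        (d - se * b, d + se * b)
        for d, se, b in zip(estimates, std_errors, boundaries)
    ]


# Hypothetical data from three looks, with an assumed constant boundary b_j.
estimates = [0.30, 0.25, 0.28]
std_errors = [0.20, 0.14, 0.115]
boundaries = [2.289, 2.289, 2.289]

for j, (lo, hi) in enumerate(
    repeated_confidence_intervals(estimates, std_errors, boundaries), start=1
):
    print(f"look {j}: RCI = ({lo:.3f}, {hi:.3f})")
```

Each interval is wider than the naive $\hat\delta \pm 1.96\, se$ whenever $b_j > 1.96$, which is the price paid for simultaneous coverage.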
C.5.2 RCI’s for Boundaries that Reject either H0 or H1

Consider a $K$-look, level-$\alpha$ one-sided group sequential test of the null hypothesis $H_0$: $\delta = 0$ having $1 - \beta$ power to detect the alternative hypothesis $H_1$: $\delta = \delta_1 > 0$. Suppose the interim monitoring takes place at the information fractions $t_1, t_2, \ldots, t_K$. Let $(l_j, u_j)$, $j = 1, 2, \ldots, K$, be the futility and efficacy boundaries, respectively, for this test. These boundaries have been derived in Section B.2.4 of Appendix B. Since these boundaries preserve the type-1 error we must have

$$P_0\left\{ \bigcap_{j=1}^{K} Z(t_j) < u_j \right\} = 1 - \alpha . \tag{C.19}$$

Therefore, following the argument made in the previous section,

$$P_\delta\left\{ \bigcap_{j=1}^{K} Z(t_j) - \delta\sqrt{I_j} < u_j \right\} = 1 - \alpha . \tag{C.20}$$

Now the event $Z(t_j) - \delta\sqrt{I_j} < u_j$ holds if and only if $\delta > \hat{\delta}(t_j) - u_j\, se(\hat{\delta}(t_j))$. Thus the sequence $\{\hat{\delta}(t_j) - u_j\, se(\hat{\delta}(t_j)):\ j = 1, 2, \ldots, K\}$ simultaneously bounds $\delta$ from below with probability $1 - \alpha$. It follows that the probability that one or more of these lower confidence bounds fails to cover $\delta$ from below is at most $\alpha$. Next consider the behaviour of the Wald statistics under $H_1$: $\delta = \delta_1$. Since the lower stopping boundaries were constructed from a $\beta$ spending function we must have

$$P_{\delta_1}\left\{ \bigcap_{j=1}^{K} Z(t_j) > l_j \right\} = 1 - \beta . \tag{C.21}$$

Therefore by centralizing the Wald statistic we get

$$P_\delta\left\{ \bigcap_{j=1}^{K} Z(t_j) - \delta\sqrt{I_j} + \delta_1\sqrt{I_j} > l_j \right\} = 1 - \beta . \tag{C.22}$$

From this we can easily show that the sequence $\{\hat{\delta}(t_j) + \delta_1 - l_j\, se(\hat{\delta}(t_j)):\ j = 1, 2, \ldots, K\}$ simultaneously bounds $\delta$ from above with probability $1 - \beta$. It follows that the probability that one or more of these upper confidence bounds fails to cover $\delta$ from above is at most $\beta$. Thus the sequence of intervals

$$\left\{ \left[\hat{\delta}(t_j) - u_j\, se(\hat{\delta}(t_j)),\ \hat{\delta}(t_j) + \delta_1 - l_j\, se(\hat{\delta}(t_j))\right]:\ j = 1, 2, \ldots, K \right\} \tag{C.23}$$

simultaneously contains the true value of $\delta$ with probability at least $1 - \alpha - \beta$.
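The interval (C.23) can be computed look by look from the estimate, its standard error, the two boundaries and the design alternative. The sketch below only illustrates that arithmetic; the function name and the numbers are hypothetical.

```python
def rci_reject_h0_or_h1(delta_hat, se, lower_b, upper_b, delta1):
    """Interval (C.23) at one look:
    [delta_hat - u_j * se, delta_hat + delta1 - l_j * se]."""
    lower = delta_hat - upper_b * se
    upper = delta_hat + delta1 - lower_b * se
    return lower, upper


# Hypothetical look: estimate 0.25 with standard error 0.14, futility
# boundary l_j = 0.40, efficacy boundary u_j = 2.60, design alternative 0.30.
lo, hi = rci_reject_h0_or_h1(0.25, 0.14, lower_b=0.40, upper_b=2.60, delta1=0.30)
print(f"RCI = ({lo:.3f}, {hi:.3f})")
```

Note that the upper limit depends on the design alternative $\delta_1$ through the futility boundary, whereas the lower limit uses only the efficacy boundary.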
C.5.3 Inputs to East for RCI Computation

Equation (C.18) shows that in order to compute the RCI’s East needs to know both the numerator, $\hat{\delta}(t_j)$, and the denominator, $se(\hat{\delta}(t_j))$, of the Wald statistic (C.14). East provides a Test Statistic Calculator for entering these two components separately into the appropriate cell of the interim monitoring worksheet. For example, in the case of interim monitoring of a normal design study, if you click the button Enter Interim Data on the IM dashboard, the following dialog box appears into which you may enter the observed values for $\hat{\delta}(t_j)$ and $se(\hat{\delta}(t_j))$. Sometimes, however, the separate components of the Wald statistic may not be known. Then the user has no choice but to directly enter the observed value of the Wald statistic $Z(t_j)$ into the Interim Monitoring worksheet. In such cases, East suppresses the output of repeated confidence intervals, conditional power estimates and the final adjusted inference estimates from the interim monitoring worksheet.

C.6 Inference Following Group Sequential Testing

In this section we discuss the computation of p-values and confidence intervals for the parameter of interest at the end of a group sequential clinical trial. The naive approach of computing these quantities in the usual way, ignoring the fact that a sequential monitoring procedure was used to possibly stop early, will fail to preserve the desired type-I error of the significance test or the desired coverage of the confidence interval.
Rather, one must first order the sample space to reflect the sequential nature of the test procedure, and then obtain p-values and confidence intervals on the basis of this ordering. Jennison and Turnbull (2000, Chapter 8) discuss four ways to order the sample space of a group sequential experiment and thereby perform an adjusted inference in which the type-I error and the coverage are both preserved. These four ways are stage-wise ordering, MLE ordering, likelihood ratio ordering, and score test ordering. In East we adopt stage-wise ordering of the sample space. This ordering was first proposed by Armitage (1957) and later used by Fairbanks and Madsen (1982), Tsiatis, Rosner and Mehta (1984), and Kim and DeMets (1987). Of the four orderings this is the one most favored by Jennison and Turnbull (2000) because it does not require knowledge about the interim monitoring time-points that would have been adopted in the future, had the study not stopped early. The other three orderings of the sample space do require this knowledge and are therefore limited in their practical applicability. In addition, stage-wise ordering ensures consistency between the p-value and the confidence interval. That is, a $100 \times (1 - \alpha)\%$ confidence interval will exclude the parameter value under the null hypothesis if and only if the corresponding p-value does not exceed $\alpha$. Finally, the p-value based on stage-wise ordering is less than the significance level, $\alpha$, if and only if $H_0$ is rejected.

C.6.1 Stage-Wise Ordering of the Sample Space

Suppose that the sequentially computed random variable $W(t) \sim N(\eta t, t)$ crosses a stopping boundary for the first time at the $j$th look in a group sequential clinical trial where the current information fraction is $t_j$ and the current value of the test statistic is $w^*(t_j)$. Let the information fractions at the earlier looks be $\{t_1, t_2, \ldots, t_{j-1}\}$ with corresponding lower and upper stopping boundaries given by $(l_i, u_i)$, $i = 1, 2, \ldots, j - 1$.
Define the $i$th continuation region as $C_i = (l_i, u_i)$. The $l_i$’s might each be $-\infty$, in which case we have a one-sided sequential test with early stopping to reject $H_0$. On the other hand, if $l_i = -u_i$ for all $i$, we have a two-sided sequential test with early stopping to reject $H_0$. More generally the $(l_i, u_i)$ pairs could represent the lower and upper stopping boundaries, respectively, of the Pampallona and Tsiatis (1994) family for a one-sided sequential test with early stopping to reject either $H_0$ or $H_1$. The most complex case, inner-wedge stopping boundaries to reject either $H_0$ or $H_1$ with a two-sided test, is not covered in this section but is discussed in Section C.6.5.

The sample space of a sequential experiment which was terminated at the $j$th interim look with an observed value of $w^*(t_j)$ for the test statistic consists of the union over all $i = 1, 2, \ldots, j$ of all possible trajectories that terminate at the $i$th look. These trajectories are of the form

$$(t_1, w(t_1)) \to (t_2, w(t_2)) \to \cdots \to (t_i, w(t_i))$$

where $w(t_i) \notin C_i$ but $w(t_g) \in C_g$ for all $g = 1, 2, \ldots, i - 1$. The idea behind stage-wise ordering of this sample space is to associate earlier stopping with larger values of $\eta$. Accordingly, in stage-wise ordering, the ordered pair $(t_a, w(t_a))$ is more extreme than the ordered pair $(t_b, w(t_b))$ whenever any one of the following four conditions holds:
(i) $w(t_a) \ge u_a$ and $w(t_b) \le l_b$ for $t_a, t_b = 1, 2, \ldots, j - 1$,
(ii) $w(t_a) > w(t_b)$ if $t_a = t_b$ for $t_a, t_b = 1, 2, \ldots, j$,
(iii) $t_a < t_b$ if $w(t_a) \ge u_a$ and $w(t_b) \ge u_b$ for $t_a, t_b = 1, 2, \ldots, j - 1$,
(iv) $t_a > t_b$ if $w(t_a) \le l_a$ and $w(t_b) \le l_b$ for $t_a, t_b = 1, 2, \ldots, j - 1$.

Figure C.1 is a visual display of the stage-wise ordering of the sample space for a study with three interim looks.
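The four conditions can be read as a pairwise comparison rule. The sketch below is one literal transcription of them, offered only as an illustration: the class and function names are hypothetical, looks are identified by their index, and pairs not covered by any of the four conditions are simply reported as not more extreme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoppingPoint:
    look: int      # index of the look at which the point was observed
    w: float       # observed value of the statistic W at that look
    lower: float   # lower boundary l at that look (float("-inf") if none)
    upper: float   # upper boundary u at that look


def more_extreme(a, b):
    """True if point a is more extreme than point b under conditions (i)-(iv)."""
    if a.look == b.look:                      # (ii) same look: larger w wins
        return a.w > b.w
    a_up, b_up = a.w >= a.upper, b.w >= b.upper
    a_low, b_low = a.w <= a.lower, b.w <= b.lower
    if a_up and b_low:                        # (i) upper crossing beats lower
        return True
    if a_up and b_up:                         # (iii) earlier upper crossing
        return a.look < b.look
    if a_low and b_low:                       # (iv) later lower crossing
        return a.look > b.look
    return False


# Hypothetical boundary crossings at looks 1 and 2.
up1 = StoppingPoint(look=1, w=3.0, lower=-1.0, upper=2.8)
up2 = StoppingPoint(look=2, w=2.5, lower=-0.5, upper=2.4)
low1 = StoppingPoint(look=1, w=-1.2, lower=-1.0, upper=2.8)
low2 = StoppingPoint(look=2, w=-0.7, lower=-0.5, upper=2.4)

print(more_extreme(up1, up2))    # earlier efficacy stop is more extreme
print(more_extreme(low2, low1))  # later futility stop is more extreme
print(more_extreme(up2, low1))   # any upper crossing beats a lower crossing
```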
For additional discussion of stage-wise ordering refer to Jennison and Turnbull (2000, page 179).

Figure C.1: Example of the ordering of the sample space $\{(t_i, W(t_i));\ i = 1, 2, 3\}$. (Arrows point from more extreme to less extreme points, where extreme refers to evidence of larger values of the effect size, $\eta$.) The figure plots $W(t)$ against $t$ at the looks $t_1, t_2, t_3$, with upper boundaries $u_1, u_2$ and lower boundaries $l_1, l_2$.

C.6.2 Adjusted P-values

A p-value is defined as the probability, under the null hypothesis, of obtaining an outcome at least as extreme as the one actually observed. The set of points which are at least as extreme as the observed point, $(t_j, w^*(t_j))$, can be identified by applying the stage-wise ordering scheme to each sample point in accordance with the rules set forth in Section C.6.1. Denote this set by $E^*$. Then the p-value, adjusted for the sequential testing, is the probability under the null hypothesis of obtaining the event $E^*$. That is,

$$p^* = P_0\{E^*\} . \tag{C.24}$$

C.6.3 Adjusted Confidence Interval

The method applied in East for deriving a confidence interval for $\eta$ follows the approach proposed by Tsiatis, Rosner and Mehta (1984) and later extended by Kim and DeMets (1987). The basic idea is to search for the upper and lower confidence bounds of $\eta$ such that the p-value under the alternative hypothesis just becomes statistically significant. Suppose the study was terminated at the observed point $(t_j, w^*(t_j))$ and let $E^*$ be the set of points at least as extreme as $(t_j, w^*(t_j))$ in accordance with the stage-wise ordering scheme developed in Section C.6.1. Then the $100 \times (1 - 2\nu)\%$ confidence interval for $\eta$ is $(\eta^L, \eta^U)$ where

$$\eta^U = \sup\{\eta : P_\eta\{E^*\} \le 1 - \nu\} , \tag{C.25}$$

$$\eta^L = \inf\{\eta : P_\eta\{E^*\} \ge \nu\} . \tag{C.26}$$

C.6.4 Point Estimation

Kim (1989) has proposed the following median unbiased estimator (MUE) for the parameter $\eta$.
The MUE, denoted by $\tilde{\eta}$, is the value of $\eta$ that satisfies

$$P_{\tilde{\eta}}\{E^*\} = 0.5 . \tag{C.27}$$

C.6.5 Boundaries to Accept H0

The adjusted p-values, confidence intervals and point estimates discussed in Sections C.6.2, C.6.3 and C.6.4, respectively, can be extended to the one-sided and two-sided $H_0$-$H_1$ stopping boundaries. We must be careful, however, to exclude from the set $E^*$ all points that lie within the region where the null hypothesis is accepted. This approach will produce adjusted p-values and confidence intervals with the correct properties so long as the study is not terminated by the test statistic entering the acceptance region. In the latter case, since the null hypothesis is accepted, East will not report a p-value or confidence interval.

C.6.6 Drift Parameter and Effect Size

In Sections C.6.3 and C.6.4 we showed how East computes adjusted confidence intervals and median unbiased point estimates, respectively, for the drift parameter $\eta$. These estimates must be transformed into corresponding estimates of the effect size $\delta$ in order to be meaningful to the end user. The relationship between $\eta$ and $\delta$ was shown in Section B.1 of Appendix B to be

$$\eta = (\delta - \delta_0)\sqrt{I_{\max}} . \tag{C.28}$$

Thus if we know the value of $I_{\max}$ we can solve the above equation for $\delta$ in terms of $\eta$. For example,

$$\delta^U = \delta_0 + \frac{\eta^U}{\sqrt{I_{\max}}} , \qquad \delta^L = \delta_0 + \frac{\eta^L}{\sqrt{I_{\max}}} .$$

For each specific application (e.g. normal, binomial or time to failure data) we have derived, in Section B.1 of Appendix B, an expression for $I_{\max}$ in terms of $n_{\max}$ and other parameters specified at the design stage. These relationships are used to transform the confidence interval for $\eta$ into a corresponding confidence interval for $\delta$. However, these relationships usually contain nuisance parameters that must be estimated from the current data.
For example, we would use equation (B.15) to compute $I_{\max}$ for the normal case and would therefore need to estimate $\sigma^2$ from the data. We would use equation (B.20) to compute $I_{\max}$ for binomial superiority trials and equation (B.23) to compute $I_{\max}$ for binomial non-inferiority trials. In either case we would need to estimate $\pi_c$, the control response rate, from the current data. In East we use the following unified method to evaluate the maximum information, $I_{\max}$. Suppose we have just completed the $j$th interim analysis. Let $I_j$ denote the current information and $t_j = I_j/I_{\max}$ denote the current information fraction. Then we can re-write $I_{\max}$ as

$$I_{\max} = t_j^{-1} I_j = \left[\sqrt{t_j}\; se(\hat{\delta}_j)\right]^{-2} . \tag{C.29}$$

Thus as long as we provide East with the current standard error estimate, $se(\hat{\delta}_j)$, East can estimate $I_{\max}$ from equation (C.29). The value of $se(\hat{\delta}_j)$ is passed to East through the test statistic calculator. If this calculator is by-passed in favor of entering the current value of the test statistic directly into the interim monitoring worksheet, East will not produce adjusted p-values, point estimates or confidence intervals upon study termination.

C.7 Monitoring Data from any General Distribution

The interim monitoring of studies that are designed with the General Design module in East is no different than the procedure used for studies designed by the Normal, Binomial or Survival Design modules. The user supplies East with the maximum information, $I_1$, needed for a single look study that is designed to investigate some parameter of interest, say $\delta$. Usually $I_1$ will be translated into a sample size $N_1$ before it is input to the General Design worksheet of East. Sometimes $I_1$ will be expressed in terms of the number of events needed for a single look study. The Poisson example in Chapter 60 is one such case.
It is also permissible to retain $I_1$ in terms of Fisher information for a single look study, and to approximate it by $[se(\hat{\delta}(\tau))]^{-2}$. This case is, however, better handled by the I module, discussed in Section C.8. Once the value of $I_1$ is provided to East, it is inflated to $I_K$ by the appropriate $K$-look inflation factor, as discussed in Section B.3 of Appendix B. Thereafter the inflated information is utilized to determine the information fraction at each interim look in the interim monitoring phase of the study and the entire machinery of flexible monitoring with error spending functions is made available to the study. The study stops when the Wald statistic, given by equation (B.1), crosses a stopping boundary. This approach is very useful in all those situations that are not currently covered by a specialized module within East. For instance, from any commercial sample size package one might obtain the fixed sample size requirements for the comparison of survival of two groups by means of a stratified log-rank test (expressed in terms of a fixed number of events) or for the comparison of two groups in terms of repeated measures (expressed in terms of a fixed number of subjects). East can then compute the corresponding information needed for group-sequential monitoring.

C.8 Information Based Monitoring

Suppose that the observations are generated from some probability model and a single parameter, $\delta$, from this model characterizes the relationship under investigation while the remainder of the model is characterized by nuisance parameters. Interest focuses on developing a sequential procedure, with possible early stopping, for testing the null hypothesis $H_0$: $\delta = \delta_0$ against the alternative $H_1$: $\delta = \delta_1$. Suppose the study has been designed for a total of $K$ interim looks. Then the maximum information,
$I_K \equiv I_{\max}$, to be committed up-front is given by equation (B.83) as

$$I_K = \left[\frac{z_\alpha + z_\beta}{\delta_1 - \delta_0}\right]^2 \times \mathrm{IF}(\alpha, \beta, K, \text{boundaries}) . \tag{C.30}$$

This information is approximated by $[se(\hat{\delta}(\tau_K))]^{-2}$, where $\tau_K$ is the calendar time at the last look. The information at any intermediate look, taken at calendar time $\tau_j$, is likewise approximated by $[se(\hat{\delta}(\tau_j))]^{-2}$. Suppose that the interim monitoring takes place at calendar times $\tau_1, \tau_2, \ldots, \tau_K$. Then the sequential monitoring procedure at calendar time $\tau_j$ requires us to compute the information fraction

$$t_j = \frac{[se(\hat{\delta}(\tau_j))]^{-2}}{[se(\hat{\delta}(\tau_K))]^{-2}} ,$$

read off the values $\alpha(t_j)$ and $\beta(t_j)$ from the appropriate error spending functions, and re-compute the stopping boundaries based on these values, in the manner described in Section C.1 of this appendix. The study is terminated if the Wald statistic

$$Z(t_j) = \frac{\hat{\delta}(t_j) - \delta_0}{\sqrt{\mathrm{var}[\hat{\delta}(\tau_j)]}}$$

crosses a stopping boundary. The big advantage of monitoring on the above information scale is that the total information, $I_K$, required in order for the study to achieve the desired $1 - \beta$ power, only depends on $\delta_1 - \delta_0$, the specific parameters of interest under $H_0$ and $H_1$. No nuisance parameters are involved in the computation of maximum information. In contrast, if we were to monitor the study on the scale of a physical resource like sample size or number of events, the maximum information would depend on one or more nuisance parameters. If those nuisance parameters were guessed incorrectly, the study would not have the power it was intended to have at the design phase. This will become much clearer as you work through the example of sample size re-estimation provided in Chapter 59.

D Computing the Expected Number of Events

D.1 General expressions

We consider a single arm of a survival study and derive an expression for the expected number of events $d(l)$ to be observed at the calendar time $l$.
Any delay between the calendar time when a subject experiences an event or drops out of the study and the calendar time when this information becomes available to an investigator is assumed to be negligible. Our equations may be viewed as a slight generalization of the expressions presented in Kim and Tsiatis (1990).

Figure D.1: Geometry of a problem

We are interested in the following general setting: A subject is followed for no longer than a maximum period of time $m$. An observation of the event of interest or the subject’s drop-out from the study terminates the follow-up process. The accrual rate $a(u)$, $0 \le u \le S_a$, is not necessarily uniform. The event hazard rate $\lambda(t)$ and the drop-out hazard rate $\gamma(t)$ depend on the subject’s follow-up time $t = l - u$. An important special case arises when the limitation on the maximum follow-up time is removed. It corresponds to $m = \infty$. The accrual rate is often considered known at the time of the design of a study. It may also be calculated based on the known total number of subjects in the study and the known proportion of subjects recruited during the interval $(u, u + du)$.

Figure D.1 illustrates the geometry of the problem. The horizontal axis denotes calendar time. The accrual period ends at $S_a$. The follow-up of a subject accrued at $l = S_a$ is completed no later than $l = S_a + m$. Each of the two horizontal lines connects the beginning of the accrual period with a calendar time $l$ which may be positioned within (the lower line) or after (the upper line) the accrual period. At a calendar time $l$ the subjects with accrual time $0 \le u \le l - m$ (group A) are no longer followed because their follow-up windows are closed. Subjects who were accrued later (group B) continue to be observed unless their follow-up was terminated by the event of interest or a drop-out.
The value of interest $d(l)$ may be presented as a sum of contributions
\[ d(l) = d_A(l) + d_B(l) \tag{D.1} \]
from these groups. We note that in the absence of a restriction on the follow-up time ($m = \infty$) group A does not exist and the corresponding contribution in (D.1) disappears.

Let us denote by $v$ the time from randomization to the event of interest and by $w$ the time from randomization to the subject's drop-out from the study. We assume that the random variables $v$ and $w$ are independent and express their probability density functions $f(v)$ and $g(w)$ through the event hazard function $\lambda(t)$ and the drop-out hazard function $\gamma(t)$:
\[ f(v) = \lambda(v)\, e^{-\Lambda(v)}, \qquad \Lambda(v) = \int_0^v \lambda(t)\,dt, \]
\[ g(w) = \gamma(w)\, e^{-H(w)}, \qquad H(w) = \int_0^w \gamma(t)\,dt. \]
Let us denote by $\Psi(t) = P(v \le t,\ w > v)$ the probability that the event occurred before the follow-up time $t$ and was not censored. We note that
\[ \Psi(t) = \int_0^t \left[ \int_v^\infty g(w)\,dw \right] f(v)\,dv = \int_0^t \kappa(t')\,dt' \]
with
\[ \kappa(t) = \lambda(t)\, e^{-[\Lambda(t) + H(t)]}. \]

Figure D.2: Geometry of integration

Figure D.2 (a) illustrates the geometry of the integration. The shaded area marks the area of integration in the $(v, u)$ plane. In the calculation of $d(l)$ we make a distinction between the following cases, where $l^* = l - m$.

Table 1: Special cases.
\[
\begin{array}{clll}
\text{Case} & l \text{ and } l^* & d_A(l) & d_B(l) \\
1 & 0 \le l \le S_a,\ \ l^* < 0 & 0 & \int_0^{l} a(u)\Psi(l-u)\,du \\
2 & 0 \le l \le S_a,\ \ 0 \le l^* \le S_a - m & \Psi(m)\int_0^{l^*} a(u)\,du & \int_{l^*}^{l} a(u)\Psi(l-u)\,du \\
3 & S_a < l \le S_a + m,\ \ l^* < 0 & 0 & \int_0^{S_a} a(u)\Psi(l-u)\,du \\
4 & S_a < l \le S_a + m,\ \ 0 \le l^* \le S_a & \Psi(m)\int_0^{l^*} a(u)\,du & \int_{l^*}^{S_a} a(u)\Psi(l-u)\,du \\
5 & S_a + m < l,\ \ S_a < l^* & \Psi(m)\int_0^{S_a} a(u)\,du & 0
\end{array}
\]

We approximate $a(u)$ by a piece-wise constant function, splitting the accrual interval into $i = 1, \ldots, n_a$ subintervals with the boundaries $[u_{i-1}, u_i)$ and denoting the constant accrual rate within interval $i$ by $a_i$.
If the calendar time $l^*$ is located within the interval $i^*$ then
\[ \int_0^{l^*} a(u)\,du = \sum_{i=1}^{i^*-1} a_i (u_i - u_{i-1}) + a_{i^*}(l^* - u_{i^*-1}). \]
If $l^* = S_a = u_{n_a}$ then we get
\[ \int_0^{S_a} a(u)\,du = \sum_{i=1}^{n_a} a_i (u_i - u_{i-1}). \]
An integral $\int_{u_1}^{u_2} a(u)\Psi(l-u)\,du$, where $u_1$ belongs to an interval $i_1$ and $u_2$ belongs to an interval $i_2$, may be written as
\[ \int_{u_1}^{u_2} a(u)\Psi(l-u)\,du = a_{i_1}\varphi(u_1, u_{i_1}, l) + \sum_{i=i_1+1}^{i_2-1} a_i\,\varphi(u_{i-1}, u_i, l) + a_{i_2}\varphi(u_{i_2-1}, u_2, l) \]
with
\[ \varphi(u_{\min}, u_{\max}, l) = \int_{u_{\min}}^{u_{\max}} \Psi(l-u)\,du = \int_{u_{\min}}^{u_{\max}} du \int_0^{l-u} \kappa(t)\,dt \]
and $l \ge u_{\max}$. The integration region of the two-dimensional integral is shown as a shaded area in Figure D.2 (b). A more convenient expression for $\varphi(u_{\min}, u_{\max}, l)$ is obtained by changing the order of integration:
\[ \varphi(u_{\min}, u_{\max}, l) = \int_0^{l-u_{\max}} dt \int_{u_{\min}}^{u_{\max}} \kappa(t)\,du + \int_{l-u_{\max}}^{l-u_{\min}} dt \int_{u_{\min}}^{l-t} \kappa(t)\,du \]
\[ = (u_{\max} - u_{\min}) \int_0^{l-u_{\max}} \kappa(t)\,dt + \int_{l-u_{\max}}^{l-u_{\min}} (l - u_{\min} - t)\,\kappa(t)\,dt. \tag{D.2} \]
The integrals can be calculated numerically for arbitrary hazard functions $\lambda(t)$ and $\gamma(t)$. In the special case of piece-wise constant hazard functions the calculation of the integrals $\int_a^b \kappa(t)\,dt$ and $\int_a^b \kappa(t)\,t\,dt$ in equation (D.2) is simplified. An integral over the interval $(a, b)$ is presented as a sum of the integrals over the intervals $[t_{j-1}, t_j)$, $j = 1, \ldots, J$, where both the hazard and drop-out rates $\lambda_j$ and $\gamma_j$ are constant.
These integrals are calculated analytically:
\[ \int_a^b \kappa(t)\,dt = \sum_{j=1}^{J} I_{0j} \tag{D.3} \]
where
\[ I_{0j} = \int_{t_{j-1}}^{t_j} \kappa(t)\,dt = \lambda_j\, e^{-[\Lambda(t_{j-1}) + H(t_{j-1})]} \int_{t_{j-1}}^{t_j} e^{-\lambda_{s,j}(t - t_{j-1})}\,dt = c_j \left[ 1 - e^{-\lambda_{s,j}(t_j - t_{j-1})} \right] \]
with $\lambda_{s,j} = \lambda_j + \gamma_j$ and
\[ c_j = \frac{\lambda_j}{\lambda_{s,j}}\, e^{-[\Lambda(t_{j-1}) + H(t_{j-1})]}. \tag{D.4} \]
Similarly
\[ \int_a^b \kappa(t)\,t\,dt = \sum_{j=1}^{J} I_{1j} \tag{D.5} \]
where
\[ I_{1j} = \int_{t_{j-1}}^{t_j} \kappa(t)\,t\,dt = \lambda_j\, e^{-[\Lambda(t_{j-1}) + H(t_{j-1})]} \int_{t_{j-1}}^{t_j} e^{-\lambda_{s,j}(t - t_{j-1})}\,t\,dt = c_j \left[ \left( t_{j-1} + \frac{1}{\lambda_{s,j}} \right) - e^{-\lambda_{s,j}(t_j - t_{j-1})} \left( t_j + \frac{1}{\lambda_{s,j}} \right) \right]. \tag{D.6} \]
In the following sections we present simplified versions of these general expressions that correspond to more restrictive settings.

D.2 Fixed hazard rate, uniform accrual

D.2.1 General setting

Consider a situation where the event hazard rate is constant ($\lambda(t) = \lambda$), the accrual rate is uniform ($a(t) = a$), and the drop-out hazard rate is constant ($\gamma(t) = \gamma$). A subject is followed up to a maximum of $m < \infty$ units of time if an event of interest or drop out does not occur first. The following derivation gives the formula for the expected number of events at calendar time $l$ for all of the cases listed in Table 1.

$0 \le l \le S_a$, $l^* < 0$:
\[ d(l) = a\,\varphi(0, l, l) = a \int_0^l (l - t)\,\kappa(t)\,dt = a \left[ l \int_0^l \kappa(t)\,dt - \int_0^l \kappa(t)\,t\,dt \right] \]
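An application of expressions (D.4) and (D.6) leads to the following results:
\[ \int_0^l \kappa(t)\,dt = \frac{\lambda}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma) l} \right] \]
\[ \int_0^l \kappa(t)\,t\,dt = \frac{\lambda}{\lambda+\gamma} \left[ \frac{1}{\lambda+\gamma} - e^{-(\lambda+\gamma) l} \left( l + \frac{1}{\lambda+\gamma} \right) \right] \]
Therefore
\[ d(l) = \frac{a\lambda}{\lambda+\gamma} \left[ l - \frac{1}{\lambda+\gamma} + \frac{e^{-(\lambda+\gamma) l}}{\lambda+\gamma} \right] \tag{D.7} \]

$0 \le l \le S_a$, $0 \le l^* \le S_a - m$:
\[ d_A(l) = a(l - m) \int_0^m \kappa(t)\,dt = \frac{a\lambda\,(l - m)}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma) m} \right] \]
\[ d_B(l) = a\,\varphi(l^*, l, l) = a \int_0^{l - l^*} (l - l^* - t)\,\kappa(t)\,dt = a \left[ m \int_0^m \kappa(t)\,dt - \int_0^m \kappa(t)\,t\,dt \right] \]
An application of equations (D.4) – (D.6) leads to the following expressions:
\[ \int_0^m \kappa(t)\,dt = \frac{\lambda}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma) m} \right] \]
\[ \int_0^m \kappa(t)\,t\,dt = \frac{\lambda}{\lambda+\gamma} \left[ \frac{1}{\lambda+\gamma} - e^{-(\lambda+\gamma) m} \left( m + \frac{1}{\lambda+\gamma} \right) \right] \]
Therefore
\[ d_B(l) = \frac{a\lambda}{\lambda+\gamma} \left[ m - \frac{1}{\lambda+\gamma} + \frac{e^{-(\lambda+\gamma) m}}{\lambda+\gamma} \right] \]
The resulting expression is
\[ d(l) = \frac{a\lambda}{\lambda+\gamma} \left[ l - \frac{1}{\lambda+\gamma} - e^{-(\lambda+\gamma) m} \left( l - m - \frac{1}{\lambda+\gamma} \right) \right] \tag{D.8} \]

$S_a < l \le S_a + m$, $l^* < 0$:
\[ d(l) = a\,\varphi(0, S_a, l) = a \left[ S_a \int_0^{l-S_a} \kappa(t)\,dt + \int_{l-S_a}^{l} (l - t)\,\kappa(t)\,dt \right] \]
\[ = a \left[ S_a \int_0^{l-S_a} \kappa(t)\,dt + l \int_{l-S_a}^{l} \kappa(t)\,dt - \int_{l-S_a}^{l} \kappa(t)\,t\,dt \right] \]
An application of equations (D.4) – (D.6) leads to the following expressions:
\[ \int_0^{l-S_a} \kappa(t)\,dt = \frac{\lambda}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma)(l-S_a)} \right] \]
\[ \int_{l-S_a}^{l} \kappa(t)\,dt = \frac{\lambda}{\lambda+\gamma}\, e^{-(\lambda+\gamma)(l-S_a)} \left[ 1 - e^{-(\lambda+\gamma) S_a} \right] \]
\[ \int_{l-S_a}^{l} \kappa(t)\,t\,dt = \frac{\lambda}{\lambda+\gamma}\, e^{-(\lambda+\gamma)(l-S_a)} \left[ \left( l - S_a + \frac{1}{\lambda+\gamma} \right) - e^{-(\lambda+\gamma) S_a} \left( l + \frac{1}{\lambda+\gamma} \right) \right] \]
and the resulting expression for $d(l)$ has the following form:
\[ d(l) = \frac{a\lambda}{\lambda+\gamma} \left[ S_a - \frac{e^{-(\lambda+\gamma) l}}{\lambda+\gamma} \left( e^{(\lambda+\gamma) S_a} - 1 \right) \right] \tag{D.9} \]

$S_a < l \le S_a + m$, $0 \le l^* \le S_a$:
\[ d_A(l) = \Psi(m) \int_0^{l^*} a(u)\,du = \frac{\lambda}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma) m} \right] a\,(l - m) \]
\[ d_B(l) = a\,\varphi(l^*, S_a, l) = a \left[ (S_a - l^*) \int_0^{l-S_a} \kappa(t)\,dt + \int_{l-S_a}^{l-l^*} (l - l^* - t)\,\kappa(t)\,dt \right] \]
\[ = a \left[ (S_a - l^*) \int_0^{l-S_a} \kappa(t)\,dt + m \int_{l-S_a}^{m} \kappa(t)\,dt - \int_{l-S_a}^{m} \kappa(t)\,t\,dt \right] \]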
An application of equations (D.4) – (D.6) leads to the following expressions:
\[ \int_0^{l-S_a} \kappa(t)\,dt = \frac{\lambda}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma)(l-S_a)} \right] \]
\[ \int_{l-S_a}^{m} \kappa(t)\,dt = \frac{\lambda}{\lambda+\gamma}\, e^{-(\lambda+\gamma)(l-S_a)} \left[ 1 - e^{-(\lambda+\gamma)(m+S_a-l)} \right] \]
\[ \int_{l-S_a}^{m} \kappa(t)\,t\,dt = \frac{\lambda}{\lambda+\gamma}\, e^{-(\lambda+\gamma)(l-S_a)} \left[ \left( l - S_a + \frac{1}{\lambda+\gamma} \right) - e^{-(\lambda+\gamma)(m+S_a-l)} \left( m + \frac{1}{\lambda+\gamma} \right) \right] \]
Therefore
\[ d_B(l) = \frac{a\lambda}{\lambda+\gamma} \left[ (S_a + m - l) - \frac{e^{-(\lambda+\gamma) l}}{\lambda+\gamma} \left( e^{(\lambda+\gamma) S_a} - e^{(\lambda+\gamma)(l-m)} \right) \right] \]
and
\[ d(l) = \frac{a\lambda}{\lambda+\gamma} \left[ S_a - (l - m)\, e^{-(\lambda+\gamma) m} - \frac{e^{-(\lambda+\gamma) l}}{\lambda+\gamma} \left( e^{(\lambda+\gamma) S_a} - e^{(\lambda+\gamma)(l-m)} \right) \right] \tag{D.10} \]

$S_a + m < l$, $S_a < l^*$:
Events that would occur at a calendar time $l$ exceeding $S_a + m$ are not observed because the maximum follow-up time $m$ is limited. The expression for $d(l)$ has the following form:
\[ d(l) = a\,S_a\,\Psi(m) = \frac{a\,S_a\,\lambda}{\lambda+\gamma} \left[ 1 - e^{-(\lambda+\gamma) m} \right] \tag{D.11} \]

D.2.2 No drop out and no fixed follow-up

In this situation the event hazard rate is constant ($\lambda(t) = \lambda$), the accrual rate is uniform ($a(u) = a$), there are no drop outs ($\gamma(t) = 0$), and subjects are followed up until the end of the study ($m = \infty$). In the unlimited follow-up time setting $l^* = l - m$ is always negative and only cases 1 and 3 from Table 1 are to be considered. The following expressions for $d(l)$ are obtained from equations (D.7) and (D.9) by the substitution $\gamma = 0$.

$0 < l \le S_a$, $l^* < 0$:
\[ d(l) = a \left[ l - \frac{1}{\lambda} + \frac{e^{-\lambda l}}{\lambda} \right] \]
$S_a < l$, $l^* < 0$:
\[ d(l) = a \left[ S_a - \frac{e^{-\lambda l}}{\lambda} \left( e^{\lambda S_a} - 1 \right) \right] \]

D.2.3 Drop out and no fixed follow-up

In this situation the event hazard rate is constant ($\lambda(t) = \lambda$), the accrual rate is uniform ($a(u) = a$), the drop-out hazard rate $\gamma(t) = \gamma$ is non-zero, and subjects are followed up until the end of the study ($m = \infty$). Once again, cases 1 and 3 from Table 1 are to be considered and the expressions (D.7) and (D.9) are directly applicable.

$0 < l \le S_a$, $l^* < 0$:
\[ d(l) = \frac{a\lambda}{\lambda+\gamma} \left[ l - \frac{1}{\lambda+\gamma} + \frac{e^{-(\lambda+\gamma) l}}{\lambda+\gamma} \right] \]
$S_a < l$, $l^* < 0$:
\[ d(l) = \frac{a\lambda}{\lambda+\gamma} \left[ S_a - \frac{e^{-(\lambda+\gamma) l}}{\lambda+\gamma} \left( e^{(\lambda+\gamma) S_a} - 1 \right) \right] \]

D.2.4 No drop out and fixed follow-up
Now consider a situation where the event hazard rate is constant ($\lambda(t) = \lambda$), the accrual rate is uniform ($a(t) = a$), and there are no drop outs ($\gamma(t) = 0$). However, each subject is now followed up to a maximum of $m < \infty$ units of time if an event of interest or drop out does not occur first. The following expressions are obtained from equations (D.7) – (D.11) by the substitution $\gamma = 0$.

$0 \le l \le S_a$, $l^* < 0$:
\[ d(l) = a \left[ l - \frac{1}{\lambda} + \frac{e^{-\lambda l}}{\lambda} \right] \]
$0 \le l \le S_a$, $0 \le l^* \le S_a - m$:
\[ d(l) = a \left[ l - \frac{1}{\lambda} - e^{-\lambda m} \left( l - m - \frac{1}{\lambda} \right) \right] \]
$S_a < l \le S_a + m$, $l^* < 0$:
\[ d(l) = a \left[ S_a - \frac{e^{-\lambda l}}{\lambda} \left( e^{\lambda S_a} - 1 \right) \right] \]
$S_a < l \le S_a + m$, $0 \le l^* \le S_a$:
\[ d(l) = a \left[ S_a - (l - m)\, e^{-\lambda m} - \frac{e^{-\lambda l}}{\lambda} \left( e^{\lambda S_a} - e^{\lambda (l-m)} \right) \right] \]
$S_a + m < l$, $S_a < l^*$:
\[ d(l) = a\,S_a \left[ 1 - e^{-\lambda m} \right] \]

D.3 Piecewise constant hazard and drop out rates, no follow-up limit

Consider a setting where the accrual is uniform ($a(u) = a$) and the hazard and drop-out rates are piece-wise constant, so that $\lambda(t) = \lambda_k$ and $\gamma(t) = \gamma_k$ for $\tau_{k-1} \le t < \tau_k$. We also assume that there is no follow-up limit ($m = \infty$). With an unlimited follow-up time $m$ the value $l^* = l - m$ is always negative and therefore only cases 1 and 3 from Table 1 are to be considered.

$0 < l \le S_a$, $l^* < 0$:
\[ d(l) = a\,\varphi(0, l, l) = a \int_0^l (l - t)\,\kappa(t)\,dt = a \left[ l \int_0^l \kappa(t)\,dt - \int_0^l \kappa(t)\,t\,dt \right] \]
We denote by $k^*$ the number of the interval $[\tau_{k^*-1}, \tau_{k^*})$ which contains $l$. The integrals $\int_0^l \kappa(t)\,dt$ and $\int_0^l \kappa(t)\,t\,dt$ are calculated using the expressions (D.3) – (D.6) with $a = 0$, $b = l$, $J = k^*$, $t_j = \tau_j$ for $j = 0, \ldots, J-1$ and $t_J = l$.
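As an illustration of how expressions (D.3) – (D.6) can be put to work, the following Python sketch evaluates the two integrals of $\kappa(t)$ for piece-wise constant rates and then the expected number of events for the case $0 < l \le S_a$ with no follow-up limit. It is a minimal sketch and not East's implementation: the names `cum_hazard`, `kappa_integrals`, `expected_events_case1`, `breaks`, `lam` and `gam` are hypothetical, `breaks` is assumed to hold the interior change points $\tau_1 < \tau_2 < \ldots$, and $\lambda_j + \gamma_j > 0$ is assumed on every interval.

```python
import math


def cum_hazard(t, breaks, rates):
    """Integral from 0 to t of a piece-wise constant rate.

    breaks: interior change points tau_1 < tau_2 < ... (hypothetical layout);
    rates:  one rate per interval, so len(rates) == len(breaks) + 1.
    """
    total, left = 0.0, 0.0
    for right, rate in zip(list(breaks) + [math.inf], rates):
        if t <= left:
            break
        total += rate * (min(t, right) - left)
        left = right
    return total


def kappa_integrals(a, b, breaks, lam, gam):
    """Return (int_a^b kappa dt, int_a^b kappa * t dt) via (D.3)-(D.6)."""
    pts = [a] + [x for x in breaks if a < x < b] + [b]
    i0 = i1 = 0.0
    for lo, hi in zip(pts, pts[1:]):
        k = sum(1 for x in breaks if x <= lo)      # interval containing [lo, hi)
        ls = lam[k] + gam[k]                        # lambda_{s,j}; assumed > 0
        c = lam[k] / ls * math.exp(
            -(cum_hazard(lo, breaks, lam) + cum_hazard(lo, breaks, gam)))  # (D.4)
        decay = math.exp(-ls * (hi - lo))
        i0 += c * (1.0 - decay)                                   # I_0j
        i1 += c * ((lo + 1.0 / ls) - decay * (hi + 1.0 / ls))     # I_1j, (D.6)
    return i0, i1


def expected_events_case1(l, accrual_rate, breaks, lam, gam):
    """d(l) = a * [l * int_0^l kappa dt - int_0^l kappa t dt], 0 < l <= S_a."""
    i0, i1 = kappa_integrals(0.0, l, breaks, lam, gam)
    return accrual_rate * (l * i0 - i1)


if __name__ == "__main__":
    # Hypothetical inputs: hazard changes at t = 6, constant drop-out rate.
    d = expected_events_case1(10.0, 5.0, [6.0], [0.10, 0.05], [0.02, 0.02])
    print(f"expected events at l = 10: {d:.3f}")
```

With a single interval the function reduces to the closed form (D.7), which gives a convenient check of any implementation along these lines.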
$S_a < l$, $l^* < 0$:
\[ d(l) = a\,\varphi(0, S_a, l) = a \left[ S_a \int_0^{l-S_a} \kappa(t)\,dt + \int_{l-S_a}^{l} (l - t)\,\kappa(t)\,dt \right] \]
\[ = a \left[ S_a \int_0^{l-S_a} \kappa(t)\,dt + l \int_{l-S_a}^{l} \kappa(t)\,dt - \int_{l-S_a}^{l} \kappa(t)\,t\,dt \right] \]
We denote by $k^*$ the number of the interval $[\tau_{k^*-1}, \tau_{k^*})$ which contains $l$ and by $k'$ the number of the interval $[\tau_{k'-1}, \tau_{k'})$ which contains $l - S_a$. The calculation of the integrals $\int_0^{l-S_a} \kappa(t)\,dt$, $\int_{l-S_a}^{l} \kappa(t)\,dt$ and $\int_{l-S_a}^{l} \kappa(t)\,t\,dt$ is based on the expressions (D.3) – (D.6). In the calculation of the integral over the interval $(0, l - S_a)$ we use $a = 0$, $b = l - S_a$, $J = k'$, $t_j = \tau_j$ for $j = 0, \ldots, J-1$ and $t_J = l - S_a$. The corresponding values used in the calculation of the integrals over the interval $(l - S_a, l)$ are $a = l - S_a$, $b = l$, $J = k^* - k' + 1$, $t_0 = l - S_a$, $t_j = \tau_{k'-1+j}$ for $j = 1, \ldots, J-1$ and $t_J = l$.

D.4 Non-uniform accrual, constant hazard and drop out rates

In the setting where the accrual $a(u)$ is not uniform but the hazard and drop-out rates are constant, the following simplified expressions for the integrals in expression (D.2) are available:
\[ \int_0^{l-u_{\max}} \kappa(t)\,dt = \frac{\lambda}{\lambda_s} \left[ 1 - e^{-\lambda_s (l - u_{\max})} \right] \]
\[ \int_{l-u_{\max}}^{l-u_{\min}} \kappa(t)\,dt = \frac{\lambda}{\lambda_s}\, e^{-\lambda_s (l - u_{\max})} \left[ 1 - e^{-\lambda_s (u_{\max} - u_{\min})} \right] \]
\[ \int_{l-u_{\max}}^{l-u_{\min}} \kappa(t)\,t\,dt = \frac{\lambda}{\lambda_s}\, e^{-\lambda_s (l - u_{\max})} \left[ \left( l - u_{\max} + \frac{1}{\lambda_s} \right) - e^{-\lambda_s (u_{\max} - u_{\min})} \left( l - u_{\min} + \frac{1}{\lambda_s} \right) \right] \]
and $\lambda_s = \lambda + \gamma$.

E Generating Survival Simulations in EastSurv

East provides the user with powerful simulation tools for trials with time-to-event endpoints. In addition to easily verifying the operating characteristics of the many different design scenarios mentioned in Appendix B, the simulations may be used to design trials for non-standard problems where power and sample size calculations are analytically intractable.
For instance, East allows the user to simulate trials in which the hazard rates for each treatment arm are non-proportional. By trial and error, running simulations under various parameter choices, the user may find an appropriate design for this kind of trial.

East provides two simulation methods: a) Basic simulation and b) Enhanced simulation. The Basic simulation method uses asymptotic theory when generating the data and is discussed in the main East manual. In East 3.1, the enhanced simulation also used asymptotic theory to generate the data, but allowed the user to change some of the design parameter values in order to simulate under various scenarios. EastSurv's enhanced simulation tool no longer generates the data using asymptotic theory. The purpose of this appendix is to outline how the data are generated in the new enhanced survival simulations.

When initiating an enhanced survival simulation session, East uses as input all the parameters selected during the design stage. Clicking on the "Show Survival Parameters" button opens a survival sheet that allows the user to change these parameter values. The flexibility offered to the user in this screen is such that the piecewise exponential hazard curves in each treatment arm can be individually specified. This permits the user to specify late separating hazard curves or even crossing hazard curves. In addition, the user must also decide how each simulated trial will terminate by choosing whether to a) fix the number of events in the trial or b) fix the study duration. Once this is done, clicking either the "Run" button or the "Single Step" button starts the simulations. East then proceeds as follows.

In each simulation:
1. For each accrual period
   (a) East computes the number of subjects to be accrued in the control group and the treatment group.
   (b) For each subject $i$
      i. A random accrual time $t_{acc,i}$ of subject $i$ is generated as a random value from the uniform distribution bounded by the starting and ending times of the current accrual period.
      ii. A random survival time $t_{surv,i}$ is generated as a random value from the survival time distribution characterized by a piecewise hazard rate.
      iii. A random dropout time $t_{drop,i}$ is generated as a random value from the exponential distribution characterized by the dropout rate.
      iv. An indicator of censoring $C_i$ is computed as
\[ C_i = \begin{cases} 0 & \text{if } t_{surv,i} \le t_{drop,i} \text{ and } t_{surv,i} \le t_{fix}, \\ 1 & \text{otherwise,} \end{cases} \]
      where $t_{fix}$ is the user-specified fixed maximum follow-up time of a subject.
2. Now for each look $j$
   (a) If the timing of look $j$ is characterized by the time $S_j$ since the initiation of the study, then the value of $S_j$ is predefined.
   (b) If the timing of look $j$ is driven by the number of events, so that look $j$ occurs immediately after observing $N_j$ events, then $S_j$ is calculated based on the study times $t_{study,i} = t_{acc,i} + t_{surv,i}$ of the uncensored observations (with $C_i = 0$).
   (c) At the time $S_j$ the subset of observations of interest is limited to the observations from accrued subjects ($t_{acc,i} \le S_j$). For the look-based analysis the observations with $t_{study,i} > S_j$ are treated as censored.
   (d) The calculated values of $S_j$ or $N_j$ are stored for the subsequent calculation of average values across the simulations.
   (e) East computes the test statistic and checks if a stopping boundary has been crossed.
      i. If yes, or if the last look has been reached without crossing a stopping boundary, it proceeds to the next simulation.
      ii. Otherwise, it proceeds to the next look.

F Spending Functions Derived from Power Boundaries

East provides several families of published spending functions, each with a well defined functional form. These spending functions are all documented in Section B.2.4 of Appendix B.
The general approach is to select one of these published spending functions for generating the stopping boundaries at the design stage and to select the same spending function to re-compute the stopping boundaries at the interim monitoring stage. This gives us the flexibility to change the number and spacing of the interim looks during the interim monitoring stage. However, the Wang-Tsiatis (1987) and Pampallona-Tsiatis (1994) power boundaries are not derived from spending functions. If these boundaries are used for the study design they should also be used for interim monitoring. This could be problematic if the number and spacing of the interim looks changes from what was specified at the design stage. For this reason we construct special "ten-look" spending functions that correspond to the members of the Wang-Tsiatis or Pampallona-Tsiatis family. The next section shows how this is accomplished.

F.1 Inverting Ten-Look Power Boundaries

For each Wang-Tsiatis power boundary of the form $C(\Delta, \alpha, K)\, t_j^{\Delta}$, $j = 1, 2, \ldots, K$, we compute the type-1 errors, as they accumulate at each of the equally spaced looks $t_1, t_2, \ldots, t_K$, according to the selected values of $\Delta$ and $\alpha$, but with a preset value for the maximum number of looks, $K = 10$. For example, suppose we wish to generate a spending function that corresponds to a one-sided Wang-Tsiatis power boundary for a specific value of $\alpha$ and $\Delta$. The first step is to compute the actual boundary values at the ten equally spaced looks $t_1, t_2, \ldots, t_{10}$, where $t_j = j/10$, using the procedure described in Section B.2.2 of Appendix B. Denote these ten boundary values by $c_1, c_2, \ldots, c_{10}$. Next, compute the cumulative errors $\alpha(t_j)$, $j = 1, 2, \ldots, 10$, where
\[ \alpha(t_1) = P_0[W(t_1) \ge c_1], \]
and for $j = 2, 3, \ldots, 10$,
\[ \alpha(t_j) = \alpha(t_{j-1}) + P_0[W(t_1) < c_1, \cdots, W(t_{j-1}) < c_{j-1}, W(t_j) \ge c_j]. \]
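The accumulation of one-sided type-1 error defined by these two equations can be sketched numerically. The Python block below is an assumption-laden illustration and not East's algorithm: it works under $H_0$ with $W(t)$ having independent normal increments, keeps the sub-density of $W(t_j)$ on a simple trapezoidal grid in the spirit of the recursive integration described in Appendix G, and uses hypothetical names (`cumulative_errors`, `bounds`, `times`, `lower`, `n`). The boundaries are supplied on the $W(t)$ scale.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Upper tail probability 1 - Phi(x) of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def cumulative_errors(bounds, times, lower=-8.0, n=300):
    """Cumulative one-sided type-1 errors alpha(t_j) under H0 (zero drift).

    bounds[j] is the boundary c_j on the W(t) scale and times[j] the
    information fraction t_j. The grid limits (lower, n) are tuning
    assumptions of this sketch.
    """
    sd = math.sqrt(times[0])
    total = norm_sf(bounds[0] / sd)                 # alpha(t_1) = P0[W(t_1) >= c_1]
    cum = [total]
    h = (bounds[0] - lower) / n
    grid = [lower + i * h for i in range(n + 1)]
    dens = [norm_pdf(v / sd) / sd for v in grid]    # density of W(t_1) below c_1
    for j in range(1, len(bounds)):
        sd = math.sqrt(times[j] - times[j - 1])     # sd of the increment
        # trapezoidal weight times density: probability mass at each node
        mass = [h * d * (0.5 if i in (0, n) else 1.0) for i, d in enumerate(dens)]
        # P0[no crossing before look j, W(t_j) >= c_j]
        total += sum(m * norm_sf((bounds[j] - v) / sd) for m, v in zip(mass, grid))
        cum.append(total)
        if j < len(bounds) - 1:                     # sub-density for the next look
            h_new = (bounds[j] - lower) / n
            new_grid = [lower + i * h_new for i in range(n + 1)]
            dens = [sum(m * norm_pdf((w - v) / sd) for m, v in zip(mass, grid)) / sd
                    for w in new_grid]
            grid, h = new_grid, h_new
    return cum


if __name__ == "__main__":
    # Constant boundary on the W scale (shape parameter Delta = 0), five looks.
    looks = [j / 5 for j in range(1, 6)]
    for t, a in zip(looks, cumulative_errors([2.040] * 5, looks)):
        print(f"t = {t:.1f}   alpha(t) = {a:.6f}")
```

The constant 2.040 used in the usage example is the final-look design boundary reported in Table F.1 for a two-sided test at $\alpha = 0.05$, so the one-sided total computed here should come out close to 0.025. Passing ten equally spaced looks and the ten boundary values $c_1, \ldots, c_{10}$ would give the cumulative errors that the text then interpolates linearly.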
These computations are clearly unaffected by the type of end point since the test statistic can be expressed in the general framework of Section B.1. Linear interpolation between these cumulative errors is then applied for setting up approximate spending functions for the type-1 and type-2 error probabilities to be used at the interim monitoring stage. With this approach the boundaries re-computed at the interim monitoring stage have approximately the same properties as the corresponding original boundaries obtained at the design stage, while still providing flexibility to deviate from the pre-specified number and timing of the interim looks. However, as a consequence of fixing $K = 10$ for deriving the spending function in the interim monitoring module, even though we might have used a different value of $K$ in the design module, there can be slight differences between the boundary values computed at the design stage and the boundary values computed at the interim monitoring stage. In practice this difference is negligible, as we show below in Section F.2.

F.2 Comparison of Design Boundaries and Interim Monitoring Boundaries

At the design stage East computes the Wang and Tsiatis (1987) or Pampallona and Tsiatis (1994) power boundaries directly, as documented in Appendix B, Sections B.2.2 and B.2.3. These boundaries depend on $K$, the number of equally spaced interim looks. At the interim monitoring stage, however, East re-computes the stopping boundaries by inverting a ten-look error spending function, as documented above in Section F.1. This implies that, even if the interim monitoring actually takes place as designed at $K$ equally spaced looks, the design boundaries will not match the interim monitoring boundaries, unless $K = 10$.
This is not of much practical importance since, as a consequence of the flexible spending function methodology, interim monitoring will rarely occur at precisely the same time-points as were specified in the design. Table F.1 and Table F.2 display the O'Brien-Fleming power boundaries obtained at the design stage, for $K = 5$ and $K = 3$, respectively, and the corresponding boundaries obtained by inverting a ten-look error spending function, for a two-sided test at $\alpha = 0.05$. We observe that the difference between the design and interim monitoring boundaries is very small.

Table F.1: Design and Interim Monitoring Boundaries for Five Equally Spaced Looks

Look No.   Information Fraction   Design Boundary   Monitoring Boundary
1          0.2                    ±4.562            ±4.692
2          0.4                    ±3.226            ±3.285
3          0.6                    ±2.634            ±2.656
4          0.8                    ±2.281            ±2.285
5          1.0                    ±2.040            ±2.035

Table F.2: Design and Interim Monitoring Boundaries for Three Equally Spaced Looks

Look No.   Information Fraction   Design Boundary   Monitoring Boundary
1          0.333                  ±3.471            ±3.518
2          0.667                  ±2.454            ±2.487
3          1.000                  ±2.004            ±1.998

F.3 Comparison of Ten-Look and Lan and DeMets Spending Functions

We stated in Section B.2.2 of Appendix B that the power boundaries proposed by Wang and Tsiatis (1987) generate, as a special case, the boundaries of O'Brien and Fleming (1979) if the shape parameter takes on the value $\Delta = 0$. We also stated in Section B.2.4 of Appendix B that the LD(OF) spending function (Lan-DeMets spending function with O'Brien-Fleming flavor) of the form
\[ \alpha(t) = 4 - 4\,\Phi\!\left( \frac{z_{\alpha/4}}{\sqrt{t}} \right) \tag{F.1} \]
generates two-sided boundaries similar to those proposed by O'Brien and Fleming. It is therefore of interest to see how the spending function derived from the ten-look design compares with $\alpha(t)$.
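To make the LD(OF) side of this comparison concrete, the following minimal Python sketch evaluates (F.1) at five equally spaced information fractions for $\alpha = 0.05$. The function name `ld_of_spending` is a hypothetical choice for this illustration and is not an East routine; $z_{\alpha/4}$ is taken as the upper $\alpha/4$ quantile of the standard normal distribution.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def ld_of_spending(t, alpha=0.05):
    """Evaluate alpha(t) = 4 - 4 * Phi(z_{alpha/4} / sqrt(t)) for 0 < t <= 1."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 4.0)   # upper alpha/4 quantile
    return 4.0 - 4.0 * _STD_NORMAL.cdf(z / math.sqrt(t))


if __name__ == "__main__":
    for t in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(f"t = {t:.1f}   error spent = {ld_of_spending(t):.6f}")
```

By construction the function returns the full $\alpha$ at $t = 1$ and spends very little error at early looks, which is the behavior that makes it resemble the O'Brien-Fleming boundaries.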
The figure below shows that the two spending functions have very similar behaviors. Table F.3 displays the amount of type-I error actually spent at each of five equally spaced looks by the two error spending functions, given an overall type-I error of $\alpha = 0.05$. Corresponding stopping boundaries are also displayed. We note that the differences are very minor. The last column of Table F.3 displays the actual O'Brien and Fleming power boundaries based on shape parameter $\Delta = 0$ and number of looks $K = 5$, using the computations discussed in Appendix B, Section B.2.2. These boundaries too are very similar to the boundaries derived from the two error spending functions.

Table F.3: Comparing 10-Look and Lan-DeMets Spending Functions

                          Error Spent              Stopping Boundaries
Look No.   Fraction (t)   10-Look     α(t)         10-Look   α(t)     5-look
1          0.2            0.000003    0.000001     ±4.692    ±4.877   ±4.562
2          0.4            0.001020    0.000789     ±3.285    ±3.357   ±3.226
3          0.6            0.008262    0.007617     ±2.656    ±2.680   ±2.634
4          0.8            0.025008    0.024424     ±2.285    ±2.290   ±2.281
5          1              0.050000    0.050000     ±2.035    ±2.031   ±2.040

G The Recursive Integration Algorithm

Substantial savings in computational effort can be achieved in the computation of the group sequential boundaries. We give details of these savings using one-sided tests of hypothesis with a boundary only for the rejection of $H_0$, but the same applies to other situations. At the time of the $j$th interim monitoring, the group sequential boundary is determined by
\[ \Pr\nolimits_0\left( W(t_1) < b_1, \cdots, W(t_{j-1}) < b_{j-1}, W(t_j) \ge b_j \right) = \alpha^*(t_j) - \alpha^*(t_{j-1}). \tag{G.1} \]
The probability above is evaluated by the recursive integration formula of Armitage, McPherson and Rowe (1969); the density function for $W(t)$ in the discrete sequential procedure is given by
\[ f_1(w; \eta) = t_1^{-1/2}\, \phi\!\left[ t_1^{-1/2} (w - \eta t_1) \right], \]
and, by recursion,
\[ f_j(w; \eta) = \int_{-\infty}^{b_{j-1}} f_{j-1}(v; \eta)\, \Delta t_j^{-1/2}\, \phi\!\left[ \Delta t_j^{-1/2} (w - v - \eta \Delta t_j) \right] dv \tag{G.2} \]
where $\Delta t_j = t_j - t_{j-1}$, for $j = 1, \cdots, K$, with $t_0 = 0$, and $\phi$ is the standard normal density function. Equation (G.2) follows from the fact that, as discussed in Section B.1 of Appendix B, the distribution of $W(t_j)$ is $N(\eta t_j, t_j)$ with an independent increments structure. To find the boundary for the $j$th interim monitoring, we simply need to find the value of $b_j$ such that
\[ \int_{b_j}^{\infty} f_j(w; \eta)\,dw = \alpha^*(t_j) - \alpha^*(t_{j-1}). \]
Therefore, at each time of interim monitoring, instead of repeating the recursive numerical integration, we need to evaluate the numerical integration only once, by storing internally the previous boundary values $b_1, \cdots, b_{j-1}$ and the coordinates of the density function $f_{j-1}(w; \eta)$ for $-\infty < w < b_{j-1}$.

H Theory - Multiple Comparison Procedures

H.1 Parametric Procedures

H.1.1 Introduction

Assume that there are $k$ arms including the placebo arm. Let $n_0$ be the number of subjects for the placebo arm and $n_i$ the number of subjects for the $i$th treatment arm ($i = 1, 2, \ldots, k-1$). Let $N = \sum_{i=0}^{k-1} n_i$ be the total sample size. Let $Y_{ij}$ be the response from subject $j$ in treatment arm $i$ and $y_{ij}$ be the observed value of $Y_{ij}$ ($i = 0, 1, \ldots, k-1$, $j = 1, 2, \ldots, n_i$). Suppose that
\[ Y_{ij} = \mu_i + e_{ij} \tag{H.1} \]
where $e_{ij} \sim N(0, \sigma^2)$. Let $\bar{y}_i$ ($i = 0, 1, \ldots, k-1$) be the sample mean for treatment arm $i$ and $s^2$ be the pooled sample variance for all arms. Let
\[ T_i = \frac{\bar{y}_i - \bar{y}_0}{s \sqrt{\frac{1}{n_i} + \frac{1}{n_0}}} \]
be the test statistic for comparing the treatment effect of arm $i$ with placebo. Let $T_{(1)} \ge T_{(2)} \ge \ldots \ge T_{(k-1)}$ be the ordered statistics of $T_i$.
Let $t_i$ ($i = 1, \ldots, k-1$) be the observed values of $T_i$ and $t_{(1)} \ge t_{(2)} \ge \ldots \ge t_{(k-1)}$ be the observed values of $T_{(1)} \ge T_{(2)} \ge \ldots \ge T_{(k-1)}$. We are interested in the following hypotheses:
– For the right tailed test: $H_i: \mu_i - \mu_0 \le 0$ vs $K_i: \mu_i - \mu_0 > 0$
– For the left tailed test: $H_i: \mu_i - \mu_0 \ge 0$ vs $K_i: \mu_i - \mu_0 < 0$
– For the global null hypothesis: $H_0: \mu_0 = \mu_1 = \mu_2 = \ldots = \mu_{k-1}$ vs $H_{01}$: at least one $\mu_i > \mu_0$ for the right tailed test ($\mu_i < \mu_0$ for the left tailed test)

H.1.2 Single Step Dunnett Test in One-Way ANOVA Design

Let $F(x)$ denote the distribution function of $T_{(1)}$ under the global null hypothesis $H_0$, i.e.
\[ F(x) = \Pr\left( T_{(1)} \le x \right) = \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) \tag{H.2} \]
where $J = \prod_{i=1}^{k-1} \Phi\!\left( \frac{\gamma_i z + x u}{\sqrt{1 - \gamma_i^2}} \right)$ and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal variable, such that
\[ \frac{d\Phi(z)}{dz} = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{z^2}{2} \right) \tag{H.3} \]
is the standard normal density function and
\[ \frac{d\psi_\nu(u)}{du} = \frac{\nu^{\nu/2}}{\Gamma\!\left( \frac{\nu}{2} \right) 2^{\frac{\nu}{2} - 1}}\, u^{\nu - 1} \exp\!\left( -\frac{\nu u^2}{2} \right) \tag{H.4} \]
is the density of $\sqrt{V/\nu}$, where $V$ is a Chi-squared random variable with $\nu$ degrees of freedom and $\nu = N - k$. The parameter $\gamma_i$ is
\[ \gamma_i = \sqrt{\frac{n_i}{n_0 + n_i}} \tag{H.5} \]
Test statistics:
\[ T_i = \frac{\bar{y}_i - \bar{y}_0}{s \sqrt{\frac{1}{n_i} + \frac{1}{n_0}}} \quad (i = 1, 2, \ldots, k-1) \tag{H.6} \]
where
\[ s^2 = \frac{1}{N - k} \sum_{i=0}^{k-1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \tag{H.7} \]
is the pooled sample variance.

The critical values for the single step Dunnett test, denoted by $c_\alpha$, satisfy the following equations:
– For the right tailed test
\[ \int_0^{\infty} \int_{-\infty}^{\infty} \prod_{i=1}^{k-1} \Phi\!\left[ \frac{\gamma_i z + c_\alpha u}{\sqrt{1 - \gamma_i^2}} \right] d\Phi(z)\, d\psi_\nu(u) = 1 - \alpha \tag{H.8} \]
– For the left tailed test
\[ \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) = 1 - \alpha \tag{H.9} \]
where $J = \prod_{i=1}^{k-1} \left[ 1 - \Phi\!\left( \frac{\gamma_i z + c_\alpha u}{\sqrt{1 - \gamma_i^2}} \right) \right]$.
Decisions:
– For the right tailed test, reject $H_i$ if $t_i > c_\alpha$
– For the left tailed test, reject $H_i$ if $t_i < c_\alpha$

Adjusted p-values for the individual hypothesis $H_i$: $\tilde{p}_i = 1 - F(t_i)$ where
– For the right tailed tests:
\[ F(t_i) = \int_0^{\infty} \int_{-\infty}^{\infty} \prod_{j=1}^{k-1} \Phi\!\left[ \frac{\gamma_j z + t_i u}{\sqrt{1 - \gamma_j^2}} \right] d\Phi(z)\, d\psi_\nu(u) \tag{H.10} \]
– For the left tailed tests:
\[ F(t_i) = \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) \tag{H.11} \]
where $J = \prod_{j=1}^{k-1} \left[ 1 - \Phi\!\left( \frac{\gamma_j z + t_i u}{\sqrt{1 - \gamma_j^2}} \right) \right]$.

Adjusted p-value for testing the global null hypothesis $H_0$:
– For the right tailed tests, $\tilde{p} = 1 - F(t_{(1)})$ where $t_{(1)} = \max\{ t_i : i = 1, \ldots, k-1 \}$ and
\[ F(t_{(1)}) = \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) \tag{H.12} \]
where $J = \prod_{i=1}^{k-1} \Phi\!\left( \frac{\gamma_i z + t_{(1)} u}{\sqrt{1 - \gamma_i^2}} \right)$.
– For the left tailed tests, $\tilde{p} = 1 - F(t_{(k-1)})$ where $t_{(k-1)} = \min\{ t_i : i = 1, \ldots, k-1 \}$ and
\[ F(t_{(k-1)}) = \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) \tag{H.13} \]
where $J = \prod_{i=1}^{k-1} \left[ 1 - \Phi\!\left( \frac{\gamma_i z + t_{(k-1)} u}{\sqrt{1 - \gamma_i^2}} \right) \right]$.

H.1.3 Step Down Dunnett Test in One-Way ANOVA

Let $H_{(i)}$ be the null hypothesis associated with $t_{(i)}$ ($i = 1, \ldots, k-1$). Let $n_{(i)}$ be the number of subjects for the treatment arm associated with $H_{(i)}$. Let $R_{k-1}$ be the correlation matrix of the unordered statistics associated with $H_{(1)}, H_{(2)}, \ldots, H_{(k-1)}$, which has the element at the $i$th row and $j$th column $\rho_{ij} = \gamma_i \gamma_j$, where $\gamma_i = \sqrt{\frac{n_{(i)}}{n_{(i)} + n_0}}$ and $\gamma_j = \sqrt{\frac{n_{(j)}}{n_{(j)} + n_0}}$. Let $\nu = N - k$. Let $c_i$ ($i = 1, 2, \ldots, k-1$) be the critical values for the step-down Dunnett procedure. Let $\Phi(\cdot)$ be the cumulative distribution function of the standard normal variable, such that
\[ \frac{d\Phi(z)}{dz} = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{z^2}{2} \right) \tag{H.14} \]
is the standard normal density function and
\[ \frac{d\psi_\nu(u)}{du} = \frac{\nu^{\nu/2}}{\Gamma\!\left( \frac{\nu}{2} \right) 2^{\frac{\nu}{2} - 1}}\, u^{\nu - 1} \exp\!\left( -\frac{\nu u^2}{2} \right) \tag{H.15} \]
is the density of $U = \sqrt{V/\nu}$, where $V$ is a Chi-squared random variable with $\nu$ degrees of freedom and $\nu = N - k$.
Test statistics:
\[ T_i = \frac{\bar{y}_i - \bar{y}_0}{s \sqrt{\frac{1}{n_i} + \frac{1}{n_0}}} \tag{H.16} \]
where
\[ s^2 = \frac{1}{N - k} \sum_{i=0}^{k-1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \tag{H.17} \]
is the pooled sample variance for all arms.

Critical values $c_i$ satisfy the following equations:
– For the right tailed tests
\[ G_i(c_i) = \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) = 1 - \alpha \tag{H.18} \]
where $J = \prod_{j=i}^{k-1} \Phi\!\left( \frac{\gamma_j z + c_i u}{\sqrt{1 - \gamma_j^2}} \right)$.
– For the left tailed tests
\[ G_i(c_i) = \int_0^{\infty} \int_{-\infty}^{\infty} J \, d\Phi(z)\, d\psi_\nu(u) = 1 - \alpha \tag{H.19} \]
where $J = \prod_{j=1}^{i} \left[ 1 - \Phi\!\left( \frac{\gamma_j z + c_i u}{\sqrt{1 - \gamma_j^2}} \right) \right]$.

Decisions: The step down Dunnett procedure can be carried out as follows:
– For the right tailed tests
  ∗ Step 1: If $t_{(1)} > c_1$, reject $H_{(1)}$ and go to the next step; otherwise retain all hypotheses and stop.
  ∗ Step $i = 2, \ldots, k-2$: If $t_{(i)} > c_i$, reject $H_{(i)}$ and go to the next step; otherwise retain $H_{(i)}, H_{(i+1)}, \ldots, H_{(k-1)}$ and stop.
  ∗ Step $k-1$: If $t_{(k-1)} > c_{k-1}$, reject $H_{(k-1)}$ and stop; otherwise retain $H_{(k-1)}$ and stop.
– For the left tailed tests
  ∗ Step 1: If $t_{(k-1)} < c_1$, reject $H_{(k-1)}$ and go to the next step; otherwise retain all hypotheses and stop.
  ∗ Step $i = 2, \ldots, k-2$: If $t_{(k-i)} < c_i$, reject $H_{(k-i)}$ and go to the next step; otherwise retain $H_{(1)}, H_{(2)}, \ldots, H_{(k-i)}$ and stop.
  ∗ Step $k-1$: If $t_{(1)} < c_{k-1}$, reject $H_{(1)}$ and stop; otherwise retain $H_{(1)}$ and stop.

Adjusted p-values for the individual hypotheses:
– For the right tailed test
\[ \tilde{p}_{(i)} = \begin{cases} p_i & \text{if } i = 1 \\ \max\left( \tilde{p}_{(i-1)},\, p_i \right) & \text{if } i = 2, \ldots, k-1 \end{cases} \tag{H.20} \]
where
\[ p_i = 1 - F\left( t_{(i)} \right) \tag{H.21} \]
\[ G_i\left( t_{(i)} \right) = \int_0^{\infty} \int_{-\infty}^{\infty} \prod_{j=i}^{k-1} \Phi\!\left( \frac{\gamma_j z + t_{(i)} u}{\sqrt{1 - \gamma_j^2}} \right) d\Phi(z)\, d\psi_\nu(u) \tag{H.22} \]
– For the left tailed test
\[ \tilde{p}_{(i)} = \begin{cases} p_i & \text{if } i = k-1 \\ \max\left( \tilde{p}_{(i+1)},\, p_i \right) & \text{if } i = k-2, \ldots, 1 \end{cases} \tag{H.23} \]
where
\[ p_i = 1 - F\left( t_{(i)} \right) \tag{H.24} \]
\[ G_i\left( t_{(i)} \right) = \int_0^{\infty} \int_{-\infty}^{\infty} \prod_{j=1}^{i} \left[ 1 - \Phi\!\left( \frac{\gamma_j z + t_{(i)} u}{\sqrt{1 - \gamma_j^2}} \right) \right] d\Phi(z)\, d\psi_\nu(u) \tag{H.25} \]

Adjusted p-value for the global null hypothesis:
– For the right tailed test, $\tilde{p} = \tilde{p}_{(1)} = p_1$
– For the left tailed test, $\tilde{p} = \tilde{p}_{(k-1)} = p_{k-1}$

H.2 P-value based procedures

H.2.1 Hypotheses, test statistics and marginal p-values for continuous response

Individual hypotheses:
– For the right tailed tests
\[ H_i: \mu_i \le \mu_0 \ \text{ vs } \ K_i: \mu_i > \mu_0 \quad (i = 1, \ldots, k-1) \tag{H.26} \]
– For the left tailed tests
\[ H_i: \mu_i \ge \mu_0 \ \text{ vs } \ K_i: \mu_i < \mu_0 \quad (i = 1, \ldots, k-1) \tag{H.27} \]
where $k$ is the total number of arms.

Global null hypothesis:
\[ H_0: \mu_0 = \mu_1 = \ldots = \mu_{k-1} \tag{H.28} \]
against the alternative hypothesis $H_{01}$: at least one $\mu_i > \mu_0$ for the right tailed test, or $\mu_i < \mu_0$ for the left tailed test.

Test statistics: The calculation of the test statistics is slightly different depending on whether the checkbox for Common Standard Deviation is checked or not.
– If Common Standard Deviation for design is checked (or Equal Variance for analysis is selected),
\[ T_i = \frac{\bar{y}_i - \bar{y}_0}{s \sqrt{\frac{1}{n_i} + \frac{1}{n_0}}} \quad (i = 1, 2, \ldots, k-1) \tag{H.29} \]
where
\[ s^2 = \frac{1}{N - k} \sum_{i=0}^{k-1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \tag{H.30} \]
is the variance estimate pooled over all arms, $y_{ij}$ is the response for the $j$th subject in the $i$th arm, $\bar{y}_i$ is the sample mean for the $i$th arm, $N$ is the total sample size and $n_i$ ($i = 0, 1, \ldots, k-1$) is the number of subjects in arm $i$.
– If Common Standard Deviation for design is not checked (or Unequal Variance for analysis is selected),

$$T_i = \frac{\bar{y}_i - \bar{y}_0}{\sqrt{\frac{1}{n_i}s_i^2 + \frac{1}{n_0}s_0^2}} \quad (i = 1, 2, \ldots, k-1) \tag{H.31}$$

where

$$s_0^2 = \frac{1}{n_0 - 1}\sum_{j=1}^{n_0}\left(y_{0j} - \bar{y}_0\right)^2 \tag{H.32}$$

is the variance estimate for the control arm and

$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_i\right)^2 \tag{H.33}$$

is the variance estimate for the $i$th arm.

Marginal p-values:

– For the right tailed test

$$p_i = P(T_i > t_i) = 1 - P(T_i < t_i) = \Phi(-t_i) \tag{H.34}$$

– For the left tailed test

$$p_i = P(T_i < t_i) = \Phi(t_i) \tag{H.35}$$

where $T_i$ follows a t distribution with $\nu$ degrees of freedom and $\Phi(\cdot)$ is the cumulative distribution function of that t distribution. The value of $\nu$ depends on whether the checkbox for Common Standard Deviation for design is checked (or whether the radio button for Equal Variance or Unequal Variance for analysis is selected).

– If Common Standard Deviation for design is checked (or Equal Variance for analysis is selected),

$$\nu = N - k \tag{H.36}$$

– If Common Standard Deviation for design is not checked (or Unequal Variance for analysis is selected),

$$\nu = \left(\frac{s_i^2}{n_i} + \frac{s_0^2}{n_0}\right)^2 \Bigg/ \left[\frac{\left(s_i^2/n_i\right)^2}{n_i - 1} + \frac{\left(s_0^2/n_0\right)^2}{n_0 - 1}\right] \tag{H.37}$$

H.2.2 Hypotheses, test statistics and marginal p-values for binary response

Individual hypotheses:

– For the right tailed test

$$H_i: \pi_i - \pi_0 = 0 \quad \text{vs} \quad K_i: \pi_i - \pi_0 > 0 \quad (i = 1, 2, \ldots, k-1) \tag{H.38}$$

– For the left tailed test

$$H_i: \pi_i - \pi_0 = 0 \quad \text{vs} \quad K_i: \pi_i - \pi_0 < 0 \quad (i = 1, 2, \ldots, k-1) \tag{H.39}$$

where $k$ is the total number of arms.

Global null hypothesis:

$$H_0: \pi_0 = \pi_1 = \ldots = \pi_{k-1} \tag{H.40}$$

against the alternative H01: at least one $\pi_i > \pi_0$ for the right tailed test ($\pi_i < \pi_0$ for the left tailed test).

Test statistics: The calculation of the test statistics differs slightly depending on whether Pooled Variance or Unpooled Variance is selected.
– If Pooled Variance is selected,

$$T_i = \frac{\hat{\pi}_i - \hat{\pi}_0}{\sqrt{\tilde{\pi}_i\left(1 - \tilde{\pi}_i\right)\left(\frac{1}{n_0} + \frac{1}{n_i}\right)}} \quad (i = 1, 2, \ldots, k-1) \tag{H.41}$$

where $\hat{\pi}_i$ is the sample proportion for the $i$th arm, $\hat{\pi}_0$ is the sample proportion for the control arm, $\tilde{\pi}_i = \frac{n_i\hat{\pi}_i + n_0\hat{\pi}_0}{n_i + n_0}$ is the pooled sample proportion, $N$ is the total sample size and $n_i$ $(i = 0, 1, \ldots, k-1)$ is the number of subjects in arm $i$.

– If Unpooled Variance is selected,

$$T_i = \frac{\hat{\pi}_i - \hat{\pi}_0}{\sqrt{\frac{1}{n_i}\hat{\pi}_i\left(1 - \hat{\pi}_i\right) + \frac{1}{n_0}\hat{\pi}_0\left(1 - \hat{\pi}_0\right)}} \quad (i = 1, 2, \ldots, k-1) \tag{H.42}$$

where $\hat{\pi}_i$ is the sample proportion for the $i$th arm and $\hat{\pi}_0$ is the sample proportion for the control arm.

Marginal p-values:

– For the right tailed test

$$p_i = P(T_i > t_i) = 1 - P(T_i < t_i) = \Phi(-t_i) \tag{H.43}$$

– For the left tailed test

$$p_i = P(T_i < t_i) = \Phi(t_i) \tag{H.44}$$

where $T_i$ follows the standard normal distribution and $\Phi(\cdot)$ is its cumulative distribution function.

H.2.3 Bonferroni Procedure

Suppose $p_1, p_2, \ldots, p_{k-1}$ are the marginal p-values associated with $H_i$ $(i = 1, 2, \ldots, k-1)$. Let $p_{(1)} \le p_{(2)} \le \ldots \le p_{(k-1)}$ be the ordered p-values. Suppose $\alpha$ is the significance level.

The Bonferroni procedure will reject $H_i$ if $p_i < \frac{\alpha}{k-1}$, $i = 1, 2, \ldots, k-1$.

The adjusted p-value for the individual hypothesis $H_i$ is given by

$$\tilde{p}_i = \min\left(1, (k-1)\,p_i\right), \quad i = 1, 2, \ldots, k-1 \tag{H.45}$$

The adjusted p-value for the global null hypothesis is given by

$$\tilde{p} = \min\left\{\tilde{p}_i : i = 1, 2, \ldots, k-1\right\} = \min\left(1, (k-1)\,p_{(1)}\right) \tag{H.46}$$

H.2.4 Sidak Procedure

Let $p_1, p_2, \ldots, p_{k-1}$ be the marginal p-values associated with $H_i$ $(i = 1, 2, \ldots, k-1)$. Let $p_{(1)} \le p_{(2)} \le \ldots \le p_{(k-1)}$ be the ordered p-values. Let $\alpha$ be the significance level.

The Sidak procedure will reject $H_i$ if $p_i < 1 - (1 - \alpha)^{\frac{1}{k-1}}$, $i = 1, 2, \ldots, k-1$.
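As a rough illustration of the Bonferroni adjustment in (H.45) and (H.46) and of the Sidak rejection threshold just stated, the following Python sketch may help. The function names `bonferroni_adjusted` and `sidak_threshold` are hypothetical and are not part of East; the p-values are made-up inputs.

```python
def bonferroni_adjusted(pvals):
    """Bonferroni adjusted p-values, min(1, (k-1) * p_i), as in (H.45)."""
    m = len(pvals)  # m = k - 1 comparisons against control
    return [min(1.0, m * p) for p in pvals]


def sidak_threshold(alpha, m):
    """Per-hypothesis Sidak rejection threshold 1 - (1 - alpha)^(1/m)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Illustrative inputs only: three treatment arms compared with control.
pvals = [0.01, 0.02, 0.20]
alpha = 0.05

adjusted = bonferroni_adjusted(pvals)
global_adjusted = min(adjusted)           # global null, as in (H.46)
threshold = sidak_threshold(alpha, len(pvals))
sidak_reject = [p < threshold for p in pvals]
```

The Sidak threshold is always slightly larger than the Bonferroni threshold $\alpha/(k-1)$, so under this sketch the Sidak rule rejects whenever the Bonferroni rule does.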
The adjusted p-value for the individual hypothesis $H_i$ is given by

$$\tilde{p}_i = 1 - (1 - p_i)^{k-1}, \quad i = 1, 2, \ldots, k-1 \tag{H.47}$$

The adjusted p-value for the global null hypothesis is given by

$$\tilde{p} = \min\left\{\tilde{p}_i : i = 1, 2, \ldots, k-1\right\} = 1 - \left(1 - p_{(1)}\right)^{k-1} \tag{H.48}$$

H.2.5 Weighted Bonferroni Procedure

Let $p_1, p_2, \ldots, p_{k-1}$ be the marginal p-values associated with $H_i$ $(i = 1, 2, \ldots, k-1)$. Let $\alpha$ be the significance level, that is, the overall type I error rate. Let $w_1, w_2, \ldots, w_{k-1}$ be the proportions indicating the allocation of $\alpha$ to each hypothesis, such that $\sum_{i=1}^{k-1} w_i = 1$. Let $p_{(1)} \le p_{(2)} \le \ldots \le p_{(k-1)}$ be the ordered p-values.

The weighted Bonferroni procedure will reject $H_i$ if $p_i < w_i\alpha$, $i = 1, 2, \ldots, k-1$.

The adjusted p-value for the individual hypothesis $H_i$ is given by

$$\tilde{p}_i = \min\left(1, \frac{p_i}{w_i}\right), \quad i = 1, 2, \ldots, k-1 \tag{H.49}$$

The adjusted p-value for the global null hypothesis is given by

$$\tilde{p} = \min\left\{\tilde{p}_i : i = 1, 2, \ldots, k-1\right\} \tag{H.50}$$

Note that, if $w_1 = w_2 = \ldots = w_{k-1} = \frac{1}{k-1}$, the weighted Bonferroni procedure reduces to the regular Bonferroni procedure.

H.2.6 Holm Step-Down Procedure

Let $p_1, p_2, \ldots, p_{k-1}$ be the marginal p-values. Let $p_{(1)} \le p_{(2)} \le \ldots \le p_{(k-1)}$ be the ordered p-values and $H_{(i)}$ $(i = 1, 2, \ldots, k-1)$ be the associated hypotheses. Let $\alpha$ be the significance level. The Holm (1979) step-down procedure is carried out as follows:

Step 1: If $p_{(1)} \le \frac{\alpha}{k-1}$, reject $H_{(1)}$ and go to the next step. Otherwise retain all hypotheses and stop.
Step $i = 2, \ldots, k-2$: If $p_{(i)} \le \frac{\alpha}{k-i}$, reject $H_{(i)}$ and go to the next step. Otherwise retain $H_{(i)}, \ldots, H_{(k-1)}$ and stop.
Step $k-1$: If $p_{(k-1)} \le \alpha$, reject $H_{(k-1)}$ and stop. Otherwise retain $H_{(k-1)}$ and stop.

The adjusted p-value for the individual hypothesis $H_{(i)}$ $(i = 1, 2, \ldots, k-1)$ is given by

$$\tilde{p}_{(i)} = \begin{cases} \min\left(1, (k-1)\,p_{(i)}\right) & \text{if } i = 1, \\ \min\left(\max\left(\tilde{p}_{(i-1)}, (k-i)\,p_{(i)}\right), 1\right) & \text{if } i = 2, \ldots, k-1. \end{cases} \tag{H.51}$$

The adjusted p-value for the global hypothesis $H_0$ is

$$\tilde{p}_{(1)} = \min\left(1, (k-1)\,p_{(1)}\right) \tag{H.52}$$

H.2.7 Hochberg Step-Up Procedure

Let $p_1, p_2, \ldots, p_{k-1}$ be the marginal p-values. Let $p_{(1)} \le p_{(2)} \le \ldots \le p_{(k-1)}$ be the ordered p-values and $H_{(i)}$ $(i = 1, 2, \ldots, k-1)$ be the associated hypotheses. Let $\alpha$ be the significance level. The Hochberg (1988) step-up procedure is carried out as follows:

Step 1: If $p_{(k-1)} > \alpha$, retain $H_{(k-1)}$ and go to the next step. Otherwise reject all hypotheses and stop.
Step $i = 2, \ldots, k-2$: If $p_{(k-i)} > \frac{\alpha}{i}$, retain $H_{(k-i)}$ and go to the next step. Otherwise reject all remaining hypotheses and stop.
Step $k-1$: If $p_{(1)} > \frac{\alpha}{k-1}$, retain $H_{(1)}$ and stop. Otherwise reject $H_{(1)}$ and stop.

The adjusted p-values for the individual hypotheses are given by

$$\tilde{p}_{(i)} = \begin{cases} p_{(i)} & \text{if } i = k-1 \\ \min\left(\tilde{p}_{(i+1)}, (k-i)\,p_{(i)}\right) & \text{if } i = k-2, k-3, \ldots, 1 \end{cases} \tag{H.53}$$

The adjusted p-value for the global null hypothesis is

$$\tilde{p} = \min\left\{\tilde{p}_{(i)} : i = 1, 2, \ldots, k-1\right\} = \min\left\{p_{(k-1)}, 2p_{(k-2)}, \ldots, i\,p_{(k-i)}, \ldots, (k-1)\,p_{(1)}\right\} \tag{H.54}$$

Compared with the Simes adjusted p-value, the Hochberg adjusted p-value tends to be larger for testing the global hypothesis.

H.2.8 Fixed Sequence Testing Procedure

Assume that $H_1, H_2, \ldots, H_{k-1}$ are ordered hypotheses and the order is prespecified so that $H_1$ is tested first, followed by $H_2$, and so on. Let $p_1, p_2, \ldots, p_{k-1}$ be the associated raw marginal p-values. Let $\alpha$ be the significance level. The fixed sequence testing procedure can be carried out as follows:

Step 1: If $p_1 < \alpha$, reject $H_1$ and go to the next step. Otherwise retain all hypotheses and stop.
Step $i = 2, 3, \ldots, k-2$: If $p_i < \alpha$, reject $H_i$ and go to the next step. Otherwise retain $H_i, H_{i+1}, \ldots, H_{k-1}$ and stop.
Step $k-1$: If $p_{k-1} < \alpha$, reject $H_{k-1}$ and stop. Otherwise retain $H_{k-1}$ and stop.

The adjusted p-value for the individual hypothesis $H_i$ $(i = 1, \ldots, k-1)$ is given by

$$\tilde{p}_i = \max\left\{p_1, p_2, \ldots, p_i\right\} \tag{H.55}$$

The adjusted p-value for the global null hypothesis is given by

$$\tilde{p} = p_1 \tag{H.56}$$

H.2.9 Hommel Step-Up Procedure

Let $p_1, p_2, \ldots, p_{k-1}$ be the marginal p-values. Let $p_{(1)} \le p_{(2)} \le \ldots \le p_{(k-1)}$ be the ordered p-values and $H_{(i)}$ $(i = 1, 2, \ldots, k-1)$ be the associated hypotheses. Let $\alpha$ be the significance level. The Hommel procedure is carried out as follows:

Step 1: If $p_{(k-1)} > \alpha$, retain $H_{(k-1)}$ and go to the next step. Otherwise reject all hypotheses and stop.
Step $i = 2, \ldots, k-2$: If $p_{(k-j)} > \frac{i-j+1}{i}\alpha$ for $j = 1, \ldots, i$, retain $H_{(k-i)}$ and go to the next step. Otherwise reject all remaining hypotheses with $p_{(k-1)} < \frac{\alpha}{i-1}$ and stop.
Step $k-1$: If $p_{(k-j)} > \frac{k-j}{k-1}\alpha$ for $j = 1, \ldots, k-1$, retain $H_{(1)}$; otherwise reject $H_{(1)}$ if $p_{(1)} < \frac{\alpha}{k-2}$.

Another way of describing the Hommel procedure is as follows. Let $J \subseteq \{1, 2, \ldots, k-1\}$ be defined as $J = \left\{i \in \{1, 2, \ldots, k-1\} : p_{(k-j)} > \frac{i-j+1}{i}\alpha \text{ for all } j = 1, 2, \ldots, i\right\}$. If $J$ is nonempty, reject $H_{k-1}$ whenever $p_{k-1} \le \frac{\alpha}{i_0}$ with $i_0 = \max_{i \in J}\{i\}$. If $J$ is empty, reject all $H_i$ $(i = 1, \ldots, k-1)$.

The adjusted p-values for the Hommel procedure can be calculated as

$$\tilde{p}_i = \max\left\{p_I : i \in I\right\} \tag{H.57}$$

where $p_I$ denotes the p-value for testing the intersection hypothesis $H_I$ using the Simes (1986) test.

H.2.10 Fallback Procedures

Assume that $H_1, H_2, \ldots, H_{k-1}$ are ordered hypotheses and the order is prespecified so that $H_1$ is tested first, followed by $H_2$, and so on. Let $p_1, p_2, \ldots, p_{k-1}$ be the marginal p-values. Let $\alpha$ be the overall type I error rate. Let $w_1, w_2, \ldots, w_{k-1}$ be the proportions indicating the allocation of $\alpha$ to each hypothesis, such that $\sum_{i=1}^{k-1} w_i = 1$.
The amount of type I error assigned to hypothesis $H_i$ $(i = 1, 2, \ldots, k-1)$ is $w_i\alpha$. The fallback procedure can be carried out as follows:

Step 1: Test $H_1$ at $\alpha_1 = w_1\alpha$. If $p_1 \le \alpha_1$, reject $H_1$ and go to the next step; otherwise retain it and go to the next step.
Step $i = 2, \ldots, k-2$: Test $H_i$ at $\alpha_i = \alpha_{i-1} + w_i\alpha$ if $H_{i-1}$ is rejected and at $\alpha_i = w_i\alpha$ if $H_{i-1}$ is retained. If $p_i \le \alpha_i$, reject $H_i$; otherwise retain it. In either case go to the next step.
Step $k-1$: Test $H_{k-1}$ at $\alpha_{k-1} = \alpha_{k-2} + w_{k-1}\alpha$ if $H_{k-2}$ is rejected and at $\alpha_{k-1} = w_{k-1}\alpha$ if $H_{k-2}$ is retained. If $p_{k-1} \le \alpha_{k-1}$, reject $H_{k-1}$; otherwise retain it.

The adjusted p-values for the fallback procedure can be computed as

$$\tilde{p}_i = \max_{J:\, i \in J}\left\{p_J\right\} \tag{H.58}$$

where $p_J$ denotes the p-value for testing the intersection hypothesis $H_J$ using the weighted Bonferroni test.

The fallback procedure is equivalent to the closed test using the weighted Bonferroni test for the intersection hypotheses. The algorithm described in Appendix A of the paper by Wiens and Dmitrienko (2005) is used to assign weights to each elementary hypothesis of a particular intersection hypothesis. Let $I = \{1, 2, \ldots, k-1\}$ be the index set. Assume that $H_1, H_2, \ldots, H_{k-1}$ are already ordered so that $H_1$ is tested first, followed by $H_2$, and so on, as described in the fallback procedure above. Let $w_1, w_2, \ldots, w_{k-1}$ be the weights initially assigned to $H_1, H_2, \ldots, H_{k-1}$ respectively, such that $\sum_{i=1}^{k-1} w_i \le 1$. For any intersection hypothesis $H_J$, let $v = \left(v_1(H_J), v_2(H_J), \ldots, v_{k-1}(H_J)\right)$ be the decision vector used to test $H_J$. This decision vector represents a weighted Bonferroni test for $H_J$ in the following sense: we compare $p_1$ with $v_1(H_J)\alpha$, $p_2$ with $v_2(H_J)\alpha$, ..., $p_{k-1}$ with $v_{k-1}(H_J)\alpha$. The following algorithm shows how to determine the decision vector for a particular intersection hypothesis $H_J$.
Step 1: $v_1(H_J) = w_1$ if $H_J$ contains $H_1$ and 0 otherwise.
Step 2: $v_2(H_J) = w_1 + w_2 - v_1(H_J)$ if $H_J$ contains $H_2$ and 0 otherwise.
...
Step $i$: $v_i(H_J) = w_1 + w_2 + \ldots + w_i - v_1(H_J) - v_2(H_J) - \ldots - v_{i-1}(H_J)$ if $H_J$ contains $H_i$ and 0 otherwise.
...
Step $k-1$: $v_{k-1}(H_J) = w_1 + \ldots + w_{k-1} - v_1(H_J) - v_2(H_J) - \ldots - v_{k-2}(H_J)$ if $H_J$ contains $H_{k-1}$ and 0 otherwise.

Once the decision vector $v$ has been obtained according to the above algorithm, the weighted Bonferroni p-value for a particular intersection hypothesis $H_J$ can be computed as

$$p_J = \min_{i = 1, \ldots, k-1}\left\{p_i / v_i(H_J)\right\} \tag{H.59}$$

Consequently, the adjusted p-value for the fallback procedure is

$$\tilde{p}_i = \max_{J:\, i \in J}\left\{p_J\right\} \tag{H.60}$$

For example, suppose we have three hypotheses of interest $H_1, H_2, H_3$ and $w_1, w_2, w_3$ are the associated weights. The fallback procedure is carried out as follows:

Step 1: Test $H_1$ at $\alpha_1 = w_1\alpha$. If $p_1 \le \alpha_1$, reject $H_1$ and go to the next step; otherwise retain it and go to the next step.
Step 2: Test $H_2$ at $\alpha_2 = \alpha_1 + w_2\alpha$ if $H_1$ is rejected and at $\alpha_2 = w_2\alpha$ if $H_1$ is retained. If $p_2 \le \alpha_2$, reject $H_2$; otherwise retain it. In either case go to the next step.
Step 3: Test $H_3$ at $\alpha_3 = \alpha_2 + w_3\alpha$ if $H_2$ is rejected and at $\alpha_3 = w_3\alpha$ if $H_2$ is retained. If $p_3 \le \alpha_3$, reject $H_3$; otherwise retain it.

To calculate the adjusted p-values, we first need to obtain the decision vectors for all the intersection hypotheses. In this example, we have 7 intersection hypotheses including the three single hypotheses. The decision vectors are given in the table below. Hence the adjusted p-value for $H_1$ is $\max\left\{p_{\{123\}}, p_{\{12\}}, p_{\{13\}}, p_{\{1\}}\right\}$. Similarly the adjusted p-value for $H_2$ is $\max\left\{p_{\{123\}}, p_{\{12\}}, p_{\{23\}}, p_{\{2\}}\right\}$ and that for $H_3$ is $\max\left\{p_{\{123\}}, p_{\{13\}}, p_{\{23\}}, p_{\{3\}}\right\}$.
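The decision-vector algorithm and the adjusted p-values in (H.59) and (H.60) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not East's implementation: the function names are hypothetical, hypotheses are indexed from 0, entries with $v_i(H_J) = 0$ are skipped in the minimum, and the intersection p-values are capped at 1 (the cap is an assumption; (H.59) as written does not include it).

```python
from itertools import combinations


def decision_vector(subset, weights):
    """Decision vector v(H_J) for the intersection hypothesis indexed by subset."""
    v = []
    cum_w = 0.0  # w_1 + ... + w_i
    used = 0.0   # v_1 + ... + v_{i-1}
    for i, w in enumerate(weights):
        cum_w += w
        v_i = cum_w - used if i in subset else 0.0
        v.append(v_i)
        used += v_i
    return v


def intersection_pvalue(subset, pvals, weights):
    """Weighted Bonferroni p-value p_J as in (H.59), capped at 1 (assumption)."""
    v = decision_vector(subset, weights)
    ratios = [pvals[i] / v[i] for i in subset if v[i] > 0]
    return min(1.0, min(ratios, default=float("inf")))


def fallback_adjusted_pvalues(pvals, weights):
    """Adjusted p-values: maximum of p_J over all J containing i, as in (H.60)."""
    n = len(pvals)
    adjusted = [0.0] * n
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            p_j = intersection_pvalue(subset, pvals, weights)
            for i in subset:
                adjusted[i] = max(adjusted[i], p_j)
    return adjusted


# Illustrative inputs only: three ordered hypotheses with weights 0.5, 0.25, 0.25.
adj = fallback_adjusted_pvalues([0.01, 0.04, 0.03], [0.5, 0.25, 0.25])
```

With these made-up inputs and $\alpha = 0.05$, only the first adjusted p-value falls below $\alpha$, which matches the stepwise rule: $H_1$ is rejected at $\alpha_1 = 0.025$, $H_2$ is retained at $\alpha_2 = 0.0375$, and $H_3$ is retained at $\alpha_3 = 0.0125$.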
Decision vectors and weighted Bonferroni p-values for the three-hypothesis fallback example:

Intersection    Decision vector          Bonferroni p-value
$H_{\{123\}}$    $(w_1, w_2, w_3)$        $p_{\{123\}} = \min\left\{\frac{p_1}{w_1}, \frac{p_2}{w_2}, \frac{p_3}{w_3}\right\}$
$H_{\{12\}}$     $(w_1, w_2)$             $p_{\{12\}} = \min\left\{\frac{p_1}{w_1}, \frac{p_2}{w_2}\right\}$
$H_{\{13\}}$     $(w_1, w_2 + w_3)$       $p_{\{13\}} = \min\left\{\frac{p_1}{w_1}, \frac{p_3}{w_2 + w_3}\right\}$
$H_{\{23\}}$     $(w_1 + w_2, w_3)$       $p_{\{23\}} = \min\left\{\frac{p_2}{w_1 + w_2}, \frac{p_3}{w_3}\right\}$
$H_{\{1\}}$      $w_1$                    $p_{\{1\}} = \frac{p_1}{w_1}$
$H_{\{2\}}$      $w_1 + w_2$              $p_{\{2\}} = \frac{p_2}{w_1 + w_2}$
$H_{\{3\}}$      $w_1 + w_2 + w_3$        $p_{\{3\}} = \frac{p_3}{w_1 + w_2 + w_3}$

H.3 Generate Means/Proportions through DR Curves

Four Parameter Logistic

$$E(Y \mid D) = \beta + \frac{\delta}{1 + \exp\left(\frac{\theta - D}{\tau}\right)} \tag{H.61}$$

where $\tau > 0$ and $-\infty < \beta, \delta, \theta < \infty$.

Linear

$$E(Y \mid D) = a + bD \tag{H.62}$$

where $a$ is the intercept and $b$ represents the slope.

Quadratic

$$E(Y \mid D) = E_0 + B_1 D + B_2 D^2 \tag{H.63}$$

where $E_0$ represents the mean response for placebo, $B_1$ represents the linear coefficient and $B_2$ represents the quadratic coefficient.

Emax

$$E(Y \mid D) = E_0 + \frac{E_{max}}{1 + \exp\left\{S\left[\ln(ED_{50}) - \ln(D)\right]\right\}} \tag{H.64}$$

where $E_0$ represents the y-intercept, $E_{max}$ is the difference between the mean response at a very large dose and placebo, $ED_{50} > 0$ is the value of the dose that gives a response of $E_0 + \frac{1}{2}E_{max}$, and $S > 0$ is a slope factor (Hill parameter) that controls the rate at which the response increases as a function of dose at $ED_{50}$.

I Theory - Multiple Endpoint Procedures

I.1 Serial Gatekeeping

Assume that we are interested in testing $K$ endpoints which are grouped into $m$ families $F_1, F_2, \ldots, F_m$. A family is called a serial gatekeeper if all hypotheses within that family must be rejected in order to proceed to test the hypotheses in the next family. In other words, if $F_i$ $(i = 1, 2, \ldots, m-1)$ is a serial gatekeeper, then the hypotheses in the next family $F_{i+1}$ are tested only if all the hypotheses in $F_i$ are rejected. Serial gatekeeping over $m$ families is implemented in the following $m$ steps. Note that in the following serial gatekeeping testing procedure any α-level FWER-controlling multiple testing procedure can be used for testing the preceding $m-1$ families.
But since we need to reject all hypotheses in one family in order to proceed to test the next family, the most powerful test is the intersection-union (IU) test. The IU test is a min test which is tailored to test a composite null hypothesis. For example, the IU test would reject $\cup_{j=1}^{n_i} H_{ij}$ if all the hypotheses $H_{ij}$ in $F_i$ are rejected at their α-level tests, i.e. $\max_{j=1,\ldots,n_i} p_{ij} \le \alpha$.

Serial gatekeeping procedure based on the intersection-union test

Step 1: Test all the hypotheses in $F_1$ at their nominal α levels using the intersection-union test; i.e., reject all $H_{1j}$, $j = 1, 2, \ldots, n_1$, if $\max_{j=1,\ldots,n_1} p_{1j} \le \alpha$. If all the $n_1$ hypotheses are rejected, go to Step 2, otherwise stop. The term intersection-union test arises because, as shown by Berger (1982), this procedure offers level-α protection against rejecting the null hypothesis $H_1 = \cup_{j=1}^{n_1} H_{1j}$ in favor of the alternative hypothesis $\bar{H}_1 = \cap_{j=1}^{n_1} \bar{H}_{1j}$.
Step 2: Test all the hypotheses in $F_2$ at their nominal α levels using the intersection-union test. If all the hypotheses are rejected, go to Step 3, otherwise stop.
...
Step $m-1$: Test all the hypotheses in $F_{m-1}$ at their nominal α levels using the intersection-union test. If all the hypotheses are rejected, go to Step $m$, otherwise stop.
Step $m$: Test all the hypotheses in $F_m$ using any multiple testing procedure that guarantees strong control of the type-1 error within the family $F_m$.

To obtain adjusted p-values, let $p_i^*$ denote the largest p-value in $F_i$, for $i = 1, 2, \ldots, m-1$. Then

$$\tilde{p}_{ij} = \begin{cases} \max\left(p_1^*, p_2^*, \ldots, p_i^*\right) & \text{if } i = 1, 2, \ldots, m-1 \\ \max\left(p'_{ij}, p_1^*, p_2^*, \ldots, p_{i-1}^*\right) & \text{if } i = m \end{cases}$$

where $p'_{mj}$ is the adjusted p-value for $H_{mj}$ based on the multiple testing procedure that has been adopted for family $F_m$.
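The adjusted p-value formula for serial gatekeeping can be sketched in Python as below. This is a hedged illustration rather than East's code: `serial_gatekeeping_adjusted` and `bonferroni` are hypothetical names, and Bonferroni is used for the last family purely as an example of a procedure with strong type-1 error control.

```python
def bonferroni(pvals):
    """Example within-family adjustment for the last family (an assumption)."""
    return [min(1.0, len(pvals) * p) for p in pvals]


def serial_gatekeeping_adjusted(families, last_family_adjust=bonferroni):
    """Adjusted p-values for families F_1..F_m given as lists of raw p-values.

    Families 1..m-1 use the intersection-union test; the last family uses
    the supplied adjustment procedure.
    """
    m = len(families)
    adjusted = []
    running_max = 0.0  # max(p*_1, ..., p*_i) over the gatekeeper families seen so far
    for i, fam in enumerate(families):
        if i < m - 1:
            running_max = max(running_max, max(fam))
            adjusted.append([running_max] * len(fam))
        else:
            adjusted.append([max(p, running_max) for p in last_family_adjust(fam)])
    return adjusted


# Illustrative inputs only: two gatekeeper families followed by a final family.
result = serial_gatekeeping_adjusted([[0.01, 0.02], [0.03, 0.004], [0.02, 0.2]])
```

In this sketch every hypothesis in a gatekeeper family receives the same adjusted p-value, which reflects the all-or-nothing nature of the intersection-union test.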
In terms of adjusted p-values, the serial gatekeeping condition implies that hypotheses in family $F_{i+1}$ will only be tested if

$$\max_{j=1,\ldots,n_i} \tilde{p}_{ij} \le \alpha$$

I.2 Parallel Gatekeeping

Assume that we are interested in testing $K$ endpoints which are grouped into $m$ families $F_1, F_2, \ldots, F_m$. $F_i$ is termed a parallel gatekeeper if at least one hypothesis within it must be rejected in order to proceed to family $F_{i+1}$ $(i = 1, 2, \ldots, m-1)$. We consider the general multistage parallel gatekeeping procedure proposed by Dmitrienko, Tamhane and Wiens (2008). Control of the FWER relies on using a so-called "separable" multiple testing procedure. In order to define separable tests we require the concept of an error rate function. Consider the problem of testing a single family of $n$ null hypotheses $H_1, H_2, \ldots, H_n$. Let $N = \{1, 2, \ldots, n\}$ and let $I \subseteq N$ be the index set of true null hypotheses. The error rate function $e(I)$ of a multiple testing procedure is the maximum probability of making at least one type-1 error:

$$e(I) = \sup P\left(\bigcup_{i \in I}\left\{\text{Reject } H_i\right\} \,\Big|\, H_I\right)$$

where $H_I = \cap_{i \in I} H_i$ and the supremum is computed over the entire parameter space of the hypotheses in $N \setminus I$. In other words, $e(I)$ is the error rate that the multiple testing procedure will produce under the worst configuration of alternative hypotheses for a specific set $I \subseteq N$ of null hypotheses. An explicit expression for $e(I)$ is not generally available, but an upper bound can be used instead. A multiple testing procedure is separable if its error rate is strictly less than α unless all hypotheses are true. That is, $e(I) < \alpha$ for all proper subsets $I \subset N$. Suppose $p_{(1)} \le p_{(2)} \le \ldots \le p_{(n)}$ are the ordered p-values for the corresponding null hypotheses $H_{(1)}, H_{(2)}, \ldots, H_{(n)}$. Then the following three multiple testing procedures are separable.
Bonferroni Test: The upper bound of the error rate function for the Bonferroni test is given by

$$e(I) = \frac{|I|}{n}\alpha$$

where $|I|$ is the cardinality of the set $I$.

Note that the Bonferroni procedure is separable, but the regular Holm and Hochberg procedures are not. To see this, consider a family of two hypotheses where one hypothesis is true and the other hypothesis is infinitely false. The type I error of regular Holm applied to such a family of hypotheses would be α. A similar argument applies to the regular Hochberg procedure. Hence regular Holm or Hochberg cannot be used directly in a parallel gatekeeping procedure. However, we can modify them by taking the convex combination of their own critical values and the critical values of the Bonferroni test. The modified procedures, which we call truncated Holm and truncated Hochberg, are separable and are described as follows.

Truncated Holm: For any prespecified truncation fraction γ, the truncated Holm test proceeds as follows:

Step 1: If $p_{(1)} \le \frac{\alpha}{n}$, then reject $H_{(1)}$ and go to the next step, otherwise retain all hypotheses and stop.
Step 2: If $p_{(2)} \le \left(\frac{\gamma}{n-1} + \frac{1-\gamma}{n}\right)\alpha$, then reject $H_{(2)}$ and go to the next step, otherwise retain $H_{(2)}, H_{(3)}, \ldots, H_{(n)}$ and stop.
...
Step $i$: If $p_{(i)} \le \left(\frac{\gamma}{n-i+1} + \frac{1-\gamma}{n}\right)\alpha$, then reject $H_{(i)}$ and go to the next step, otherwise retain $H_{(i)}, H_{(i+1)}, \ldots, H_{(n)}$ and stop.
...
Step $n$: If $p_{(n)} \le \left(\gamma + \frac{1-\gamma}{n}\right)\alpha$, then reject $H_{(n)}$, otherwise retain $H_{(n)}$.

The upper bound of the error rate function for truncated Holm is given by

$$e(I) = \begin{cases} \left(\gamma + \frac{(1-\gamma)|I|}{n}\right)\alpha & \text{if } |I| > 0 \\ 0 & \text{if } |I| = 0 \end{cases}$$

Truncated Hochberg: For any prespecified truncation fraction γ, the truncated Hochberg test proceeds as follows:

Step 1: If $p_{(n)} \le \left(\gamma + \frac{1-\gamma}{n}\right)\alpha$, then reject all hypotheses and stop, otherwise retain $H_{(n)}$ and go to the next step to test $H_{(n-1)}$.
Step 2: If $p_{(n-1)} \le \left(\frac{\gamma}{2} + \frac{1-\gamma}{n}\right)\alpha$, then reject $H_{(1)}, H_{(2)}, \ldots, H_{(n-1)}$ and stop, otherwise retain $H_{(n-1)}$ and go to the next step to test $H_{(n-2)}$.
...
Step $i$: If $p_{(i)} \le \left(\frac{\gamma}{n-i+1} + \frac{1-\gamma}{n}\right)\alpha$, then reject $H_{(1)}, H_{(2)}, \ldots, H_{(i)}$ and stop, otherwise retain $H_{(i)}$ and go to the next step to test $H_{(i-1)}$.
...
Step $n$: If $p_{(1)} \le \frac{\alpha}{n}$, then reject $H_{(1)}$ and stop, otherwise retain $H_{(1)}$ and stop.

The upper bound of the error rate function for the truncated Hochberg test is given by

$$e(I) = \begin{cases} 1 - P\left[p_{(i)}(I) > \left(\frac{\gamma}{|I|-i+1} + \frac{1-\gamma}{n}\right)\alpha \text{ for all } i \in I\right] & \text{if } |I| > 0 \\ 0 & \text{if } |I| = 0 \end{cases}$$

In general, the upper bound on $e(I)$ for truncated Holm can also be used for truncated Hochberg. Using the above expression, however, we can obtain a more stringent upper bound than the one for truncated Holm. Consequently, more type I error will be carried over to the next family. Observe that the above expression for the error rate function requires knowledge of the joint distribution of the p-values. If the p-values are for comparisons of multiple treatments versus a common control, then the correlations among them are known and the error rate function can be evaluated. If, however, the p-values are for comparisons of a single treatment versus a control with respect to multiple endpoints, we typically will not know the correlations amongst these endpoints. In that case we can obtain a conservative upper bound for the error rate function by assuming independence of the p-values and using the following result due to Sen (1999): Let $U_{(1)} < \ldots < U_{(k)}$ denote the order statistics of $k > 1$ i.i.d. observations from a uniform (0,1) distribution. For any $0 < a_1 < \ldots < a_k < 1$,

$$P(a_1, a_2, \ldots, a_k) = P\left(U_{(i)} > a_i \text{ for all } i = 1, \ldots, k\right) = k!\,H_k(1)$$

where $H_i(u) = \int_{a_i}^{u} H_{i-1}(v)\,dv$, $i = 1, \ldots, k$, $H_0(u) = I(u \ge a_1)$ and $I(\cdot)$ is an indicator function.

Consider $m \ge 2$ families, $F_i = \{H_{i1}, \ldots, H_{in_i}\}$ $(1 \le i \le m)$, of null hypotheses. Let $N_i = \{1, 2, \ldots, n_i\}$ and $A_i \subseteq N_i$ be the index set corresponding to the accepted hypotheses in $F_i$. Parallel gatekeeping is implemented in the following $m$ steps.

Step 1: Let $\alpha_1 = \alpha$ and test all hypotheses in $F_1$ at level $\alpha_1$ using any separable multiple testing procedure (Bonferroni, Truncated Holm, Truncated Hochberg) with a suitable upper bound on the error rate function $e_1(I)$. If $A_1 = N_1$, i.e., no hypotheses in $F_1$ are rejected, then stop testing and retain all hypotheses in $F_2, \ldots, F_m$; otherwise go to the next step.
Step 2: Let $\alpha_2 = \alpha_1 - e_1(A_1)$ and test all hypotheses in $F_2$ at level $\alpha_2$ using any of the separable multiple testing procedures with a suitable upper bound on the error rate function $e_2(I)$. If $A_2 = N_2$, i.e., no hypotheses in $F_2$ are rejected, then stop testing and retain all hypotheses in $F_3, \ldots, F_m$; otherwise go to the next step.
...
Step $i$: Let $\alpha_i = \alpha_{i-1} - e_{i-1}(A_{i-1})$ and test all hypotheses in $F_i$ at level $\alpha_i$ using any of the separable multiple testing procedures with a suitable upper bound on the error rate function $e_i(I)$. If $A_i = N_i$, i.e., no hypotheses in $F_i$ are rejected, then stop testing and retain all hypotheses in $F_{i+1}, \ldots, F_m$; otherwise go to the next step.
...
Step $m$: Let $\alpha_m = \alpha_{m-1} - e_{m-1}(A_{m-1})$ and test all hypotheses in $F_m$ at level $\alpha_m$ using any multiple testing procedure, which does not have to be separable.

Adjusted P Values: The adjusted p-values associated with the gatekeeping procedure can be computed by looping through a discrete grid of significance levels. Let $\alpha = \frac{k}{K}$ $(0 < k < K)$ for some sufficiently large value of $K$. The adjusted p-value $\tilde{p}_{ij}$ for hypothesis $H_{ij}$ is the smallest α (corresponding to the smallest $k$) for which $H_{ij}$ is rejected.
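The parallel gatekeeping steps and the grid search for adjusted p-values can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than East's implementation: all function names are hypothetical, truncated Holm with a common truncation fraction `gamma` is used for every gatekeeper family, regular Holm (truncation fraction 1) is used for the last family, and a hypothesis that is never rejected on the grid is given an adjusted p-value of 1.

```python
def truncated_holm(pvals, alpha, gamma):
    """Return rejection flags for the truncated Holm test at level alpha."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = [False] * n
    for step, idx in enumerate(order, start=1):
        crit = (gamma / (n - step + 1) + (1.0 - gamma) / n) * alpha
        if pvals[idx] <= crit:
            rejected[idx] = True
        else:
            break  # retain this and all remaining hypotheses
    return rejected


def holm_error_bound(num_accepted, n, alpha, gamma):
    """Upper bound on e(I) for truncated Holm, with |I| = num_accepted."""
    if num_accepted == 0:
        return 0.0
    return (gamma + (1.0 - gamma) * num_accepted / n) * alpha


def parallel_gatekeeping(families, alpha, gamma):
    """Rejection flags per family; families are lists of raw p-values."""
    m = len(families)
    decisions = [[False] * len(fam) for fam in families]
    level = alpha
    for i, fam in enumerate(families):
        g = 1.0 if i == m - 1 else gamma  # last family need not be separable
        decisions[i] = truncated_holm(fam, level, g)
        accepted = decisions[i].count(False)
        if accepted == len(fam):
            break  # nothing rejected: retain all later families
        level -= holm_error_bound(accepted, len(fam), level, g)
    return decisions


def gatekeeping_adjusted_pvalues(families, gamma, K=1000):
    """Smallest grid level k/K at which each hypothesis is rejected."""
    adjusted = [[1.0] * len(fam) for fam in families]
    for k in range(K - 1, 0, -1):
        decisions = parallel_gatekeeping(families, k / K, gamma)
        for i, fam_dec in enumerate(decisions):
            for j, rej in enumerate(fam_dec):
                if rej:
                    adjusted[i][j] = k / K
    return adjusted


# Illustrative inputs only: two families of two hypotheses each.
example = [[0.01, 0.04], [0.02, 0.03]]
decisions = parallel_gatekeeping(example, alpha=0.05, gamma=0.5)
adjusted = gatekeeping_adjusted_pvalues(example, gamma=0.5)
```

In this example one hypothesis in the first family is retained, so only $0.05 - 0.0375 = 0.0125$ is carried over to the second family, and neither of its hypotheses is rejected. A finer grid (larger `K`) gives adjusted p-values with more resolution at the cost of more passes through the procedure.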
J Theory-Multi-arm Multi-stage Group Sequential Design

J.1 Notations

Let $\delta_i = \mu_i - \mu_0$ be the mean difference for group $i$ versus the control group. Suppose that there are $K_1$ analysis times including the final one. Assume unequal sample size allocation. We use the first subscript index to denote doses and the second subscript to denote interim analysis time. Let $n_{i1} < \ldots < n_{iK_1}$ $(i = 0, 1, 2, \ldots, D)$ be the cumulative sample size for group $i$ at each interim, where 0 denotes the control arm. Let $n_{i(j)}$ be the incremental sample size from look $j-1$ to look $j$. Let $\sigma_i^2$ $(i = 0, 1, 2, \ldots, D)$ be the variance of the responses in the $i$th group. Let $\bar{X}_{i(j)}$ $(i = 0, 1, 2, \ldots, D;\ j = 1, 2, \ldots, K_1)$ be the sample mean based on the incremental data from look $j-1$ to look $j$ for the $i$th group. Let $\hat{\delta}_{i(j)} = \bar{X}_{i(j)} - \bar{X}_{0(j)}$ $(i = 1, 2, \ldots, D)$ be the observed mean difference from the control group for group $i$.

Let $\xi_{i(j)} = \left[\mathrm{var}\left(\bar{X}_{i(j)}\right)\right]^{-1} = \frac{n_{i(j)}}{\sigma_i^2}$ be the incremental information from look $j-1$ to look $j$ for the $i$th group. Let $\xi_{ij} = \sum_{h=1}^{j}\xi_{i(h)}$. Let

$$I_{i(j)} = \left[\mathrm{var}\left(\bar{X}_{i(j)} - \bar{X}_{0(j)}\right)\right]^{-1} = \left[\xi_{i(j)}^{-1} + \xi_{0(j)}^{-1}\right]^{-1}.$$

Let $Z_{i(j)} = \hat{\delta}_{i(j)}\sqrt{I_{i(j)}}$ be the Z statistic for the comparison of group $i$ versus control based on incremental data. Let $W_{i(j)} = \hat{\delta}_{i(j)} I_{i(j)} = Z_{i(j)}\sqrt{I_{i(j)}}$ be the score statistic based on incremental data. Let

$$W_{ij} = \sum_{h=1}^{j} W_{i(h)} = \sum_{h=1}^{j} Z_{i(h)}\sqrt{I_{i(h)}} = \sum_{h=1}^{j}\hat{\delta}_{i(h)} I_{i(h)}.$$

Assume that we will monitor the trial based on the processes $W_{1j}, W_{2j}, \ldots, W_{Dj}$. Let $I_{ij} = \sum_{h=1}^{j} I_{i(h)}$ be the cumulative information up to look $j$ for $W_{ij}$. Now let $N$ be the total sample size for the whole study. Let

$$\lambda_i = \frac{n_{i(1)}}{n_{0(1)}} = \frac{n_{i(2)}}{n_{0(2)}} = \ldots = \frac{n_{i(K_1)}}{n_{0(K_1)}} \quad (i = 0, 1, 2, \ldots, D)$$

be the sample size allocation ratio of dose $i$ to the control group. Note that as long as the allocation ratio for a particular dose to control remains the same across all interim looks, $W_{ij}$ is the same as $Z_{ij}\sqrt{I_{ij}}$.
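Let $\frac{n_{0K_1}}{N} = \lambda_0$ be the fraction of the total sample size for the control arm to the total sample size of the whole study. Let

$$t_{(j)} = \frac{n_{0(j)}}{n_{0K_1}} = \frac{n_{1(j)}}{n_{1K_1}} = \ldots = \frac{n_{D(j)}}{n_{DK_1}}$$

and let

$$t_j = \frac{n_{0j}}{n_{0K_1}} = \frac{n_{1j}}{n_{1K_1}} = \ldots = \frac{n_{Dj}}{n_{DK_1}}$$

be the cumulative sample size fraction up to look $j$ for the control arm. Note that $t_{(j)} = \frac{I_{i(j)}}{I_{iK_1}}$ and $t_j = \frac{I_{ij}}{I_{iK_1}}$ $(i = 0, 1, \ldots, D)$. Then we have

$$\xi_{i(j)} = \frac{n_{i(j)}}{\sigma_i^2}$$

$$I_{i(j)} = \left[\frac{\sigma_i^2}{n_{i(j)}} + \frac{\sigma_0^2}{n_{0(j)}}\right]^{-1} = \left[\frac{\sigma_i^2}{\lambda_i} + \sigma_0^2\right]^{-1} n_{0(j)}$$

$$I_{ij} = \sum_{h=1}^{j} I_{i(h)} = \left[\frac{\sigma_i^2}{\lambda_i} + \sigma_0^2\right]^{-1}\sum_{h=1}^{j} n_{0(h)} = \left[\frac{\sigma_i^2}{\lambda_i} + \sigma_0^2\right]^{-1} n_{0j}$$

$$E\left(W_{i(j)}\right) = \delta_i I_{i(j)}$$

$$Var\left(W_{i(j)}\right) = I_{i(j)}$$

$$Cov\left(W_{k(j)}, W_{l(j)}\right) = \xi_{0(j)}^{-1} I_{k(j)} I_{l(j)} = \left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1}\sigma_0^2\, n_{0(j)}$$

For the cumulative process $W_{ij}$, we have

$$E\left(W_{ij}\right) = E\left(\sum_{h=1}^{j} W_{i(h)}\right) = \delta_i I_{ij}$$

$$Var\left(W_{ij}\right) = Var\left(\sum_{h=1}^{j} W_{i(h)}\right) = I_{ij}$$

$$Cov\left(W_{kj}, W_{lj}\right) = \sum_{h=1}^{j} Cov\left(W_{k(h)}, W_{l(h)}\right) = \left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1}\sigma_0^2\, n_{0j}$$

Next we derive the conditional distribution of $\vec{W}_{j_2} = \left(W_{1j_2}, W_{2j_2}, \ldots, W_{Dj_2}\right)$ given $\vec{W}_{j_1} = \vec{x}_{j_1} = \left(x_{1j_1}, x_{2j_1}, \ldots, x_{Dj_1}\right)$ for $j_1 < j_2$. For each process $W_{ij}$, $W_{ij_2} = W_{ij_1} + \sum_{h=j_1+1}^{j_2} W_{i(h)}$. Hence conditional on $W_{ij_1} = x_{ij_1}$, $W_{ij_2}$ has a normal distribution with mean $x_{ij_1} + \delta_i\left(I_{ij_2} - I_{ij_1}\right)$ and variance $I_{ij_2} - I_{ij_1}$. And conditional on $\vec{W}_{j_1}$, the covariance between $W_{kj_2}$ and $W_{lj_2}$ is given by

$$Cov\left(W_{kj_2}, W_{lj_2} \mid \vec{W}_{j_1}\right) = Cov\left(\sum_{h=j_1+1}^{j_2} W_{k(h)}, \sum_{h=j_1+1}^{j_2} W_{l(h)}\right) = \sum_{h=j_1+1}^{j_2} Cov\left(W_{k(h)}, W_{l(h)}\right)$$

$$= \left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1}\sigma_0^2\sum_{h=j_1+1}^{j_2} n_{0(h)} = \left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1}\sigma_0^2\left(n_{0j_2} - n_{0j_1}\right)$$

Then the transition density of the process $\vec{W}$ from look $j_1$ to look $j_2$ is given by

$$p_{\vec{\delta},\Sigma_{j_2|j_1}}\left(\left(\vec{I}_{j_1}, \vec{x}_{j_1}\right), \left(\vec{I}_{j_2}, \vec{x}_{j_2}\right)\right) = (2\pi)^{-D/2}\left|\Sigma_{j_2|j_1}\right|^{-\frac{1}{2}}\exp\left\{-\frac{\left[\vec{x}_{j_2} - \left(\vec{x}_{j_1} + \left(A_{j_2} - A_{j_1}\right)\vec{\delta}\right)\right]^T\Sigma_{j_2|j_1}^{-1}\left[\vec{x}_{j_2} - \left(\vec{x}_{j_1} + \left(A_{j_2} - A_{j_1}\right)\vec{\delta}\right)\right]}{2}\right\} \tag{J.1}$$

where $\vec{x}_{j_1} = \left(x_{1j_1}, \ldots, x_{Dj_1}\right)^T$, $\vec{x}_{j_2} = \left(x_{1j_2}, \ldots, x_{Dj_2}\right)^T$, $\vec{\delta} = \left(\delta_1, \ldots, \delta_D\right)^T$, $\vec{I}_{j_1} = \left(I_{1j_1}, I_{2j_1}, \ldots, I_{Dj_1}\right)^T$, and the matrices $\Sigma_{j_2|j_1} = \left(\zeta_{kl}\right)_{D \times D}$ and $A_j$ have the following form:

$$\zeta_{kl} = \begin{cases} I_{kj_2} - I_{kj_1} = \left[\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right]^{-1}\left(n_{0j_2} - n_{0j_1}\right) & \text{if } k = l \\ \left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1}\sigma_0^2\left(n_{0j_2} - n_{0j_1}\right) & \text{if } k \ne l \end{cases}$$

$$A_j = \mathrm{diag}\left(I_{1j}^{(1)}, I_{2j}^{(1)}, \ldots, I_{Dj}^{(1)}\right)$$

J.2 Design

Assume that a group sequential design with $K_1$ looks including the final look is planned initially. Let $e_j^{(1)}$ $(j = 1, 2, \ldots, K_1)$ be the level α exit boundaries for the initial design using $W_{ij}^{(1)}$ to test $H_0$. The boundaries $e_j^{(1)}$ satisfy

$$P\left(\bigcup_{j=1}^{K_1}\left\{\max_i\left\{W_{ij}^{(1)}\right\} > e_j^{(1)}\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha$$

Let $\alpha_j$ $(j = 1, \ldots, K_1)$ be the cumulative type I error by look $j$ such that

$$P\left(\bigcup_{h=1}^{j}\left\{\max_i\left\{W_{ih}^{(1)}\right\} > e_h^{(1)}\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_j$$

Let $T = \max_{i=1,\ldots,D}\left\{I_{iK_1}\right\} = n_{0K_1}\max_h\left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$. Now let $U_{ij} = \frac{W_{ij}}{\sqrt{T}}$. Then the process $U_{ij}$ is a Brownian process with mean $\eta_i\tilde{t}_{ij}$ and variance $\tilde{t}_{ij}$, where

$$\eta_i = \delta_i\sqrt{T}$$

$$\tilde{t}_{ij} = \frac{\left[\frac{\sigma_i^2}{\lambda_i} + \sigma_0^2\right]^{-1}}{\max_h\left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}}\, t_j$$

Next we derive the conditional distribution for the process $U$. Note that $U_{ij_2} = U_{ij_1} + \frac{1}{\sqrt{T}}\sum_{h=j_1+1}^{j_2} W_{i(h)}$. Hence, conditional on $U_{ij_1} = y_{ij_1}$, $U_{ij_2}$ is normal with mean $y_{ij_1} + \frac{1}{\sqrt{T}}\sum_{h=j_1+1}^{j_2} E\left(W_{i(h)}\right) = y_{ij_1} + \eta_i\left(\tilde{t}_{ij_2} - \tilde{t}_{ij_1}\right)$ and variance $\tilde{t}_{ij_2} - \tilde{t}_{ij_1}$, where $\vec{\eta} = \left(\eta_1, \eta_2, \ldots, \eta_D\right)^T$ and $\eta_i = \delta_i\sqrt{T}$. Conditional on $\vec{U}_{j_1}$, the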
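Conditional on $\vec{U}_{j_1}$, the covariance between $U_{kj_2}$ and $U_{lj_2}$ is given by

$$\mathrm{Cov}\left(U_{kj_2}, U_{lj_2} \mid \vec{U}_{j_1}\right) = \mathrm{Cov}\left(U_{kj_1} + \frac{1}{\sqrt{T}} \sum_{h=j_1+1}^{j_2} W_{k(h)},\; U_{lj_1} + \frac{1}{\sqrt{T}} \sum_{h=j_1+1}^{j_2} W_{l(h)}\right) = \frac{\left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1} \sigma_0^2}{\max_h \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}} \, (t_{j_2} - t_{j_1})$$

Hence the transition density of the process $U$ is given by

$$p_{\vec{\eta}, \tilde{\Sigma}_{j_2|j_1}}\left((t_{j_1}, \vec{y}_{j_1}), (t_{j_2}, \vec{y}_{j_2})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{j_2|j_1}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_{j_2} - [\vec{y}_{j_1} + (A_{j_2} - A_{j_1})\vec{\eta}]\right)^T \tilde{\Sigma}_{j_2|j_1}^{-1} \left(\vec{y}_{j_2} - [\vec{y}_{j_1} + (A_{j_2} - A_{j_1})\vec{\eta}]\right)}{2}\right)$$

where $\vec{\eta} = (\eta_1, \eta_2, \ldots, \eta_D)^T$ with $\eta_i = \delta_i \sqrt{T}$, $\vec{y}_{j_2} = (y_{1j_2}, y_{2j_2}, \ldots, y_{Dj_2})^T$, and the covariance matrix $\tilde{\Sigma}_{j_2|j_1} = (\zeta_{kl})_{D \times D}$ has the form

$$\zeta_{kl} = \begin{cases} \tilde{t}_{kj_2} - \tilde{t}_{kj_1} = \dfrac{\left[\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right]^{-1}}{\max_h \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}} \, (t_{j_2} - t_{j_1}) & \text{if } k = l \\[3ex] \dfrac{\left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1} \sigma_0^2}{\max_h \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}} \, (t_{j_2} - t_{j_1}) & \text{if } k \neq l \end{cases}$$

and the matrix $A_{j_2}$ has the form

$$A_{j_2} = \mathrm{diag}\left(\tilde{t}_{1j_2}, \tilde{t}_{2j_2}, \ldots, \tilde{t}_{Dj_2}\right)$$

Note that

$$P\left(\bigcup_{h=1}^{j} \left\{\max_i W_{ih}^{(1)} > e_h^{(1)}\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_j$$

is equivalent to

$$P\left(\bigcup_{h=1}^{j} \left\{\max_i U_{ih} > \frac{e_h^{(1)}}{\sqrt{T}}\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_j$$

For boundary computation, we work with the process $U$. Let $b_j = e_j^{(1)}/\sqrt{T}$ be the boundary based on the process $U$. We can find $b_j$ recursively, and the computation of $b_j$ is independent of sample size.

J.2.1 Look 1

The boundaries $b_j$ $(j = 1, \ldots, K_1)$ satisfy the following equation:

$$P\left(\bigcap_{h=1}^{j-1} \left\{\max_i U_{ih} \le b_h\right\} \cap \left\{\max_i U_{ij} > b_j\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_j - \alpha_{j-1}$$

More specifically, $b_1$ satisfies

$$P\left(\max_i U_{i1} > b_1 \mid \vec{\delta} = 0\right) = \alpha_1$$

The left hand side of the above equation under any value of $\vec{\delta}$ is

$$1 - \int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) d\vec{y}_1^{(1)}$$

so that, with $\vec{\eta} = 0$, $b_1$ solves

$$\int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) d\vec{y}_1^{(1)} = 1 - \alpha_1 \qquad \text{(J.2)}$$

where $\vec{y}_1^{(1)} = (y_{11}^{(1)}, \ldots, y_{D1}^{(1)})^T$, $\vec{\eta} = (\eta_1, \ldots, \eta_D)^T$, and $p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}$ is the joint density function of $U_{11}, \ldots, U_{D1}$, given by

$$p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{1|0}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_1^{(1)} - A_1 \vec{\eta}\right)^T \tilde{\Sigma}_{1|0}^{-1} \left(\vec{y}_1^{(1)} - A_1 \vec{\eta}\right)}{2}\right)$$

and

$$A_1 = \mathrm{diag}\left(\tilde{t}_{11}^{(1)}, \tilde{t}_{21}^{(1)}, \ldots, \tilde{t}_{D1}^{(1)}\right)$$

J.2.2 Look 2

The boundary $b_2$ satisfies the following equation:

$$P\left(\left\{\max_i U_{i1} \le b_1\right\} \cap \left\{\max_i U_{i2} > b_2\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_2 - \alpha_1$$

The left hand side of the above equation under any $\vec{\delta}$ is

$$\int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) \left[1 - \int_{-\infty}^{b_2} \cdots \int_{-\infty}^{b_2} p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) d\vec{y}_2^{(1)}\right] d\vec{y}_1^{(1)}$$

so that, with $\vec{\eta} = 0$, $b_2$ solves

$$\int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) \left[\int_{-\infty}^{b_2} \cdots \int_{-\infty}^{b_2} p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) d\vec{y}_2^{(1)}\right] d\vec{y}_1^{(1)} = 1 - \alpha_2 \qquad \text{(J.3)}$$

where $\vec{y}_2^{(1)} = (y_{12}^{(1)}, \ldots, y_{D2}^{(1)})^T$, $p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}$ is the look 1 density given in Section J.2.1, and

$$p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{2|1}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_2^{(1)} - [\vec{y}_1^{(1)} + (A_2 - A_1)\vec{\eta}]\right)^T \tilde{\Sigma}_{2|1}^{-1} \left(\vec{y}_2^{(1)} - [\vec{y}_1^{(1)} + (A_2 - A_1)\vec{\eta}]\right)}{2}\right)$$

J.2.3 Look 3

Now consider the calculation of the boundary $b_3$, which satisfies

$$P\left(\left\{\max_i U_{i1} \le b_1\right\} \cap \left\{\max_i U_{i2} \le b_2\right\} \cap \left\{\max_i U_{i3} > b_3\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_3 - \alpha_2$$

The left hand side of the above equation under any $\vec{\delta}$ is the integral

$$\int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} \int_{-\infty}^{b_2} \cdots \int_{-\infty}^{b_2} \left\{1 - \int_{-\infty}^{b_3} \cdots \int_{-\infty}^{b_3} p_{\vec{\eta}, \tilde{\Sigma}_{3|2}}\left((t_2^{(1)}, \vec{y}_2^{(1)}), (t_3^{(1)}, \vec{y}_3^{(1)})\right) d\vec{y}_3^{(1)}\right\} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) d\vec{y}_2^{(1)} \, d\vec{y}_1^{(1)} \qquad \text{(J.4)}$$

so that, with $\vec{\eta} = 0$, $b_3$ solves

$$\int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} \int_{-\infty}^{b_2} \cdots \int_{-\infty}^{b_2} \int_{-\infty}^{b_3} \cdots \int_{-\infty}^{b_3} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) p_{\vec{\eta}, \tilde{\Sigma}_{3|2}}\left((t_2^{(1)}, \vec{y}_2^{(1)}), (t_3^{(1)}, \vec{y}_3^{(1)})\right) d\vec{y}_3^{(1)} \, d\vec{y}_2^{(1)} \, d\vec{y}_1^{(1)} = 1 - \alpha_3$$

where $p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}$ and $p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}$ are defined as before, and

$$p_{\vec{\eta}, \tilde{\Sigma}_{3|2}}\left((t_2^{(1)}, \vec{y}_2^{(1)}), (t_3^{(1)}, \vec{y}_3^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{3|2}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_3^{(1)} - [\vec{y}_2^{(1)} + (A_3 - A_2)\vec{\eta}]\right)^T \tilde{\Sigma}_{3|2}^{-1} \left(\vec{y}_3^{(1)} - [\vec{y}_2^{(1)} + (A_3 - A_2)\vec{\eta}]\right)}{2}\right)$$

J.3 Conditional Power and Conditional Type I Error

Assume that the trial did not cross the boundaries at looks $1, \ldots, l$, and that at look $l$ we observed $\vec{x}_l^{(1)} = (x_{1l}^{(1)}, x_{2l}^{(1)}, \ldots, x_{Dl}^{(1)})$. The conditional rejection probability under any $\vec{\delta}$ is given by

$$P\left(\bigcup_{j=l+1}^{K_1} \left\{\max_i W_{ij}^{(1)} > e_j^{(1)}\right\} \,\Big|\, \vec{x}_l^{(1)}\right)$$

This probability reduces to

$$P\left(\max_i W_{i,l+1}^{(1)} > e_{l+1}^{(1)} \,\Big|\, \vec{x}_l^{(1)}\right) + P\left(\left\{\max_i W_{i,l+1}^{(1)} \le e_{l+1}^{(1)}\right\} \cap \left\{\max_i W_{i,l+2}^{(1)} > e_{l+2}^{(1)}\right\} \,\Big|\, \vec{x}_l^{(1)}\right) + \ldots \qquad \text{(J.5)}$$

For computational purposes, we work with the $U$ process, defined as follows. Let

$$T = \max_{i=1,\ldots,D} \{I_{iK_1}\} = n_{0K_1} \max_h \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$$

and let $U_{ij} = W_{ij}/\sqrt{T}$. Then the process $U_{ij}$ is a Brownian process with mean $\eta_i \tilde{t}_{ij}$ and variance $\tilde{t}_{ij}$, where

$$\eta_i = \delta_i \sqrt{T}, \qquad \tilde{t}_{ij} = \frac{\left[\sigma_i^2/\lambda_i + \sigma_0^2\right]^{-1}}{\max_h \left[\sigma_h^2/\lambda_h + \sigma_0^2\right]^{-1}} \, t_j$$

Let $b_j$ be the initial boundaries based on the process $U$. Then the conditional power can be calculated as

$$P\left(\max_i U_{i,l+1} > b_{l+1} \,\Big|\, \vec{y}_l^{(1)}\right) + P\left(\left\{\max_i U_{i,l+1} \le b_{l+1}\right\} \cap \left\{\max_i U_{i,l+2} > b_{l+2}\right\} \,\Big|\, \vec{y}_l^{(1)}\right) + \ldots$$

where $\vec{y}_l^{(1)} = \frac{1}{\sqrt{T}} (x_{1l}, x_{2l}, \ldots, x_{Dl})^T$.

J.3.1 Look l + 1

The first probability is

$$P\left(\max_i U_{i,l+1} > b_{l+1} \,\Big|\, \vec{y}_l^{(1)}\right) = 1 - \int_{-\infty}^{b_{l+1}} \cdots \int_{-\infty}^{b_{l+1}} p_{\vec{\eta}, \tilde{\Sigma}_{l+1|l}}\left((t_l^{(1)}, \vec{y}_l^{(1)}), (t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)})\right) d\vec{y}_{l+1}^{(1)}$$

where the transition density from J.1 is given by

$$p_{\vec{\eta}, \tilde{\Sigma}_{l+1|l}}\left((t_l^{(1)}, \vec{y}_l^{(1)}), (t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{l+1|l}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_{l+1}^{(1)} - [\vec{y}_l^{(1)} + (A_{l+1} - A_l)\vec{\eta}]\right)^T \tilde{\Sigma}_{l+1|l}^{-1} \left(\vec{y}_{l+1}^{(1)} - [\vec{y}_l^{(1)} + (A_{l+1} - A_l)\vec{\eta}]\right)}{2}\right)$$

and the matrices $\tilde{\Sigma}$ and $A$ are defined as in Section J.2. Hence, under $\vec{\delta} = 0$,

$$P\left(\max_i U_{i,l+1} > b_{l+1} \,\Big|\, \vec{y}_l^{(1)}\right) = 1 - \int_{-\infty}^{b_{l+1}} \cdots \int_{-\infty}^{b_{l+1}} p_{\tilde{\Sigma}_{l+1|l}}\left((t_l^{(1)}, \vec{y}_l^{(1)}), (t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)})\right) d\vec{y}_{l+1}^{(1)}$$

where the transition density is given by

$$p_{\tilde{\Sigma}_{l+1|l}}\left((t_l^{(1)}, \vec{y}_l^{(1)}), (t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{l+1|l}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_{l+1}^{(1)} - \vec{y}_l^{(1)}\right)^T \tilde{\Sigma}_{l+1|l}^{-1} \left(\vec{y}_{l+1}^{(1)} - \vec{y}_l^{(1)}\right)}{2}\right)$$

J.3.2 Look l + 2

The second term in (J.5) under any $\vec{\delta}$ is

$$P\left(\left\{\max_i W_{i,l+1}^{(1)} \le e_{l+1}^{(1)}\right\} \cap \left\{\max_i W_{i,l+2}^{(1)} > e_{l+2}^{(1)}\right\} \,\Big|\, \vec{x}_l^{(1)}\right)$$

which can be expressed in terms of the process $U$ as

$$P\left(\left\{\max_i U_{i,l+1} \le b_{l+1}\right\} \cap \left\{\max_i U_{i,l+2} > b_{l+2}\right\} \,\Big|\, \vec{y}_l^{(1)}\right) = \int_{-\infty}^{b_{l+1}} \cdots \int_{-\infty}^{b_{l+1}} p_{\vec{\eta}, \tilde{\Sigma}_{l+1|l}}\left((t_l^{(1)}, \vec{y}_l^{(1)}), (t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)})\right) \left[1 - \int_{-\infty}^{b_{l+2}} \cdots \int_{-\infty}^{b_{l+2}} p_{\vec{\eta}, \tilde{\Sigma}_{l+2|l+1}}\left((t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)}), (t_{l+2}^{(1)}, \vec{y}_{l+2}^{(1)})\right) d\vec{y}_{l+2}^{(1)}\right] d\vec{y}_{l+1}^{(1)}$$

where

$$p_{\vec{\eta}, \tilde{\Sigma}_{l+1|l}}\left((t_l^{(1)}, \vec{y}_l^{(1)}), (t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{l+1|l}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_{l+1}^{(1)} - [\vec{y}_l^{(1)} + (A_{l+1} - A_l)\vec{\eta}]\right)^T \tilde{\Sigma}_{l+1|l}^{-1} \left(\vec{y}_{l+1}^{(1)} - [\vec{y}_l^{(1)} + (A_{l+1} - A_l)\vec{\eta}]\right)}{2}\right)$$

$$p_{\vec{\eta}, \tilde{\Sigma}_{l+2|l+1}}\left((t_{l+1}^{(1)}, \vec{y}_{l+1}^{(1)}), (t_{l+2}^{(1)}, \vec{y}_{l+2}^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{l+2|l+1}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_{l+2}^{(1)} - [\vec{y}_{l+1}^{(1)} + (A_{l+2} - A_{l+1})\vec{\eta}]\right)^T \tilde{\Sigma}_{l+2|l+1}^{-1} \left(\vec{y}_{l+2}^{(1)} - [\vec{y}_{l+1}^{(1)} + (A_{l+2} - A_{l+1})\vec{\eta}]\right)}{2}\right)$$

Under $\vec{\delta} = 0$, we obtain the conditional type I error by replacing $\vec{\eta}$ by 0.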
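The look l + 1 term above is a multivariate normal orthant probability, so it can be checked by simulation. The following is a minimal Python sketch, not East's implementation: the function name, argument names, and the Monte Carlo approach are assumptions made for illustration. It uses the fact that the covariance structure of the U increments (variance proportional to [σ_k²/λ_k + σ_0²]^{-1}, covariance driven by the shared control arm) can be reproduced from one dose-specific and one shared standard normal draw.

```python
import math
import random
from statistics import NormalDist


def next_look_crossing_prob(y_l, b_next, dt, sigma, lam, sigma0,
                            eta=None, n_sim=200_000, seed=1):
    """Monte Carlo estimate of P(max_i U_{i,l+1} > b_{l+1} | y_l).

    y_l        : current values of the U process, one per dose (hypothetical input)
    b_next     : boundary b_{l+1} on the U scale
    dt         : information-fraction increment t_{l+1} - t_l
    sigma, lam : per-dose standard deviations and allocation ratios
    sigma0     : control-arm standard deviation
    eta        : drift vector; None means eta = 0 (conditional type I error)
    """
    d = len(y_l)
    eta = eta or [0.0] * d
    # c_k = [sigma_k^2 / lambda_k + sigma_0^2]^{-1}; m = max_h c_h
    c = [1.0 / (s * s / l + sigma0 * sigma0) for s, l in zip(sigma, lam)]
    m = max(c)
    scale = math.sqrt(dt / m)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        z0 = rng.gauss(0.0, 1.0)  # noise shared through the control arm
        crossed = False
        for k in range(d):
            zk = rng.gauss(0.0, 1.0)  # dose-specific noise
            # Increment with variance c_k*dt/m and covariance c_k*c_l*sigma0^2*dt/m
            incr = scale * c[k] * (sigma[k] / math.sqrt(lam[k]) * zk - sigma0 * z0)
            u_next = y_l[k] + eta[k] * (c[k] / m) * dt + incr
            if u_next > b_next:
                crossed = True
        hits += crossed
    return hits / n_sim


if __name__ == "__main__":
    # One-arm sanity check against the closed form 1 - Phi((b - y) / sqrt(dt)).
    est = next_look_crossing_prob([0.2], 1.0, 0.25, [1.0], [1.0], 1.0)
    exact = 1.0 - NormalDist().cdf((1.0 - 0.2) / math.sqrt(0.25))
    print(est, exact)
```

With eta left at its default the function estimates the first term of the conditional type I error; passing a non-zero drift vector gives the corresponding term of the conditional power. A production implementation would more likely use numerical integration of the multivariate normal density than simulation.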
J.4 Compute power and sample size

J.4.1 Compute power for user-specified sample size

To compute power for a user-specified sample size, we first need to compute the boundaries $b_j$ $(j = 1, \ldots, K_1)$ using the method in Section J.2. Once the boundaries have been computed, the power is given by

$$P\left(\bigcup_{j=1}^{K_1} \left\{\max_i W_{ij}^{(1)} > e_j^{(1)}\right\} \,\Big|\, \vec{\delta}\right) = P\left(\max_i W_{i1}^{(1)} > e_1^{(1)} \,\Big|\, \vec{\delta}\right) + P\left(\left\{\max_i W_{i1}^{(1)} \le e_1^{(1)}\right\} \cap \left\{\max_i W_{i2}^{(1)} > e_2^{(1)}\right\} \,\Big|\, \vec{\delta}\right) + \ldots$$

Let $N$ be the total sample size for the study. Assume we want to power the study at some $\vec{\delta} = (\delta_1, \delta_2, \ldots, \delta_D)$. To compute the power for a sample size of $N$, we work with the process $U$, defined as $U_{ij} = W_{ij}^{(1)}/\sqrt{T}$ with

$$T = \max_{h=1,\ldots,D} \{I_{hK_1}\} = n_{0K_1} \max_h \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$$

From Section J.2, we have

$$P\left(\max_i W_{i1}^{(1)} > e_1^{(1)} \,\Big|\, \vec{\delta}\right) = P\left(\max_i U_{i1} > b_1 \,\Big|\, \vec{\delta}\right) = 1 - \int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) d\vec{y}_1^{(1)}$$

where

$$p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{1|0}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_1^{(1)} - A_1 \vec{\eta}\right)^T \tilde{\Sigma}_{1|0}^{-1} \left(\vec{y}_1^{(1)} - A_1 \vec{\eta}\right)}{2}\right)$$

Next,

$$P\left(\left\{\max_i W_{i1}^{(1)} \le e_1^{(1)}\right\} \cap \left\{\max_i W_{i2}^{(1)} > e_2^{(1)}\right\} \,\Big|\, \vec{\delta}\right) = P\left(\left\{\max_i U_{i1} \le b_1\right\} \cap \left\{\max_i U_{i2} > b_2\right\} \,\Big|\, \vec{\delta}\right)$$

$$= \int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) \left[1 - \int_{-\infty}^{b_2} \cdots \int_{-\infty}^{b_2} p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) d\vec{y}_2^{(1)}\right] d\vec{y}_1^{(1)}$$

where $\vec{y}_2^{(1)} = (y_{12}^{(1)}, \ldots, y_{D2}^{(1)})^T$, $p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}$ is as given above, and

$$p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) = (2\pi)^{-D/2} \, |\tilde{\Sigma}_{2|1}|^{-1/2} \exp\left(-\frac{\left(\vec{y}_2^{(1)} - [\vec{y}_1^{(1)} + (A_2 - A_1)\vec{\eta}]\right)^T \tilde{\Sigma}_{2|1}^{-1} \left(\vec{y}_2^{(1)} - [\vec{y}_1^{(1)} + (A_2 - A_1)\vec{\eta}]\right)}{2}\right)$$

Similarly,

$$P\left(\bigcap_{j=1}^{2} \left\{\max_i W_{i,j}^{(1)} \le e_j^{(1)}\right\} \cap \left\{\max_i W_{i,3}^{(1)} > e_3^{(1)}\right\} \,\Big|\, \vec{\delta}\right) = P\left(\bigcap_{j=1}^{2} \left\{\max_i U_{i,j} \le b_j\right\} \cap \left\{\max_i U_{i,3} > b_3\right\} \,\Big|\, \vec{\delta}\right)$$

$$= \int_{-\infty}^{b_1} \cdots \int_{-\infty}^{b_1} \int_{-\infty}^{b_2} \cdots \int_{-\infty}^{b_2} \left\{1 - \int_{-\infty}^{b_3} \cdots \int_{-\infty}^{b_3} p_{\vec{\eta}, \tilde{\Sigma}_{3|2}}\left((t_2^{(1)}, \vec{y}_2^{(1)}), (t_3^{(1)}, \vec{y}_3^{(1)})\right) d\vec{y}_3^{(1)}\right\} p_{\vec{\eta}, \tilde{\Sigma}_{1|0}}\left((0, 0), (t_1^{(1)}, \vec{y}_1^{(1)})\right) p_{\vec{\eta}, \tilde{\Sigma}_{2|1}}\left((t_1^{(1)}, \vec{y}_1^{(1)}), (t_2^{(1)}, \vec{y}_2^{(1)})\right) d\vec{y}_2^{(1)} \, d\vec{y}_1^{(1)}$$

To compute the sample size needed to achieve a user-specified target power, we use a bisection search over the sample size, evaluating the power at each candidate value.

J.5 Simulation

Let $n_{ij}^{(1)}$ be the cumulative sample size up to look $j$ for each group in the primary trial. Let $L_1$ be the look number at which dose selection occurs. Let $\alpha_1 < \alpha_2 < \ldots < \alpha_{K_1} = \alpha$ be the cumulative $\alpha$ spent by each look, and let $e_j^{(1)}$ $(j = 1, 2, \ldots, K_1)$ be the exit boundaries for the process $W_{i,j}^{(1)}$ $(i = 1, \ldots, D;\ j = 1, \ldots, K_1)$ such that

$$P\left(\bigcup_{h=1}^{j} \left\{\max_i W_{i,h}^{(1)} > e_h^{(1)}\right\} \,\Big|\, \vec{\delta} = 0\right) = \alpha_j$$

We first generate the incremental data for each dose group and the control group at each look. Then we calculate $W_{i,j}^{(1)}$ $(i = 1, \ldots, D;\ j = 1, \ldots, L_1)$. If there exists a $j \le L_1$ such that $\max_i W_{i,j}^{(1)} > e_j^{(1)}$, then the trial stops. If the trial does not cross the boundaries up to look $L_1$, and we observed $W_{iL_1}^{(1)} = x_{i,L_1}^{(1)}$ $(i = 1, \ldots, D)$, we drop those doses with $x_{iL_1}^{(1)} < \gamma$. Let $S \subseteq F \equiv \{1, 2, \ldots, D\}$ be the index set of the doses selected, and denote the cardinality of $S$ by $D^*$. Let $\alpha'$ be the conditional type I error at look $L_1$ based on the initial design.

Suppose $K_2$ additional looks are planned after dose selection. Let $n_{ij}^{(2)}$ $(1 \le j \le K_2)$ be the cumulative sample size by look $j$ after dose selection. Suppose that the remaining sample size for the dropped doses is reallocated to the other dose groups according to the same allocation ratios; the resulting sample sizes are given in Section J.5.1. Suppose the user specifies how to spend the conditional type I error $\alpha'$, with $\alpha'_1 < \alpha'_2 < \ldots < \alpha'_{K_2} = \alpha'$. Next we need to compute the new boundaries after dose selection. Let $e_1^{(2)}, e_2^{(2)}, \ldots, e_{K_2}^{(2)}$ be the exit boundaries after dose selection. They satisfy the following equation:

$$P\left(\bigcup_{j=1}^{K_2} \left\{\max_{i \in S} W_{ij}^{(2)} > e_j^{(2)}\right\} \,\Big|\, x_{iL_1}^{(1)}\ (i \in S);\ \vec{\delta} = 0\right) = \alpha' \qquad \text{(J.6)}$$

Suppose sample size adaptation is planned at look $L_2$ with $0 \le L_2 < K_2$. If $L_2 = 0$, then dose selection and sample size adaptation are performed at the same look. We generate the incremental statistics at each look $j$ $(0 < j \le L_2)$; if $L_2 = 0$, this step is skipped. If there exists a $j$ such that the boundary $e_j^{(2)}$ is crossed, then the trial stops. If the trial did not cross any of the boundaries up to look $L_2$, we perform sample size adaptation. We increase the total sample size by a flat rate, say 50%, for each of the selected doses and the control group if $\nu_{low} < \max_{i \in S} W_{iL_2}^{(2)} < \nu_{up}$; otherwise we keep the total sample size for each selected dose and the control group as planned.

Suppose $K_3$ additional looks are performed after sample size adaptation. Let $n_{ij}^{(3)}$ $(1 \le j \le K_3)$ be the cumulative sample size by look $j$ after sample size adaptation. If the sample size is adapted, we need to recalculate the boundaries such that the conditional type I error at look $L_2$, denoted by $\alpha''$, is preserved. Suppose we observed $W_{iL_2}^{(2)} = x_{iL_2}^{(2)}$ at look $L_2$.
Suppose the user specifies how to spend this conditional type I error $\alpha''$, with $\alpha''_1 < \alpha''_2 < \ldots < \alpha''_{K_3} = \alpha''$. Next we need to compute the new boundaries after sample size adaptation. Let $e_1^{(3)}, e_2^{(3)}, \ldots, e_{K_3}^{(3)}$ be the exit boundaries after adaptation. We need to find $e_1^{(3)}, \ldots, e_{K_3}^{(3)}$ such that

$$P\left(\bigcup_{j=1}^{K_3} \left\{\max_{i \in S} W_{ij}^{(3)} > e_j^{(3)}\right\} \,\Big|\, x_{iL_2}^{(2)};\ \vec{\delta} = 0\right) = \alpha'' \qquad \text{(J.7)}$$

The new boundaries $e_1^{(2)}, e_2^{(2)}, \ldots, e_{K_2}^{(2)}$ after dose selection and the new boundaries $e_1^{(3)}, e_2^{(3)}, \ldots, e_{K_3}^{(3)}$ after sample size adaptation can be calculated by recursively solving equations (J.6) and (J.7). The technical details of the new boundary calculation are described in Sections J.5.1 and J.5.2.

J.5.1 Compute boundaries after dose selection at look L1

To compute the boundaries $e_1^{(2)}, e_2^{(2)}, \ldots, e_{K_2}^{(2)}$, we first need to compute the conditional type I error $\alpha'$. The conditional type I error for the primary trial at look $L_1$ is the following probability:

$$P\left(\bigcup_{j=L_1+1}^{K_1} \left\{\max_{i \in F} W_{ij}^{(1)} > e_j^{(1)}\right\} \,\Big|\, x_{iL_1}^{(1)};\ \vec{\delta} = 0\right) \qquad \text{(J.8)}$$

To compute (J.8), we work with the process $U^{(1)}$, defined as $U_{ij}^{(1)} = W_{ij}^{(1)}/\sqrt{T^{(1)}}$ with

$$T^{(1)} = \max_{i \in F} \left\{I_{iK_1}^{(1)}\right\} = n_{0K_1}^{(1)} \max_h \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$$

Let $b_j^{(1)}$ be the boundaries based on the process $U^{(1)}$. Then (J.8) is equivalent to

$$P\left(\bigcup_{j=L_1+1}^{K_1} \left\{\max_{i \in F} U_{ij}^{(1)} > b_j^{(1)}\right\} \,\Big|\, U_{iL_1}^{(1)} = y_{iL_1}^{(1)};\ \vec{\delta} = 0\right) = \alpha'$$

where $y_{iL_1}^{(1)} = x_{iL_1}^{(1)}/\sqrt{T^{(1)}}$. This probability can be obtained by recursively computing the following probability for each $j$ with $K_1 \ge j > L_1$ and then adding up these probabilities over all such $j$:

$$P\left(\bigcap_{h=L_1+1}^{j-1} \left\{\max_{i \in F} U_{ih}^{(1)} \le b_h^{(1)}\right\} \cap \left\{\max_{i \in F} U_{ij}^{(1)} > b_j^{(1)}\right\} \,\Big|\, y_{iL_1}^{(1)};\ \vec{\delta} = 0\right)$$

Once we obtain $\alpha'$, we can calculate the new boundaries $e_1^{(2)}, e_2^{(2)}, \ldots, e_{K_2}^{(2)}$, which satisfy the following equation:

$$P\left(\bigcup_{j=1}^{K_2} \left\{\max_{i \in S} W_{ij}^{(2)} > e_j^{(2)}\right\} \,\Big|\, x_{iL_1}^{(1)}\ (i \in S)\right) = \alpha' \qquad \text{(J.9)}$$

To compute the probability on the left hand side of (J.9), we define the process $U^{(2)}$ as $U_{ij}^{(2)} = W_{ij}^{(2)}/\sqrt{T^{(2)}}$ with

$$T^{(2)} = \max_{i \in F} \left\{I_{iK_2}^{(2)}\right\} = n_{0K_2}^{(2)} \max_{h \in F} \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$$

If there is no sample size reallocation after dropping doses, then $n_{0K_2}^{(2)} = n_{0K_1}^{(1)}$. If there is sample size reallocation after dropping doses, and we keep the same allocation ratios of the selected doses to the control arm, then the sample size allocated to the control arm after look $L_1$ is

$$\frac{N - n_{0L_1}^{(1)} \left(1 + \sum_{i \in F} \lambda_i\right)}{1 + \sum_{i \in S} \lambda_i}$$

so that

$$n_{0K_2}^{(2)} = \frac{N - n_{0L_1}^{(1)} \left(1 + \sum_{i \in F} \lambda_i\right)}{1 + \sum_{i \in S} \lambda_i} + n_{0L_1}^{(1)}, \qquad n_{iK_2}^{(2)} = \lambda_i \, n_{0K_2}^{(2)}$$

Let $U_{i0}^{(2)} = y_{i0}^{(2)} = x_{i,L_1}^{(1)}/\sqrt{T^{(2)}}$, where the subscript 0 indicates the starting state for the secondary trial. We first compute the boundaries $b_1^{(2)}, b_2^{(2)}, \ldots, b_{K_2}^{(2)}$, where $b_j^{(2)} = e_j^{(2)}/\sqrt{T^{(2)}}$, such that

$$P\left(\bigcup_{j=1}^{K_2} \left\{\max_{i \in S} U_{ij}^{(2)} > b_j^{(2)}\right\} \,\Big|\, U_{i0}^{(2)} = y_{i0}^{(2)}\ (i \in S)\right) = \alpha' \qquad \text{(J.10)}$$

Note that

$$U_{ij}^{(2)} = y_{i0}^{(2)} + \frac{1}{\sqrt{T^{(2)}}} \sum_{h=1}^{j} W_{i(h)}^{(2)}$$

Hence $U_{ij}^{(2)}$ is normally distributed with mean

$$y_{i0}^{(2)} + \delta_i \frac{\sum_{h=1}^{j} I_{i(h)}^{(2)}}{\sqrt{T^{(2)}}} = y_{i0}^{(2)} + \eta_i^{(2)} \frac{\sum_{h=1}^{j} I_{i(h)}^{(2)}}{T^{(2)}}$$

and variance $\sum_{h=1}^{j} I_{i(h)}^{(2)} / T^{(2)}$, where

$$\eta_i^{(2)} = \delta_i \sqrt{T^{(2)}}, \qquad I_{i(j)}^{(2)} = \left[\frac{\sigma_i^2}{\lambda_i} + \sigma_0^2\right]^{-1} n_{0(j)}^{(2)}, \quad j = 1, \ldots, K_2$$

The covariance between $U_{kj}^{(2)}$ and $U_{lj}^{(2)}$ is

$$\mathrm{Cov}\left(U_{kj}^{(2)}, U_{lj}^{(2)} \mid y_{i0}^{(2)}\right) = \frac{1}{T^{(2)}} \mathrm{Cov}\left(\sum_{h=1}^{j} W_{k(h)}^{(2)}, \sum_{h=1}^{j} W_{l(h)}^{(2)}\right) = \frac{\left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1} \sigma_0^2 \left(n_{0j}^{(2)} - n_{00}^{(2)}\right)}{n_{0K_2}^{(2)} \max_{h \in F} \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}}$$

where $n_{00}^{(2)} = n_{0L_1}^{(1)}$. We can find $b_j^{(2)}$ recursively by solving the following equation (with $\alpha'_0 = 0$):

$$P\left(\bigcap_{h=1}^{j-1} \left\{\max_{i \in S} U_{ih}^{(2)} \le b_h^{(2)}\right\} \cap \left\{\max_{i \in S} U_{ij}^{(2)} > b_j^{(2)}\right\} \,\Big|\, y_{i0}^{(2)}\ (i \in S)\right) = \alpha'_j - \alpha'_{j-1}$$

Once we obtain $b_j^{(2)}$, we can compute $e_j^{(2)}$ as $e_j^{(2)} = b_j^{(2)} \sqrt{T^{(2)}}$.

J.5.2 Compute boundaries after sample size adaptation at look L2

To compute the boundaries $e_1^{(3)}, e_2^{(3)}, \ldots, e_{K_3}^{(3)}$, we first need to compute the conditional type I error $\alpha''$ at look $L_2$ of the secondary trial. The conditional type I error for the secondary trial at look $L_2$ is the following probability:

$$P\left(\bigcup_{j=L_2+1}^{K_2} \left\{\max_{i \in S} W_{ij}^{(2)} > e_j^{(2)}\right\} \,\Big|\, x_{iL_2}^{(2)}\ (i \in S);\ \vec{\delta} = 0\right) \qquad \text{(J.11)}$$

To compute (J.11), we work with the process $U_{ij}^{(2)} = W_{ij}^{(2)}/\sqrt{T^{(2)}}$ with

$$T^{(2)} = \max_{i \in F} \left\{I_{iK_2}^{(2)}\right\} = n_{0K_2}^{(2)} \max_{h \in F} \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$$

By this step, the boundaries $b_1^{(2)}, b_2^{(2)}, \ldots, b_{K_2}^{(2)}$ have been computed. Then (J.11) is equivalent to

$$P\left(\bigcup_{j=L_2+1}^{K_2} \left\{\max_{i \in S} U_{ij}^{(2)} > b_j^{(2)}\right\} \,\Big|\, y_{iL_2}^{(2)};\ \vec{\delta} = 0\right)$$

where $y_{iL_2}^{(2)} = x_{iL_2}^{(2)}/\sqrt{T^{(2)}}$. This probability can be obtained by recursively computing the following probability for each $j$ with $K_2 \ge j > L_2$ and then adding up these probabilities over all such $j$:

$$P\left(\bigcap_{h=L_2+1}^{j-1} \left\{\max_{i \in S} U_{ih}^{(2)} \le b_h^{(2)}\right\} \cap \left\{\max_{i \in S} U_{ij}^{(2)} > b_j^{(2)}\right\} \,\Big|\, y_{iL_2}^{(2)};\ \vec{\delta} = 0\right)$$

Once we obtain $\alpha''$, we can calculate the new boundaries $e_1^{(3)}, e_2^{(3)}, \ldots, e_{K_3}^{(3)}$, which satisfy the following equation:

$$P\left(\bigcup_{j=1}^{K_3} \left\{\max_{i \in S} W_{ij}^{(3)} > e_j^{(3)}\right\} \,\Big|\, x_{i,L_2}^{(2)}\ (i \in S)\right) = \alpha'' \qquad \text{(J.12)}$$

To compute the probability on the left hand side of (J.12), we define the process $U^{(3)}$ as $U_{ij}^{(3)} = W_{ij}^{(3)}/\sqrt{T^{(3)}}$ with

$$T^{(3)} = \max_{i \in F} \left\{I_{iK_3}^{(3)}\right\} = n_{0K_3}^{(3)} \max_{h \in F} \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}$$

Let $U_{i0}^{(3)} = y_{i0}^{(3)} = x_{iL_2}^{(2)}/\sqrt{T^{(3)}}$. We first compute the boundaries $b_1^{(3)}, b_2^{(3)}, \ldots, b_{K_3}^{(3)}$, where $b_j^{(3)} = e_j^{(3)}/\sqrt{T^{(3)}}$, such that

$$P\left(\bigcup_{j=1}^{K_3} \left\{\max_{i \in S} U_{ij}^{(3)} > b_j^{(3)}\right\} \,\Big|\, y_{i0}^{(3)}\ (i \in S)\right) = \alpha''$$

Note that

$$U_{ij}^{(3)} = y_{i0}^{(3)} + \frac{1}{\sqrt{T^{(3)}}} \sum_{h=1}^{j} W_{i(h)}^{(3)}$$

Hence $U_{ij}^{(3)}$ is normally distributed with mean

$$y_{i0}^{(3)} + \delta_i \frac{\sum_{h=1}^{j} I_{i(h)}^{(3)}}{\sqrt{T^{(3)}}} = y_{i0}^{(3)} + \eta_i^{(3)} \frac{\sum_{h=1}^{j} I_{i(h)}^{(3)}}{T^{(3)}}$$

and variance $\sum_{h=1}^{j} I_{i(h)}^{(3)} / T^{(3)}$, where

$$\eta_i^{(3)} = \delta_i \sqrt{T^{(3)}}, \qquad I_{i(j)}^{(3)} = \left[\frac{\sigma_i^2}{\lambda_i} + \sigma_0^2\right]^{-1} n_{0(j)}^{(3)}$$

The covariance between $U_{kj}^{(3)}$ and $U_{lj}^{(3)}$ is

$$\mathrm{Cov}\left(U_{kj}^{(3)}, U_{lj}^{(3)} \mid y_{i0}^{(3)}\right) = \frac{1}{T^{(3)}} \mathrm{Cov}\left(\sum_{h=1}^{j} W_{k(h)}^{(3)}, \sum_{h=1}^{j} W_{l(h)}^{(3)}\right) = \frac{\left[\left(\frac{\sigma_k^2}{\lambda_k} + \sigma_0^2\right)\left(\frac{\sigma_l^2}{\lambda_l} + \sigma_0^2\right)\right]^{-1} \sigma_0^2 \left(n_{0j}^{(3)} - n_{00}^{(3)}\right)}{n_{0K_3}^{(3)} \max_{h \in F} \left[\frac{\sigma_h^2}{\lambda_h} + \sigma_0^2\right]^{-1}}$$

where $n_{00}^{(3)} = n_{0L_2}^{(2)}$. We can find $b_j^{(3)}$ recursively by solving the following equation (with $\alpha''_0 = 0$):

$$P\left(\bigcap_{h=1}^{j-1} \left\{\max_{i \in S} U_{ih}^{(3)} \le b_h^{(3)}\right\} \cap \left\{\max_{i \in S} U_{ij}^{(3)} > b_j^{(3)}\right\} \,\Big|\, y_{i0}^{(3)}\ (i \in S)\right) = \alpha''_j - \alpha''_{j-1}$$

Once we obtain $b_j^{(3)}$, we can compute $e_j^{(3)}$ as $e_j^{(3)} = b_j^{(3)} \sqrt{T^{(3)}}$.

K Theory - MultiArm Two Stage Designs Combining p-values

K.1 Introduction

This appendix describes the theory behind two-stage multi-arm designs that combine p-values. These designs are available for the difference of means test and the difference of proportions test. We also provide details of the various computations used in East for these designs.

K.2 Treatment Effect Scales

East provides various treatment effect scales for selecting treatments for stage 2, for difference of means as well as difference of proportions tests.
This section describes these treatment effect scales. A treatment effect scale is used together with a treatment selection rule to select treatments for stage 2.

For any treatment effect scale, if a tie or ties are observed, they are broken using the following conventions.

1. If responses are generated using a dose response curve, the treatment with the lowest dose among the tied treatments is selected.
2. If responses are not generated using a dose response curve, a treatment is selected at random among the tied treatments.

For isotonic computations, the Pool Adjacent Violators Algorithm (PAVA) proposed by Ayer et al. (1955) is used.

K.2.1 Estimated Mean

This treatment effect scale is available only for the difference of means test. The estimated mean response for each treatment is used in this treatment effect scale.

K.2.2 Estimated Delta

This treatment effect scale is available for the difference of means test as well as the difference of proportions test. For the difference of means test, the estimated δ, which is the difference between the estimated mean for a particular treatment and the estimated mean for control, is used. For the difference of proportions test, the estimated δ, which is the difference between the estimated proportion for a particular treatment and the estimated proportion for control, is used.

K.2.3 Estimated Standardized Effect Size

This treatment effect scale is available only for the difference of means test. It is available only if the variance option is equal for the t statistic, or the common standard deviation option is selected for the Z statistic.
The estimated δ for each treatment is the difference between the estimated mean for that treatment and the estimated mean for control. If the test statistic option is Z, the common standard deviation provided by the user is used in the computations. If the test statistic is t, the estimated pooled standard deviation across all data is used in the computations.

K.2.4 Test Statistic

This treatment effect scale is available for the difference of means test as well as the difference of proportions test. For the difference of means test, the test statistic (Z or t), taking the variance option (equal or unequal) into account, is used for this treatment effect scale. For the difference of proportions test, the test statistic Z, taking the pooled or un-pooled option into account, is used.

K.2.5 Conditional Power

This treatment effect scale is available for the difference of means test as well as the difference of proportions test. The conditional power is computed under the assumption that only the control and the specified treatment are carried forward to stage 2. The details of the computation of conditional power for each specific treatment are given below.

$w^{(1)}$: Weight for stage 1
$Z^{(1)}$: Incremental statistic for stage 1
RB: Cumulative efficacy boundary on Z scale for stage 2 for right tailed test
LB: Cumulative efficacy boundary on Z scale for stage 2 for left tailed test
p: Raw p-value for stage 1
q: Raw p-value for stage 2
SN: Standard Normal random variable
$n_t^{(2)}$: Sample size corresponding to the specified treatment in stage 2
$n_c^{(2)}$: Sample size corresponding to the control in stage 2
λ: Allocation ratio for the specified treatment as specified in the initial allocation

$n_t^{(2)}$ and $n_c^{(2)}$ are computed using the stage 2 sample size as planned initially and the allocation ratio, under the assumption that only the specified treatment and control are carried to stage 2.
Right Tailed Test

$$CP = P(SN > RC - B) \qquad \text{(K.1)}$$

where

$$RC = \frac{RB - w^{(1)} \Phi^{-1}(1 - p)}{w^{(2)}} \qquad \text{(K.2)}$$

Left Tailed Test

$$CP = P(SN < LC - B) \qquad \text{(K.3)}$$

where

$$LC = \frac{LB - w^{(1)} \Phi^{-1}(p)}{w^{(2)}} \qquad \text{(K.4)}$$

For Difference of Means Test

$$B = \frac{\hat{\delta}}{D} \qquad \text{(K.5)}$$

where $\hat{\delta} = \bar{d}_t^{(1)} - \bar{d}_c^{(1)}$.

If the variance option is equal, then

$$D = \sigma \sqrt{\frac{1}{n_t^{(2)}} + \frac{1}{n_c^{(2)}}} \qquad \text{(K.6)}$$

If the test statistic option is t, then
σ: Estimate of pooled standard deviation for stage 1
If the test statistic option is Z, then
σ: Design common standard deviation

If the variance option is unequal, then

$$D = \sqrt{\frac{\sigma_t^2}{n_t^{(2)}} + \frac{\sigma_c^2}{n_c^{(2)}}} \qquad \text{(K.7)}$$

If the test statistic option is t, then
$\sigma_t^2$: Estimate of variance for the specified treatment based on stage 1
$\sigma_c^2$: Estimate of variance for control based on stage 1
If the test statistic option is Z, then
$\sigma_t^2$: Design variance for the specified treatment
$\sigma_c^2$: Design variance for control

For Difference of Proportions Test

$$B = \frac{\hat{\delta}}{D} \qquad \text{(K.8)}$$

where

$$\hat{\delta} = \hat{\pi}_t - \hat{\pi}_c \qquad \text{(K.9)}$$

and
$\hat{\pi}_t$: Estimate of proportion for treatment based on stage 1
$\hat{\pi}_c$: Estimate of proportion for control based on stage 1

When the variance is un-pooled,

$$D = \sqrt{\frac{\hat{\pi}_t (1 - \hat{\pi}_t)}{n_t^{(2)}} + \frac{\hat{\pi}_c (1 - \hat{\pi}_c)}{n_c^{(2)}}} \qquad \text{(K.10)}$$

When the variance is pooled,

$$D = \sqrt{\bar{\pi} (1 - \bar{\pi}) \left(\frac{1}{n_t^{(2)}} + \frac{1}{n_c^{(2)}}\right)} \qquad \text{(K.11)}$$

where
$\bar{\pi}$: Estimate of pooled proportion based on stage 1

K.2.6 Isotonic Mean

This treatment effect scale is available only for the difference of means test. Isotonic means are computed by applying the PAVA algorithm to the estimated means of all treatments and control.

K.2.7 Isotonic Delta

This treatment effect scale is available for the difference of means test as well as the difference of proportions test. First, isotonic means are computed by applying the PAVA algorithm to the estimated means of all treatments and control. Using these isotonic means, the value of isotonic δ for each treatment is computed.
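The isotonic scales rest on the pool-adjacent-violators step, which can be sketched in a few lines of Python. This is an illustrative sketch only, not East's code: the function names are hypothetical, and the assumptions that the arms are ordered with control first followed by treatments in increasing dose, that a non-decreasing response is being fitted, and that arm sample sizes are used as weights are all made here for illustration.

```python
def pava(values, weights=None):
    """Weighted least-squares fit that is non-decreasing in index order."""
    if weights is None:
        weights = [1.0] * len(values)
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand the pooled blocks back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def isotonic_deltas(means, sizes):
    """Isotonic delta per treatment; means[0] and sizes[0] belong to control."""
    iso = pava(means, sizes)
    return [m - iso[0] for m in iso[1:]]


if __name__ == "__main__":
    # Doses 2 and 3 are out of order, so they are pooled to their average.
    print(pava([1.0, 3.0, 2.0, 4.0]))  # prints [1.0, 2.5, 2.5, 4.0]
```

The same `pava` call applied to estimated proportions instead of means would give the isotonic proportions, under the same ordering assumption.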
K.2.8 Isotonic Standardized Effect Size

This treatment effect scale is available only for the difference of means test. It is available only if the variance option is equal for the t statistic, or the common standard deviation option is selected for the Z statistic. Isotonic δ/σ values are computed by first computing isotonic δ values for all treatments. If the test statistic option is Z, the value of σ used is the common standard deviation; if the test statistic is t, the estimated pooled standard deviation across all data is used in the computations.

K.2.9 Estimated Proportion

This treatment effect scale is available only for the difference of proportions test. The estimated proportion for each treatment is used in this treatment effect scale.

K.2.10 Isotonic Proportion

This treatment effect scale is available only for the difference of proportions test. Isotonic proportions are computed by applying the PAVA algorithm to the estimated proportions of all treatments and control.

K.3 Combination Method

East uses the "Inverse Normal" combination function to combine p-values (or adjusted p-values) from the two stages. Default values of the weights for the two stages are computed as follows.

$$w^{(1)} = \sqrt{\frac{n^{(1)}}{n}} \qquad \text{(K.12)}$$

$$w^{(2)} = \sqrt{\frac{n^{(2)}}{n}} \qquad \text{(K.13)}$$

where $n^{(1)}$ and $n^{(2)}$ are the total sample sizes corresponding to stage 1 and stage 2 respectively, and $n$ is the total sample size. East allows the user to change the weights as long as they satisfy the following condition:

$$w^{(1)} \cdot w^{(1)} + w^{(2)} \cdot w^{(2)} = 1 \qquad \text{(K.14)}$$

Let $p^{(1)}$ and $p^{(2)}$ be the p-values (or adjusted p-values) from stage 1 and stage 2 respectively.
The combined p-value is given by the formula

$$p = 1 - \Phi\left(w^{(1)} \Phi^{-1}\left(1 - p^{(1)}\right) + w^{(2)} \Phi^{-1}\left(1 - p^{(2)}\right)\right) \qquad \text{(K.15)}$$

K.4 Closed Testing

The closed testing principle of Marcus et al. (1976) states that no elementary hypothesis can be rejected unless all intersection hypotheses that contain that elementary hypothesis are rejected. This principle is applied in the analysis after both stages. For multiplicity adjustment, East provides the Bonferroni, Sidak, Simes and Dunnett methods. The Dunnett method is available only for the difference of means test. For details of these methods, please see the appendix on multiple comparison procedures.

After stage 1, multiplicity-adjusted p-values are computed for each intersection hypothesis, and then closed testing is used to test each individual hypothesis. After stage 2, the multiplicity-adjusted p-values from both stages are combined for each intersection hypothesis using the combination method, and then closed testing is used to test each individual hypothesis.

K.5 Stopping Boundaries

East allows stopping after stage 1 using efficacy or futility boundaries. The trial is terminated for efficacy after stage 1 if any of the treatment arms crosses the efficacy boundary. The trial is terminated for futility after stage 1 if all treatment arms cross the futility boundary. At the end of stage 2, for an efficacy-futility design, if no arm has crossed the efficacy boundary, the trial is declared futile. For efficacy as well as futility stopping, the adjusted p-value obtained using the combination and closed testing procedures is used. For futility stopping, the user can specify the futility boundary in terms of δ for the difference of proportions test and in terms of δ/σ for the difference of means test.

K.6 Sample Size Re-estimation

Sample size re-estimation allows the user to increase the sample size after stage 1.
The user can specify a target conditional power, which is used to compute the re-estimated sample size. The user may also specify the re-estimated sample size directly. Sample size reduction is not allowed. The promising zone approach used in adaptive simulations in East is also used here. The sample size is re-estimated only if the trial lands in the promising zone. If the trial lands in the unfavorable or favorable zone, the sample size is not re-estimated.

The conditional power calculation is based on the assumption that only the control and the best treatment (according to the treatment effect scale) are carried to the second stage.

$Z^{(1)}$: Incremental statistic for stage 1 corresponding to the best treatment
$n_c^{(2)}$: Sample size corresponding to control in stage 2
$n_t^{(2)}$: Sample size corresponding to the best treatment in stage 2
p: Raw p-value for the best treatment at stage 1
$\lambda_b^{(2)}$: Allocation ratio for the best treatment as specified in the treatment selection tab
RBA: Adjusted cumulative efficacy boundary on Z scale for stage 2 for right tailed test, adjusted using α/k, where α is the design type I error and k is the number of active treatments in stage 1
LBA: Adjusted cumulative efficacy boundary on Z scale for stage 2 for left tailed test, adjusted using α/k, where α is the design type I error and k is the number of active treatments in stage 1
tCP: Target conditional power
SN: Standard Normal random variable

For the right tailed test, the formula for conditional power is given by

$$CP = P(SN > RC - B) = tCP \qquad \text{(K.16)}$$

where

$$RC = \frac{RBA - w^{(1)} \Phi^{-1}(1 - p)}{w^{(2)}} \qquad \text{(K.17)}$$

For the left tailed test, the formula for conditional power is given by

$$CP = P(SN < LC - B) = tCP \qquad \text{(K.18)}$$

where

$$LC = \frac{LBA - w^{(1)} \Phi^{-1}(p)}{w^{(2)}} \qquad \text{(K.19)}$$
For Difference of Means Test

$$B = \frac{\hat{\delta}}{D} \qquad \text{(K.20)}$$

where $\hat{\delta} = \bar{d}_t^{(1)} - \bar{d}_c^{(1)}$.

If the variance option is equal, then

$$D = \sigma \sqrt{\frac{1}{n_t^{(2)}} + \frac{1}{n_c^{(2)}}} \qquad \text{(K.21)}$$

Let us define

$$D_1 = \sigma \sqrt{\frac{1}{\lambda_b^{(2)}} + 1} \qquad \text{(K.22)}$$

If the test statistic option is t, then
σ: Estimate of pooled standard deviation for stage 1
If the test statistic option is Z, then
σ: Design common standard deviation

If the variance option is unequal, then

$$D = \sqrt{\frac{\sigma_t^2}{n_t^{(2)}} + \frac{\sigma_c^2}{n_c^{(2)}}} \qquad \text{(K.23)}$$

Let us define

$$D_1 = \sqrt{\frac{\sigma_t^2}{\lambda_b^{(2)}} + \sigma_c^2} \qquad \text{(K.24)}$$

If the test statistic option is t, then
$\sigma_t^2$: Estimate of variance for the specified treatment based on stage 1
$\sigma_c^2$: Estimate of variance for control based on stage 1
If the test statistic option is Z, then
$\sigma_t^2$: Design variance for the specified treatment
$\sigma_c^2$: Design variance for control

For Difference of Proportions Test

$$B = \frac{\hat{\delta}}{D} \qquad \text{(K.25)}$$

where

$$\hat{\delta} = \hat{\pi}_t - \hat{\pi}_c \qquad \text{(K.26)}$$

and
$\hat{\pi}_t$: Estimate of proportion for treatment based on stage 1
$\hat{\pi}_c$: Estimate of proportion for control based on stage 1

When the variance is un-pooled,

$$D = \sqrt{\frac{\hat{\pi}_t (1 - \hat{\pi}_t)}{n_t^{(2)}} + \frac{\hat{\pi}_c (1 - \hat{\pi}_c)}{n_c^{(2)}}} \qquad \text{(K.27)}$$

Let us define

$$D_1 = \sqrt{\frac{\hat{\pi}_t (1 - \hat{\pi}_t)}{\lambda_b^{(2)}} + \hat{\pi}_c (1 - \hat{\pi}_c)} \qquad \text{(K.28)}$$

When the variance is pooled,

$$D = \sqrt{\bar{\pi} (1 - \bar{\pi}) \left(\frac{1}{n_t^{(2)}} + \frac{1}{n_c^{(2)}}\right)} \qquad \text{(K.29)}$$

Let us define

$$D_1 = \sqrt{\bar{\pi} (1 - \bar{\pi}) \left(\frac{1}{\lambda_b^{(2)}} + 1\right)} \qquad \text{(K.30)}$$

where
$\bar{\pi}$: Estimate of pooled proportion based on stage 1

Since $n_t^{(2)} = \lambda_b^{(2)} n_c^{(2)}$, in each case $D = D_1 / \sqrt{n_c^{(2)}}$, and solving the conditional power equation for $n_c^{(2)}$ gives the formulae for the stage 2 sample size on the control arm.

For Right Tailed Test

$$n_c^{(2)} = \left(RC - \Phi^{-1}(1 - tCP)\right)^2 \cdot \frac{D_1^2}{\hat{\delta}^2} \qquad \text{(K.31)}$$

For Left Tailed Test

$$n_c^{(2)} = \left(LC - \Phi^{-1}(tCP)\right)^2 \cdot \frac{D_1^2}{\hat{\delta}^2} \qquad \text{(K.32)}$$

Once the re-estimated control sample size $n_c^{(2)}$ is computed, the allocation ratio specified in the treatment selection tab (for the second stage) is used to compute the sample size for the specific treatment that is carried forward to stage 2.
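The right-tailed, equal-variance case of the sample size formula can be sketched in Python as a round trip: compute the stage 2 control sample size from the target conditional power, then confirm that the conditional power formula returns the target at that sample size. This is a minimal sketch under stated assumptions, not East's implementation; the function and argument names are hypothetical, the weights are treated as fixed inputs, and no rounding or capping of the sample size is applied.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def reestimated_control_n(rba, w1, w2, p_stage1, delta_hat, sigma, lam_b, target_cp):
    """Stage 2 control sample size, right-tailed test, equal variance."""
    rc = (rba - w1 * _STD.inv_cdf(1.0 - p_stage1)) / w2   # RC from RBA, weights, stage 1 p
    d1 = sigma * math.sqrt(1.0 / lam_b + 1.0)             # D1 for the equal-variance case
    return (rc - _STD.inv_cdf(1.0 - target_cp)) ** 2 * d1 ** 2 / delta_hat ** 2


def conditional_power(rba, w1, w2, p_stage1, delta_hat, sigma, lam_b, n_c):
    """Right-tailed conditional power P(SN > RC - B) for a given control size."""
    rc = (rba - w1 * _STD.inv_cdf(1.0 - p_stage1)) / w2
    n_t = lam_b * n_c                                     # treatment size from allocation ratio
    d = sigma * math.sqrt(1.0 / n_t + 1.0 / n_c)          # D for the equal-variance case
    return 1.0 - _STD.cdf(rc - delta_hat / d)


if __name__ == "__main__":
    args = dict(rba=2.2, w1=math.sqrt(0.5), w2=math.sqrt(0.5),
                p_stage1=0.10, delta_hat=0.3, sigma=1.0, lam_b=1.0)
    n_c = reestimated_control_n(target_cp=0.9, **args)
    print(n_c, conditional_power(n_c=n_c, **args))
```

Because the formula squares RC − Φ⁻¹(1 − tCP), it is meaningful only when that quantity is positive, that is, when the target conditional power has not already been reached; the example inputs are chosen so that this holds.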
Sample size re-estimation is not performed if the estimated value of delta after stage 1 is in the direction opposite to the rejection type of the test.

L Technical Details - Predicted Interval Plots

Predicted interval plots (PIP) are a useful tool for assessing the magnitude of the future treatment effect and its associated uncertainty given the current data. Predicted interval plots are available in regular interim monitoring as well as in Muller and Schafer interim monitoring. In this appendix, we describe the technical details related to PIP. The appendix is divided into the following four sections.

1. Inputs for PIP
2. Estimation of Parameters from Data
3. Simulating the future for PIP
4. Computing and Displaying PIP

L.1 Inputs for PIP

Below we describe the inputs required for PIP.

1. PIP for Look - This corresponds to the various choices about the future that you want generated in PIP. There are three choices.
(a) PIP at Final Look - This option corresponds to the final look in the Design. With this option, it is assumed that there is only one specified look in the future, and the future is generated so that the completers or events corresponding to the final look as per the design are achieved.
(b) PIP at Any Future Look - All looks specified in the design which have not yet happened in the Interim Monitoring Sheet are considered in this option. Early stopping at the future looks is also considered with this option.
(c) PIP at User specified Look - With this option, it is assumed that there is only one specified look in the future, but here the user can alter the completers or events which correspond to this future look. This option is not available in PIP for Muller and Schafer interim monitoring.
2. Population ID - The variable corresponding to the Population ID must be binary, that is, it must contain only two distinct values.
In the user interface, you can specify which value corresponds to the control arm and which value corresponds to the treatment arm.
3. Arrival Time - The variable corresponding to Arrival Time is required only for Survival end point tests. This variable must be numeric (values strictly greater than zero), representing the time of entry into the trial for each subject.
4. Status Indicator - The variable corresponding to Status Indicator is mandatory for Survival end point trials. It is required for normal or binomial end point trials if the data contains delayed responses. In this variable, a value of 1 indicates that the response has been observed for that subject. A value of -1 indicates that the subject has dropped out before giving a response. A value of 0 indicates that the subject is still in the trial but has not yet responded.
5. Response Variable - The variable which corresponds to the Response Variable must be numeric for a Normal end point test. For a discrete end point test it must be binary, containing only two distinct values. In the user interface, you can specify which value corresponds to the control arm and which value corresponds to the treatment arm. For Survival end point tests, this variable must be continuous, representing the time spent in the trial until the event (for a subject whose Status Indicator is 1), until drop out (for a subject whose Status Indicator is -1), or the total follow up time (for a pending subject whose Status Indicator is 0).

L.2 Estimation of Parameters from Data

Estimation of parameters from the current data is optional. By default, the design values of the parameters are copied whenever possible. After estimating parameters from the current data, you can still edit their values as you desire. Before parameters are estimated, invalid observations are excluded from the data. Sample size is calculated as the total number of subjects accrued in the trial, based on the current data.
The number of completers or events (if different from the sample size because of response lag or drop out probability) is computed as the number of subjects whose response is observed. Sample Size and Number of Completers are not editable for regular PIP. Sample size is editable for MS PIP. You should verify that the number of completers or events computed from the data matches the number used in the Interim Monitoring Sheet.

In case of a Normal or Binomial end point, if the data does not contain delayed responses then all subjects in the data are assumed to be responders and their responses are used for parameter estimation. If the data contains delayed responses then only the responses of subjects whose Status Indicator value equals 1 are used for parameter estimation.

In case of a Survival end point, the Data Base Lock Time (DBLT) is computed from the data as the maximum observed calendar time in the data. For subjects whose Status Indicator is 0, i.e. for pending subjects, the Response value must be the difference between DBLT and the arrival time for that subject. If this condition is not met for a subject, the Response value is corrected for that subject before it is used in parameter estimation.

Here are the formulae for estimation of the various parameters.

1. Difference of Means Test
Let
$n_c$: Number of responders on the control arm.
$n_t$: Number of responders on the treatment arm.
$x_{i,c}$: Response value of the $i$th subject on the control arm.
$x_{i,t}$: Response value of the $i$th subject on the treatment arm.

Estimate of Mean Control is given by

$$\mu_c = \frac{\sum_{i=1}^{n_c} x_{i,c}}{n_c} \tag{L.1}$$

Estimate of Mean Treatment is given by

$$\mu_t = \frac{\sum_{i=1}^{n_t} x_{i,t}}{n_t} \tag{L.2}$$

Estimate of Difference of Means is given by

$$\delta = \mu_t - \mu_c \tag{L.3}$$

Estimate of Probability of Dropout is given by

$$P_D = \frac{\text{No. of DropOuts}}{\text{Sample Size}} \tag{L.4}$$

Estimate of Std. Deviation is given by the pooled standard deviation computed from the data.

2. Difference of Proportions and Ratio of Proportions Test
Let
$x_c$: Number of responses on the control arm.
$x_t$: Number of responses on the treatment arm.
$n_c$: Number of subjects on the control arm.
$n_t$: Number of subjects on the treatment arm.

Estimate of Proportion under control is given by

$$\pi_c = \frac{x_c + 0.5}{n_c + 1} \tag{L.5}$$

Estimate of Proportion under treatment is given by

$$\pi_t = \frac{x_t + 0.5}{n_t + 1} \tag{L.6}$$

Estimate of Difference of Proportions is given by

$$\delta = \pi_t - \pi_c \tag{L.7}$$

Estimate of Probability of Dropout is given by

$$P_D = \frac{\text{No. of DropOuts}}{\text{Sample Size}} \tag{L.8}$$

3. Survival (GADAR and GADSD) Tests
Let
$E_c$: Number of events on the control arm.
$E_t$: Number of events on the treatment arm.
$D_c$: Number of drop outs on the control arm.
$D_t$: Number of drop outs on the treatment arm.
$FT_c$: Total follow up time of all patients on the control arm.
$FT_t$: Total follow up time of all patients on the treatment arm.

Estimate of Hazard Rate for Control is given by

$$\lambda_c = \frac{E_c}{FT_c} \tag{L.9}$$

Estimate of Hazard Ratio (HR) is computed from the Cox proportional hazard model. Estimate of Hazard Rate for Treatment is given by

$$\lambda_t = \lambda_c \times HR \tag{L.10}$$

Estimate of Drop out Hazard Rate for Control is given by

$$\gamma_c = \frac{D_c}{FT_c} \tag{L.11}$$

Estimate of Drop out Hazard Rate for Treatment is given by

$$\gamma_t = \frac{D_t}{FT_t} \tag{L.12}$$

L.3 Simulating the future for PIP

For simulating the future for PIP, the parameters estimated from the data (which are editable) are used. Other parameter values required for simulation (the allocation ratio, for example) are taken from the corresponding design. For Normal and Binary end points, a response value is generated for pending subjects, but their treatment indicator is preserved from the current data. Both the treatment indicator and the response value are generated for future subjects. For a Survival end point also, the Treatment Indicator of pending subjects is preserved as it is in the current data.
For generating survival and dropout times for pending subjects, the memoryless property of the exponential distribution is used. Generation of arrival times for future subjects starts after the Data Base Lock Time. Generation of survival and drop out times for future subjects is similar to enhanced simulation.

L.4 Computing and Displaying PIP

The current data is fixed in all simulations, but the generated future data varies across simulations. For each simulation, the cumulative estimate of Delta (HR for a survival end point) and the associated standard error are computed. For PIP in non-adaptive IM as well as PIP in MS IM, the futility boundary, if present in the design, is ignored in all simulation computations. For regular PIP, a two sided confidence interval is always computed, even for a one sided trial design. Boundary computation at a particular future look or at any future look is similar to the boundary computation performed in the Interim Monitoring sheet. A repeated confidence interval (RCI) is computed for each simulation, with computations similar to those of the Interim Monitoring sheet. For MS PIP, a one sided RCI is always computed using the shift method.

For Efficacy Two Sided and for Efficacy Futility Two Sided designs with type I error $\alpha$, the confidence coefficient used in regular PIP is $100 \times (1 - \alpha)\%$.
For Efficacy One Sided and for Efficacy Futility One Sided designs with type I error $\alpha$, the confidence coefficient used in regular PIP is $100 \times (1 - 2\alpha)\%$.
For Efficacy One Sided and for Efficacy Futility One Sided designs with type I error $\alpha$, the confidence coefficient used in MS PIP is $100 \times (1 - \alpha)\%$.

For Any Future Look, the RCI is computed at the stopping look or the last look. For Final Look, the RCI is computed at the final look as per the design. The RCIs computed for all simulations are sorted on the estimated value of Delta (or HR for a survival end point) and are displayed on the X-axis.
On the Y-axis, the estimated values of Delta are plotted. The current (Interim Monitoring) value of the confidence interval is displayed as a black horizontal line in the PIP. Color coding is applied to help in judging the density of the observed estimated values of Delta (or HR). Read-offs on the PIP are obtained simply by counting the number of repeated confidence intervals which satisfy a particular condition.

M Enrollment/Events Prediction - Theory

The terms 'Enrollment' and 'Accruals' are used interchangeably in this Appendix chapter. The Predict module in East 6.4 simulates subject enrollment and events in a clinical trial. These simulations are also part of Enrollment/Events simulation at the design stage (Chapter 66) and Enrollment/Events simulation at the Interim Monitoring stage (Chapter 67). The underlying theory of generating accruals and/or events is the same in both situations. In this Appendix we present the theory and algorithms by which the arrival times, time to event data (survival data) and drop out times are simulated. Generating these quantities yields realizations of accruals, events and drop outs, which are then used to derive estimates of the average accrual duration, average follow up time, average study duration, etc., which are of much use to the investigator.

13.1 Enrollments Generation

In East 6.4, subjects are enrolled assuming a Poisson process for the arrivals. In case the arrivals are spread across a number of sites, the option of Uniform arrivals is also provided. The arrivals are assumed to occur independently of each other.
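As a rough illustration of the Poisson arrival assumption: in a Poisson process with rate λ, the gaps between successive arrivals are independent exponential variables with mean 1/λ, so arrival times can be simulated by accumulating such gaps. The Python sketch below assumes a single site with a constant accrual rate. The function name `simulate_arrival_times` and its parameters are hypothetical and are not part of East; this is not East's implementation.

```python
import math
import random


def simulate_arrival_times(rate, n_subjects, start_time=0.0, seed=None):
    """Simulate arrival times of a homogeneous Poisson process.

    rate       : accrual rate (subjects per unit time), assumed constant
    n_subjects : number of arrivals to generate
    start_time : time at which enrollment opens (illustrative parameter)
    seed       : optional seed for reproducibility
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    rng = random.Random(seed)
    arrivals = []
    t = start_time
    for _ in range(n_subjects):
        # rng.random() lies in [0, 1); use 1 - u so the log argument is in (0, 1]
        u = 1.0 - rng.random()
        # Inverse-transform sample of an exponential inter-arrival time
        t += -math.log(u) / rate
        arrivals.append(t)
    return arrivals


# Example: 5 subjects arriving at an assumed rate of 3 per unit time
example_arrivals = simulate_arrival_times(rate=3.0, n_subjects=5, seed=1)
```

With many subjects, the ratio of the number of arrivals to the last arrival time should be close to the chosen rate, which gives a quick sanity check on the generator.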
Exponential Distribution: In the Poisson process, the inter-arrival times follow an exponential distribution with density function

$$f(x) = \lambda e^{-\lambda x}, \quad x \in [0, \infty)$$

The 'Poisson' option in the Predict module of East generates subject enrollments by randomly sampling successive inter-arrival times from an exponential distribution with parameter $\lambda$. The inter-arrival times obtained describe the time difference (in days, months or years, depending on the chosen unit of analysis) between the arrivals of consecutive subjects. In East 6.4 the accruals are assumed to occur in a specified period with a fixed accrual rate $\lambda$.

Input

The primary input for East Predict simulation is the enrollment plan. For a set of regions/sites, it specifies the activation periods (the duration over which the site is to be initiated), the accrual rates per site and the maximum number of subjects that may be enrolled in that region/site. The tables below display examples of enrollment plans by region and by site.

Region ID   Number of Sites   Site Initiation Period   Accrual Rate/Site   Enrollment Cap
                              Start      End
Region 1    5                 0          2             3                   1000
Region 2    5                 0          2             4                   1000
Region 3    10                0          5             2                   1000

Site ID   Site Initiation Period   Accrual Rate/Site   Enrollment Cap
          Start      End
Site 1    0          1             5                   1200
Site 2    1          1             5                   1200
Site 3    0          2             8                   1200

13.2 Enrollment Simulation Algorithm

Suppose the number of accruals to be simulated in every simulation run is $N$. Let
$g$: Number of distinct regions in the study.
$s$: Total number of sites in the study.
$s_i$: Number of sites in Region $i$, $i = 1, 2, \cdots, g$.

The algorithm involves the following steps for every simulation.

13.2.1 Generation of Site Initiation Times

For a multi-center trial, the arrivals could be from different sites, which may be grouped into a number of regions. At the beginning of the trial, some of the sites may be unopened and would be opened later.
In order to simulate this scenario, East provides the option of specifying an Enrollment Plan (Chapter 67, Enrollments/Events prediction at Design Stage) which stores the Site Start time and Site End time for every site. The input can be either region wise or site wise. A region comprises many sites. If the input is region wise, then the variables Site Initiation Period Start, Site Initiation Period End, Accrual Rate per Site and Enrollment Cap are specified per region, and the same values apply to all the sites belonging to that region. The site initiation can be at any time between Site Initiation Period Start and Site Initiation Period End. For the unopened sites, the Site Initiation Times are generated as Uniform random numbers between Start Time and End Time.

Generate a Site Initiation Time from Uniform(SIPStart, SIPEnd) as follows:
- Generate a random value from Uniform(0, 1), say $u$.
- Then set $X = \text{SIPStart} + (\text{SIPEnd} - \text{SIPStart}) \times u$.
$X$ is the generated Site Initiation Time, a random value from Uniform(SIPStart, SIPEnd).

At the end of this step we have the Site Initiation times (SI Times) for all the sites.

13.2.2 Generation of Enrollments

Sort the Sites data in order of the Region IDs and then in order of their SI times. Enrollment starts at each site as per the individual site accrual rate. Suppose $a$ is the site accrual rate.

(i) Poisson Process: Inter-arrival Times Exponential
$R$: random number between (0, 1)
$R = \exp(-ax)$, so that $x = -\ln(R)/a$
$c_{ij} = \text{SITime} + x$
$c_{ij}$ = arrival time for the next subject at the $j$-th site in the $i$-th region.
(ii) Uniform Process
$R$: random number between (0, 1)
$R = F(x) = (x - \text{Min})/(\text{Max} - \text{Min})$
$x = \text{Min} + (\text{Max} - \text{Min}) R$
$\text{Min} = \text{SITime}$
$\text{Max} = \text{SITime} + 1/a$
$c_{ij} = \text{SITime} + x$
$c_{ij}$ = arrival time for the next subject at the $j$-th site in the $i$-th region.

13.2.3 Generation of Time on Study

No response is generated for normal and binomial end point studies in the Predict module. 'Time to event' data are generated only for survival studies. The generation of Time on Study follows the procedure described below.

Input: Hazard rates specified.

Notation:
$c_{ij}$: survival time to be generated for the $i$-th subject
$(\tau_i, \tau_{i+1}]$: $i$-th interval in which survival information is specified
$k$: number of hazard pieces
$\tau_i$: starting time of the $i$-th hazard piece, with $\tau_0 = 0$
$\lambda_i$: hazard rate in the $i$-th hazard piece

For the $l$-th subject, compute the survival time using the formula given below.

$$S_l = \tau_{i-1} - \frac{1}{\lambda_{i-1}} \ln\left[ 1 - v_l \left( 1 - e^{-\lambda_{i-1} (\tau_i - \tau_{i-1})} \right) \right]$$

where $u_i$ and $v_l$ are random numbers between (0, 1).

13.2.4 Generation of Dropout Times

Drop out times are generated along the same lines as time on study.

N Dose Escalation - Theory

N.1 The 3 + 3 Design

The 3 + 3 design method for finding the Maximum Tolerated Dose (MTD) in Phase I clinical trials is described in detail in this section. The 3 + 3 is a rule based design which starts by allocating the lowest dose level to the first cohort and adaptively escalates or de-escalates to the next dose level based on the observed number of dose limiting toxicities (DLTs), until either the MTD is obtained or the trial is stopped for excessive toxicity. It requires no modeling of the dose-toxicity curve beyond the classical assumption for cytotoxic drugs that toxicity increases with dose. There are three different versions of the 3 + 3: 3 + 3L, 3 + 3L (modified), and 3 + 3H. The 3 + 3L algorithm proceeds as follows:

1. At each dose level, treat 3 patients, beginning with dose level 1. Escalate to the next dose level or de-escalate to the previous dose according to the following rules:
(a) If 0 of 3 patients have a dose limiting toxicity (DLT), increase the dose to the next level.
(b) If 2 or more patients have a DLT, decrease the dose to the previous level. (If de-escalation occurs at the first dose level, then the study is discontinued.)
(c) If 1 of 3 patients has a DLT, treat 3 more patients at the current dose level.
    i. If 1 of 6 has a DLT, increase to the next dose level.
    ii. If 2 or more of 6 have a DLT, decrease to the previous level.
(d) If a dose has been de-escalated to the previous level:
    i. If only 3 had been treated at the previous level, enroll 3 more patients.
    ii. If 6 have already been treated at the previous level, stop the study and declare it the MTD.
2. The maximum tolerated dose (MTD) is defined as the largest dose for which 1 or fewer DLTs occurred.
3. Escalation never occurs to a dose at which 2 or more DLTs have already occurred.

If we have observed 1 DLT out of 6 patients at the current dose: 3 + 3H and 3 + 3L will recommend escalation, while 3 + 3L (modified) will declare the current dose as the MTD.
If we have observed 2 DLTs out of 6 patients at the current dose: 3 + 3H will declare the current dose as the MTD, while 3 + 3L and 3 + 3L (modified) will recommend de-escalation.

N.2 The Continual Reassessment Method

The Continual Reassessment Method (CRM) for finding the Maximum Tolerated Dose (MTD) in Phase I clinical trials is described in detail in this section. The CRM, introduced originally by O'Quigley et al. (1990), assumes a priori a monotonically increasing single-parameter dose-toxicity curve (DTC) and a desired toxicity rate $p_T$.
The estimated DTC is updated after each patient's toxicity outcome is observed, so that each patient's dose level is based on information about how previous patients tolerated the treatment.

N.2.1 Model

$Y_j$ is the binary toxicity outcome observed in the $j$th patient recruited to the trial, with $Y_j = 1$ denoting a DLT.
$d_1, \ldots, d_k$ are the doses.
$p_1, \ldots, p_k$ are the true unknown probabilities of toxicity for dose levels $d_1, \ldots, d_k$.
$\theta$ is the unknown parameter specifying the DTC.
$\psi(d_i, \theta)$ is the functional form of the DTC, with $\mathrm{Prob}(Y = 1) = \psi(d_i, \theta)$ for a patient treated at dose $d_i$.

There are three different forms of the DTC considered in East:

1. The Power Model
$$\psi(d_i, \theta) = d_i^{\theta}, \quad \text{for } \theta > 0$$
2. The Hyperbolic Tangent Model
$$\psi(d_i, \theta) = \left( \frac{\tanh d_i + 1}{2} \right)^{\theta}, \quad \text{for } \theta > 0$$
3. The single-parameter Logistic Model
$$\psi(d_i, \theta) = \frac{e^{c + \theta d_i}}{1 + e^{c + \theta d_i}}, \quad \text{for } \theta > 0 \text{ and } c \text{ fixed}$$

A Bayesian approach is implemented by placing a prior distribution $\pi(\theta)$ on the model parameter. The adaptive nature of the CRM arises from choosing the dose for the next patient based on the posterior distribution given the currently recruited patients, which is

$$\pi(\theta \mid y_1, \ldots, y_r) \propto L(\theta; y_1, \ldots, y_r)\, \pi(\theta),$$

where

$$L(\theta; y_1, \ldots, y_r) = \prod_{j=1}^{r} \psi(d_{[j]}, \theta)^{y_j} \left( 1 - \psi(d_{[j]}, \theta) \right)^{1 - y_j},$$

$d_{[j]}$ denotes the dose given to the $j$th patient, and $r$ is the number of subjects for which responses are observed.

Prior distributions

The choice of a prior distribution for the parameter $\theta$ depends on the choice of a DTC.
In particular:

Power and Hyperbolic Tangent Models: $\theta$ is a priori distributed as a Gamma random variable,

$$\pi(\theta) = \frac{\theta^{\alpha - 1} \exp(-\theta \beta)\, \beta^{\alpha}}{\Gamma(\alpha)}, \quad \text{for } \theta > 0,\ \alpha, \beta > 0$$

Single-parameter Logistic Model: $\theta$ is a priori distributed as a log-normal random variable,

$$\pi(\theta) = \frac{\exp\left( -\frac{(\ln \theta - \mu)^2}{2 \sigma^2} \right)}{\theta \sigma \sqrt{2 \pi}}, \quad \text{for } \theta > 0,\ \mu \in \mathbb{R},\ \sigma > 0$$

N.2.2 Dose Escalation rules

The dose to be assigned to the next patient, or cohort of patients, is the one whose posterior probability of toxicity is closest to the target toxicity probability $p_T$ and simultaneously below an upper limit of the toxicity probability, denoted by $p_{UL}$. In particular, the next cohort of patients is assigned to dose

$$d_i, \quad i = \arg\min_i \left| \hat{p}_{ir} - p_T \right|,$$

where $\hat{p}_{ir}$ is the posterior probability of toxicity at dose $i$ after $r$ subject responses.

By default, East uses in its dose escalation rules the modification of the original CRM proposed by Goodman et al., under which a dose escalation cannot increase the dose by more than one level, although dose de-escalations can be large. In addition, a dose escalation is not allowed if the previous subject experienced a DLT. Both restrictions can be lifted by selecting the corresponding "Dose Skipping Options" in the "Design Parameters" tab.

N.3 The Modified Toxicity Probability Interval Design

This section describes the modified Toxicity Probability Interval (mTPI) design proposed by Yuan Ji et al. (2010). The mTPI is a model-based design and it consists of 3 components:
1. three dosing intervals,
2. a beta/binomial Bayesian model, and
3. a dose-assignment rule based on Unit Probability Mass (UPM).

Following the notation of Section N.2, we let $p_1, \ldots, p_k$ denote the toxicity probabilities for doses $d_1, \ldots, d_k$, where $k$ is the total number of candidate doses in the trial.
The observed data include $n_i$, the number of patients treated at dose $i$, and $x_i$, the number of patients experiencing a toxicity. The likelihood function for the data $\{(x_i, n_i), i = 1, \ldots, k\}$ is a product of binomial densities. The estimates of the toxicity probabilities $p_i$ are sequentially updated and are used to decide whether some of the doses studied would be close to the true MTD. This is achieved through Bayes' Theorem. Each $p_i$ is a priori distributed as a Beta random variable, $\mathrm{Beta}(\alpha, \beta)$, and a posteriori as $\mathrm{Beta}(\alpha + x_i, \beta + n_i - x_i)$.

N.3.1 Dosing Intervals

The mTPI design employs a simple beta-binomial hierarchical model. Decision rules are based on calculating the unit probability mass (UPM) of three intervals corresponding to underdosing, proper dosing, and overdosing in terms of toxicity. More specifically, the underdosing interval is defined as $(0, p_T - \epsilon_1)$, the overdosing interval as $(p_T + \epsilon_2, 1)$ and the proper dosing interval as $(p_T - \epsilon_1, p_T + \epsilon_2)$, where $\epsilon_1$ and $\epsilon_2$ are small fractions, such as 0.05, that account for the uncertainty around the true target toxicity. These three dosing intervals are associated with three different dose-escalation decisions. The underdosing interval corresponds to a dose escalation (E), overdosing corresponds to a dose de-escalation (D), and proper dosing corresponds to staying at the current dose (S).

N.3.2 Dose Escalation Rules

Given an interval and a probability distribution, the UPM of that interval is defined as the probability of the interval divided by the length of the interval. The mTPI design calculates the UPMs for the three dosing intervals, and the interval with the largest UPM determines the dose-finding decision. That decision provides the dose level to be used for future patients. In particular, the algorithm proceeds as follows:

1. Compute the posterior probability of excessive toxicity at the current tried dose, i.e., $\mathrm{Prob}(p_i > p_T \mid x_i)$, which is obtained from the cumulative distribution function of $\mathrm{Beta}(\alpha + x_i, \beta + n_i - x_i)$. Using a threshold $\xi$ for early stopping for safety, the current and all higher doses are excluded from the trial due to excessive toxicity if $\mathrm{Prob}(p_i > p_T \mid x_i) > \xi$.
2. If $\mathrm{Prob}(p_i > p_T \mid x_i) < \xi$, compute the UPM for each of the three toxicity probability intervals described in Section N.3.1 as follows:
(a) $$UPM(D)_{d_i} = \frac{\mathrm{Prob}(p_i > p_T + \epsilon_2 \mid x_i)}{1 - (p_T + \epsilon_2)}$$
(b) $$UPM(S)_{d_i} = \frac{\mathrm{Prob}(p_T - \epsilon_1 \le p_i \le p_T + \epsilon_2 \mid x_i)}{\epsilon_1 + \epsilon_2}$$
(c) $$UPM(E)_{d_i} = \frac{\mathrm{Prob}(p_i < p_T - \epsilon_1 \mid x_i)}{p_T - \epsilon_1}$$
3. Select the action E, S or D corresponding to the toxicity interval with the highest UPM, provided that the resulting dose level was not excluded in Step 1.
4. If the selected action is 'E' and the current tried dose is the highest dose, then stop the trial.
5. Similarly, if the selected action is 'D' and the current tried dose is the lowest dose, then stop the trial.

N.3.3 Computation of the MTD

Once all the $N$ toxicity responses are observed, we compute the MTD using all the observed data. To compute the MTD, follow the steps given below:

1. Using the accumulated information about $x_i$ and $n_i$ for $i = 1, \ldots, k$, compute the posterior mean and variance for all the dose levels.
2. Compute isotonic regression estimates of the posterior means using the PAVA method, with the inverses of the posterior variances of $p_i$ as the weights, to obtain isotonically transformed posterior means denoted by, say, $p_i^*$.
3. Among all the tried doses $i$ for which $\mathrm{Prob}(p_i > p_T \mid x_i) < \xi$, select as the estimated MTD the dose with the smallest difference $|p_T - p_i^*|$.
4. In case of a tie (i.e.
two or more doses have the smallest difference),
(a) If all the tied doses have a probability of toxicity above the target, select the lower dose as the MTD.
(b) Otherwise, select the higher dose level as the MTD.

N.4 The Bayesian Logistic Regression Model

This section describes the Bayesian Logistic Regression model as proposed by Neuenschwander et al. (2008). We follow the notation of Section N.2 and consider a bivariate DTC of the form

$$\psi(d_i, \alpha, \beta) = \frac{e^{\ln \alpha + \beta d_i^*}}{1 + e^{\ln \alpha + \beta d_i^*}}, \quad \text{for } \alpha, \beta > 0 \text{ and } d_i^* = \ln(d_i / d_R), \tag{N.1}$$

where $d_R$ is a reference dose, chosen so that $\ln \alpha$ becomes the log-odds of toxicity when $d_i = d_R$.

N.4.1 Prior distribution specification

The vector $\theta = (\ln \alpha, \ln \beta)'$ follows a priori a bivariate normal distribution with mean vector $\mu_\theta$ and variance-covariance matrix $\Sigma$. The prior distribution parameters can be determined in two ways: directly or indirectly.

Direct Prior Elicitation

The direct prior elicitation approach involves incorporating information about $\alpha$ and $\beta$ directly. The parametrization of Equation (N.1) allows the parameters to be interpreted as follows:

1. $\ln \alpha$ is the log-odds of a toxicity when $d_i = d_R$. As such, the normal distribution of $\ln \alpha$ represents prior information for this dose. As stated in Neuenschwander et al. (2008), if one sets the reference dose $d_R$ to the a priori anticipated MTD, the mean of $\ln \alpha$ follows from the target probability $p_T$, and an additional quantile is needed to obtain the prior standard deviation.
2. For two doses $d_i$ and $d_j$, the parameter $\beta$ is the log-odds ratio of a DLT, i.e.,

$$\beta = \frac{\mathrm{logit}(\psi(d_j)) - \mathrm{logit}(\psi(d_i))}{\ln(d_j / d_i)}.$$

As an example, the parameters of the normal distribution of $\ln \beta$ can be obtained by specifying two quantiles for the change in the odds of a DLT if the dose is doubled.

Indirect Prior Elicitation

The indirect prior elicitation approach results in an uninformative prior specification for $\theta = (\ln \alpha, \ln \beta)'$. The following steps are used for this prior distribution specification:

1. Using preclinical data, one can calculate the starting dose and the predicted MTD for the study. Median DLT rates are assigned to these two doses, e.g., 0.05 and 0.33 respectively.
2. The remaining doses are assumed to be linear in log-odds on the $\ln(d / d_R)$ scale, which leads to estimated median DLT rates for the doses of interest.
3. At each dose level, a minimally informative Beta prior for the probability of a DLT is set, and the 2.5% and 97.5% quantiles of each distribution are calculated.
4. The parameters of the bivariate normal distribution of $\theta$ are tuned so that the difference between the 2.5% and 97.5% quantiles of this distribution and the targeted values from the Beta distributions is minimized.

N.4.2 Dosing Intervals and Selection

The probability of a DLT is classified into four categories: underdosing $(c_0 = 0, c_1]$, targeted toxicity $(c_1, c_2]$, excessive toxicity $(c_2, c_3]$ and unacceptable toxicity $(c_3, c_4 = 1]$. Dose selection proceeds with one of the two following methods:

Bayes Risk Minimization

A formal loss function is introduced, quantifying the penalty of ending up in each of the four aforementioned intervals:

$$L((\alpha, \beta), d^*) = \begin{cases} l_1 & \text{if } \mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_0, c_1] \right) \\ l_2 & \text{if } \mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_1, c_2] \right) \\ l_3 & \text{if } \mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_2, c_3] \right) \\ l_4 & \text{if } \mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_3, c_4] \right) \end{cases}$$

leading to an estimated Bayes risk of $\sum_{i=1}^{4} l_i\, \mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_{i-1}, c_i] \right)$.
The dose minimizing the Bayes risk is selected as the next dose.

Escalation With Overdose Control (EWOC)

Babb et al. (1998) proposed to select the dose for each patient as the one that maximizes the probability of targeted toxicity, i.e., $\mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_1, c_2] \right)$, subject to the constraint that the probability of overdosing (i.e., excessive and unacceptable toxicity) does not exceed a predefined threshold $\alpha_T$, say 0.25, called "the feasibility bound". That is, choose the dose level subject to the constraint $\mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_2, c_4] \right) \le \alpha_T$.

N.4.3 Posterior Calculations

The dose selection process described in Section N.4.2 depends on the calculation of the posterior probability

$$\mathrm{Prob}\left( (\alpha, \beta \mid \text{data}, d^*) \in (c_{i-1}, c_i] \right), \tag{N.2}$$

for $i = 1, 2, 3, 4$, which is calculated with respect to

$$\pi(\theta \mid y, d_i^*) \propto \frac{e^{\sum_{j=1}^{r} (\ln \alpha + \beta d_i^*)\, y_j}}{\prod_{j=1}^{r} \left( 1 + e^{\ln \alpha + \beta d_i^*} \right)} \times \pi(\theta) \tag{N.3}$$

As this bivariate posterior distribution is not a standard known distribution, we calculate (N.2) by employing two different sampling-based methods.

Metropolis-Hastings

The Metropolis-Hastings algorithm for obtaining samples from (N.3) proceeds as follows:

1. Given a starting value $\theta = \theta^{(0)}$, generate a candidate value $\theta^* = \theta^{(0)} + \sigma \epsilon$, where $\epsilon \sim N_2(0, I_2)$.
2. Calculate
$$\rho = \min\left( \frac{\pi(\theta^* \mid y, d_i^*)}{\pi(\theta^{(0)} \mid y, d_i^*)},\ 1 \right)$$
3. Draw randomly $v \sim \mathrm{Unif}(0, 1)$.
4. If $v \le \rho$ then set $\theta^{(1)} = \theta^*$, otherwise retain $\theta^{(1)} = \theta^{(0)}$.
5. Repeat steps 1-4 until convergence.

Direct sampling

The second method of sampling from the posterior distribution in (N.3) is a block sampling method. It involves discretizing the $\ln \alpha$ and $\ln \beta$ values along with their probability of occurrence. The likelihood of each support point $(\ln \alpha, \ln \beta)$ is computed in this discrete prior.
A block of values for $(\ln \alpha, \ln \beta)$ is sampled by first sampling a value of $\ln \alpha$ from its discrete marginal distribution and then a value of $\ln \beta$ from the discrete conditional distribution of $\ln \beta \mid \ln \alpha$, using the inverse cumulative distribution method.

N.5 Bayesian Logistic Regression Model for Combination of Two Agents

This section describes the BLRM design for a combination of two active agents.

Prior Distribution

The vector $\theta = (\log \alpha, \log \beta)'$ of model parameters for each active agent a priori follows a bivariate normal distribution as follows:

$$\theta = \begin{pmatrix} \log \alpha \\ \log \beta \end{pmatrix} \sim BVN\left( \begin{pmatrix} \mu_\alpha \\ \mu_\beta \end{pmatrix}, \begin{pmatrix} \sigma_\alpha^2 & \sigma_{\alpha\beta} \\ \sigma_{\alpha\beta} & \sigma_\beta^2 \end{pmatrix} \right), \quad \sigma_{\alpha\beta} = \rho\, \sigma_\alpha \sigma_\beta$$

where $\mu_\alpha$ refers to the prior mean of $\log \alpha$, $\mu_\beta$ refers to the prior mean of $\log \beta$, $\sigma_\alpha$ refers to the prior SD of $\log \alpha$, $\sigma_\beta$ refers to the prior SD of $\log \beta$ and $\rho$ refers to the correlation between $\log \alpha$ and $\log \beta$.

The interaction parameter ($\eta$) a priori follows a normal distribution as follows:

$$\eta \sim N(\mu_\eta, \sigma_\eta^2)$$

where $\mu_\eta$ denotes the prior mean of $\eta$ and $\sigma_\eta$ denotes the prior SD of $\eta$.

Model Definition

The proposed model has the following properties:
(a) It has three components, which stand for
- Single-agent 1 toxicity, represented by parameters $\alpha_1, \beta_1$
- Single-agent 2 toxicity, represented by parameters $\alpha_2, \beta_2$
- Interaction, represented by parameter $\eta$.
(b) If one of the doses is 0, say $d_2 = 0$, the model should simplify to the single-agent model with parameters $\alpha_1, \beta_1$.

Single-agent probabilities of DLT:
Probability of DLT, given Agent 1: $\pi_{d_1}$
Probability of DLT, given Agent 2: $\pi_{d_2}$
$\pi_{d_1}$ and $\pi_{d_2}$ are vectors of the probability of DLT at each dose of Agent 1 and Agent 2 respectively.

In the special case of no interaction, the single-agent parameters fully determine the risk of a DLT. For dose combination $(d_1, d_2)$, a patient's probability of having no DLT is $(1 - \pi_{d_1})(1 - \pi_{d_2})$.
Hence, the probability of DLT under no interaction is

$$\pi^0_{d_1,d_2} = 1 - (1 - \pi_{d_1})(1 - \pi_{d_2}) = \pi_{d_1} + \pi_{d_2} - \pi_{d_1}\pi_{d_2}.$$

On the odds scale this is equivalent to

$$\mathrm{odds}^0_{d_1,d_2} = \mathrm{odds}_{d_1} + \mathrm{odds}_{d_2} + \mathrm{odds}_{d_1}\,\mathrm{odds}_{d_2}.$$

The interaction parameter (η) has the interpretation of an odds-multiplier, as follows:

$$\mathrm{odds}_{d_1,d_2} = \mathrm{odds}^0_{d_1,d_2} \times g(\eta, d_1, d_2).$$

The odds-multiplier g should fulfill the constraints $g(\eta, 0, d_2) = g(\eta, d_1, 0) = 1$, since if one of the doses is 0, it should result in the single-agent odds. Hence, $g(\eta, d_1, d_2)$ is defined as $g(\eta, d_1, d_2) = \exp(\eta\, d_1 d_2)$, which satisfies both constraints. We will use the same interaction for all dose combinations, and hence we can simply use $\exp(\eta)$.

Typically η > 0, but not necessarily.

    η = 0: No interaction. The drug combination produces a toxic effect whose magnitude is equal to that obtained if the drugs act independently in the body.
    η < 0: Protective. The drug combination produces a toxic effect whose magnitude is less than that obtained if the drugs act independently in the body.
    η > 0: Synergistic. The drug combination produces a toxic effect whose magnitude is greater than that obtained if the drugs act independently in the body.

Likelihood

$$L(\theta_1, \theta_2, \eta \mid d_1, d_2, y) = \prod_{p=1}^{S} \left[ \pi_{d_{1p},d_{2p}}^{\,y_p} \times \left(1 - \pi_{d_{1p},d_{2p}}\right)^{1 - y_p} \right]$$

where
    S = observed sample size
    $d_{1p}$ = dose of Agent 1 assigned to patient p
    $d_{2p}$ = dose of Agent 2 assigned to patient p
    $\pi_{d_{1p},d_{2p}}$ = probability of DLT with interaction (η) for patient p
    $y_p$ = binary response (0 or 1) of patient p.

Prior Distribution for Two Parameter Logistic Model

$$\pi(\theta_i) \propto \frac{1}{\sigma_{\alpha_i}\sigma_{\beta_i}\sqrt{1 - \rho_i^2}} \exp\left(-\frac{z_i}{2(1 - \rho_i^2)}\right),$$

$$z_i = \frac{(\log\alpha_i - \mu_{\alpha_i})^2}{\sigma_{\alpha_i}^2} + \frac{(\log\beta_i - \mu_{\beta_i})^2}{\sigma_{\beta_i}^2} - \frac{2\rho_i(\log\alpha_i - \mu_{\alpha_i})(\log\beta_i - \mu_{\beta_i})}{\sigma_{\alpha_i}\sigma_{\beta_i}},$$

with i = 1 for Agent 1 and i = 2 for Agent 2.

Prior distribution for the interaction parameter η:

$$\pi(\eta) \propto \frac{1}{\sigma_\eta}\, e^{-Z}, \quad \text{where } Z = \frac{(\eta - \mu_\eta)^2}{2\sigma_\eta^2}.$$

Posterior Distribution

$$\pi(\theta_1, \theta_2, \eta \mid y) \propto L(\theta_1, \theta_2, \eta \mid d_1, d_2, y) \times \pi(\theta_1) \times \pi(\theta_2) \times \pi(\eta).$$

Posterior Sampling Method: Metropolis Hastings

Step 1: Initialize $\theta_1 = (\log\alpha_1^0, \log\beta_1^0)$, $\theta_2 = (\log\alpha_2^0, \log\beta_2^0)$, $\eta = \eta^0$ and Sim = 1.
Step 2: Generate a new candidate for Agent 1, $\theta_1^* = \theta_1 + RW\sigma_1 * \epsilon_1$, where $\epsilon_1 \sim BVN(0, 1)$.
Step 3: Calculate the ratio $R_1 = \min\left\{\frac{\pi(\theta_1^*, \theta_2, \eta \mid y)}{\pi(\theta_1, \theta_2, \eta \mid y)},\ 1\right\}$.
Step 4: Draw a random number $v_1 \sim U(0, 1)$, and if $v_1 < R_1$ then accept the new candidate $\theta_1^*$ and set $\theta_1 = \theta_1^*$.
Step 5: Generate a new candidate for Agent 2, $\theta_2^* = \theta_2 + RW\sigma_2 * \epsilon_2$, where $\epsilon_2 \sim BVN(0, 1)$.
Step 6: Calculate the ratio $R_2 = \min\left\{\frac{\pi(\theta_1, \theta_2^*, \eta \mid y)}{\pi(\theta_1, \theta_2, \eta \mid y)},\ 1\right\}$.
Step 7: Draw a random number $v_2 \sim U(0, 1)$, and if $v_2 < R_2$ then accept the new candidate $\theta_2^*$ and set $\theta_2 = \theta_2^*$.
Step 8: Generate a new candidate for the interaction, $\eta^* = \eta + RW\sigma_\eta * \epsilon_3$, where $\epsilon_3 \sim N(0, 1)$ (η is a scalar).
Step 9: Calculate the ratio $R_3 = \min\left\{\frac{\pi(\theta_1, \theta_2, \eta^* \mid y)}{\pi(\theta_1, \theta_2, \eta \mid y)},\ 1\right\}$.
Step 10: Draw a random number $v_3 \sim U(0, 1)$, and if $v_3 < R_3$ then accept the new candidate $\eta^*$ and set $\eta = \eta^*$.
Step 11: Store the values of the parameters $\theta_1$, $\theta_2$ and η for simulation Sim.
Step 12: Go to the next simulation, Sim = Sim + 1. If Sim > $Sim_{MH}$ + $Burnin_{MH}$ then stop, else go to Step 2.

Dose Finding Method

1. Compute posterior samples using the Metropolis Hastings method.
2. Compute the posterior probability of DLT for every dose pair using the steady state samples and the Model Definition for Combination of Two Agents.
3. Compute the probability of being in each toxicity interval for every dose pair as follows:
   (a) Count the number of steady state simulations for which the posterior probability of DLT lies within each interval.
   (b) Divide the count for each interval by the number of steady state simulations.
4. Exclude the dose pairs which do not satisfy the following EWOC principle: probability of being in the overdosing interval < EWOC threshold.
5. If all dose pairs are excluded, then stop the trial due to overdosing; else go to the next step.
6. If the user has selected any stopping rule(s) to determine the MTD early in the trial, then check the rules as follows:
   (a) Consider only the dose pairs which are not excluded due to overdosing.
   (b) Select the dose pair which has the maximum probability in the target interval and the minimum probability in the overdosing interval. If ties still exist, then select the largest of the dose pairs based on the dose indices. (See the Note below for this change.)
   (c) Min SS Rule: Check whether the total number of subjects observed in the trial is >= the user specified threshold.
   (d) Allocation Rule: Check whether the total number of subjects observed on the selected dose pair is >= the user specified threshold.
   (e) Target Rule: Check whether the probability of being in the targeted toxicity interval for the selected dose pair is >= the user specified threshold.
7. Stop the trial if the MTD is determined in Step 6; else go to the next step.
8. Compute the next dose pair to be assigned to the next group of subjects as follows:
   (a) Consider only the dose pairs which are not excluded due to overdosing.
   (b) Filter the dose pairs which satisfy the selected Dose Skipping Option and the requirement of whether to increase the dose of both agents at the same time.
   (c) Select the highest dose pair which has the maximum probability of being in the targeted toxicity interval as the next dose.
9. Compute the MTD for the final analysis as follows:
   (a) Consider only the tried dose pairs which are not excluded due to overdosing.
   (b) Select the highest dose pair which has the maximum probability of being in the targeted toxicity interval as the MTD.

N.6 The Product of Independent Beta Probabilities Escalation Design

The Product of Independent Beta Probabilities Escalation (PIPE) design is a Bayesian dose finding method for a combination therapy with two active agents. This method allows for the specification of the prior risk of toxicity for all dose combinations and uses posterior probabilities from all proposed dose combinations for dose escalation. The aim is to design a dual agent dose escalation trial targeting an MTD contour such that the risk of toxicity for all dose combinations on this contour is the pre-specified target toxicity level $p_T$.

Prior and Posterior Distributions

Let $d_i^A$ denote the i-th dose level of drug A and $d_j^B$ denote the j-th dose level of drug B, where doses increase with i and j, and $i = 1, \dots, I$ and $j = 1, \dots, J$. We assume that the probabilities of toxicity at every dose combination follow independent Beta distributions, i.e.,

$$\pi_{ij} \mid a_{ij}, b_{ij} \sim \mathrm{Beta}(a_{ij}, b_{ij}) \quad \forall\ i, j.$$

The prior distribution can be specified in two formats:

1. Prior median of P(DLT) $\pi_{ij}$ and prior sample size $SS_{ij}$ for each dose combination $d_{ij}$.
2. Prior parameters $a_{ij}$ and $b_{ij}$ of the Beta distribution for each dose combination $d_{ij}$.

If the prior is specified in format 1, it is internally converted into format 2 by the software. Suppose $Y^{(m)} = \{r_{ij}^{(m)}, n_{ij}^{(m)};\ i = 1, \dots, I,\ j = 1, \dots, J\}$ denotes the data up to the end of the m-th cohort, such that we have observed $r_{ij}^{(m)}$ DLTs from $n_{ij}^{(m)}$ patients for the dose combination $d_{ij}$.
Then, because of conjugacy and prior independence of the $\pi_{ij}$, the posterior distribution of $\pi_{ij}$ is also a Beta distribution, given by

$$(\pi_{ij} \mid Y^{(m)}, a_{ij}, b_{ij}) \sim \mathrm{Beta}\left(a_{ij} + r_{ij}^{(m)},\ b_{ij} + n_{ij}^{(m)} - r_{ij}^{(m)}\right) \quad \forall\ i, j.$$

We assume that the toxicity risk increases with increasing dose, i.e., $\pi_{ij} < \pi_{(i+1)j}$, $i = 1, \dots, I - 1$, $\forall\ j$, and $\pi_{ij} < \pi_{i(j+1)}$, $j = 1, \dots, J - 1$, $\forall\ i$.

Maximum Tolerated Contour

The Maximum Tolerated Contour (MTC) is formed by the dose combinations that have a posterior mean DLT rate equal to the targeted toxicity risk. The PIPE design method targets the MTC corresponding to the pre-specified target probability of toxicity ($p_T$) to recommend the dose level for the next cohorts. Let us denote this MTC as $MTC_\theta$. It is estimated by the line partitioning the dose combination space into toxicity risks above θ and toxicity risks below θ. $MTC_\theta$ must be such that it does not contradict the assumption of monotonicity.

Dose Escalation Rules

For dose escalation, the PIPE method begins by identifying the set of admissible dose combinations based on one of the following three criteria: adjacent to $MTC_\theta$, closest to $MTC_\theta$, or lying in an interval of fixed pre-specified length around the targeted toxicity probability $p_T$. "Adjacent" doses are the dose levels that lie adjacent to the current estimated $MTC_\theta$. The "closest" doses are defined as those adjacent doses below/above the contour that cannot move up (for below) or down (for above) by one dose level without crossing the contour. Hence the "closest" doses must be a subset of the adjacent doses. The "Interval" criterion picks the dose levels having a probability of DLT within the interval $(p_T - \epsilon,\ p_T + \epsilon)$, where ε is the pre-specified margin.
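The conjugate Beta update and the "Interval" criterion described above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the 2 x 2 dose grid, the prior parameters, the observed counts, and the names `a`, `b`, `r`, `n`, `pT` and `eps` are all hypothetical, and each combination is summarized here by its posterior mean DLT rate. It is not a description of East's internal implementation.

```r
# Hypothetical 2 x 2 dose grid: rows = drug A levels, columns = drug B levels.
# R fills matrices column by column.
a <- matrix(c(0.5, 1.0, 1.0, 1.5), nrow = 2)  # assumed prior Beta parameters a_ij
b <- matrix(c(4.5, 4.0, 4.0, 3.5), nrow = 2)  # assumed prior Beta parameters b_ij
r <- matrix(c(0, 1, 0, 2), nrow = 2)          # assumed DLT counts r_ij
n <- matrix(c(3, 3, 3, 3), nrow = 2)          # assumed patients treated n_ij

# Conjugate update: Beta(a + r, b + n - r) for every dose combination.
post_a <- a + r
post_b <- b + n - r
post_mean <- post_a / (post_a + post_b)

# "Interval" criterion: combinations whose posterior mean DLT rate
# lies within (pT - eps, pT + eps).
pT  <- 0.30
eps <- 0.10
admissible <- which(post_mean > pT - eps & post_mean < pT + eps, arr.ind = TRUE)

print(round(post_mean, 4))
print(admissible)
```

With these assumed inputs, only the combination of drug A level 2 with drug B level 1 falls inside the interval.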
Dose Skipping Rules

Dose skipping during escalation can be controlled by using one of the following criteria: the neighborhood constraint or the non-neighborhood constraint. Under the neighborhood constraint, the set of admissible doses for the next cohort is further reduced to the doses that are at most one dose level higher or lower than the current experimental dose, for both agents A and B. Hence any dose combination can be chosen that is up to one dose level above or below the current drug A and drug B levels, including the current dose combination. Under the non-neighborhood constraint, all the previously administered doses are considered, and to allow dose skipping, the constraint allows any dose that is a single dose level higher in both agents A and B than any previously administered dose combination. The option related to diagonal dose escalation allows escalating the levels of both agents at the same time.

Dose Selection

The dose combination for the next cohort is selected from the admissible dose set. This can be done in two possible ways. One is to select the next dose to be the admissible dose with the smallest current sample size. Here the sample size is defined as the sum of the prior sample size and the sample size observed in the trial. That is, we select a dose $d_{ij}$ where

$$(i, j) = \arg\min_{\xi \in \Omega^{(m)}} S_\xi^{(m)}, \quad \text{where } S_{ij}^{(m)} = n_{ij}^{(m)} + a_{ij} + b_{ij}.$$

The other possible dose selection method is based on a weighted randomization, where the selection among the admissible doses is weighted by the inverse of their sample size:

$$P\left(\text{cohort } m \text{ is allocated } d_{ij} \mid (i, j) \in \Omega^{(m)}\right) = \frac{\left(S_{ij}^{(m)}\right)^{-1}}{\sum_{\xi \in \Omega^{(m)}} \left(S_\xi^{(m)}\right)^{-1}},$$

and the dose combination with the highest probability is chosen.

At the end of the trial, the MTD is selected as the dose closest to the estimated $MTC_\theta$ from below.
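The two dose selection rules above can be illustrated with a short R sketch. The admissible set, its labels (`d12`, `d21`, `d22`) and the sample sizes are hypothetical values chosen for the example; the weights assume that the allocation probabilities are the inverse sample sizes normalized to sum to one over the admissible set.

```r
# Hypothetical admissible set with current sample sizes
# S_ij = n_ij + a_ij + b_ij (values are assumptions for illustration).
S <- c(d12 = 8, d21 = 11, d22 = 5)

# Method 1: choose the admissible dose with the smallest current sample size.
next_dose_min <- names(S)[which.min(S)]

# Method 2: allocation weights proportional to the inverse sample size,
# normalized over the admissible set.
w <- (1 / S) / sum(1 / S)

# Following the text, the combination with the highest weight is chosen.
next_dose_wtd <- names(w)[which.max(w)]

print(next_dose_min)
print(round(w, 3))
print(next_dose_wtd)
```

Because the weights are a decreasing function of the sample size, both rules point to the same combination whenever the combination with the highest weight is taken deterministically.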
O R Functions

O.1 Introduction

In East 6, the simulation module gives the user the opportunity to perform various tasks using R. In this chapter, we list all tasks for which R functions can be used, and we provide the syntax and suggested format for the various functions. The functions are divided into the following categories:

1. Function for initialization
2. Functions for data generation
3. Functions for computing test statistics and performing tests
4. Functions for performing basic simulations
5. Functions for re-estimating sample size in adaptive simulations
6. Function for selecting treatment in multi-arm combining p-values design

O.2 Initialization Function

This function is optional. If provided, it will be executed before any of the other user-defined functions. The user can use this function for various purposes, some of which are listed below:

1. Setting the seed for the R environment
2. Setting the working directory for R
3. Initializing global variables.

For more details on the uses of this function, please see Section O.12. The following table provides details about the Initialization function.

Table O.1: Initialization Function

Suggested name of the function: Init()
Description: Performs initialization for all simulations.
Syntax: Init(Seed)
Arguments:
    Seed: Seed to be set at the beginning of all simulations.
Return value type: Integer (optional). This function may return an error code (optional).
Suggested format:

    Init <- function(Seed)
    {
      Error = 0
      set.seed(Seed)
      # User may use other options in set.seed like setting the random
      # number generator. User may also initialize global variables or
      # set up the working directory etc.
      # Do the error handling. Modify Error appropriately.
      return(as.integer(Error))
    }
O.3 Data Generation Functions

    O.3.1 Generating Arrival Times
    O.3.2 Generating Censor indicator
    O.3.3 Generating Dropout Times
    O.3.4 Randomizing Subjects to Treatments
    O.3.5 Randomizing Subjects to Groups
    O.3.6 Randomizing Subjects to Populations

The following points apply to all functions used for data generation described in this document.

1. This document provides a suggested name for each function.
2. The argument names and argument types for each function are compulsory, but the order of the arguments is not. Input argument names are case sensitive.
3. The user can have additional input arguments in the function, but must make sure that appropriate values will be available for those additional arguments during the function call. For details please see Section O.13.
4. The function will return a list. The identifier names (case insensitive) and types mentioned for the outputs in the list for a particular function are compulsory, while their order in the list is not. We strongly advise that the user type-cast the output elements. The user can have additional outputs in the list. If the user wants to print arrays (of the same size as the number of subjects) in the Simulation CSV file, then identifiers must be provided for those arrays. These identifiers will be the column names in the output. Any repeated identifiers (column names) will be ignored.
5. We suggest that the returned list contain an identifier "ErrorCode". If specified, it has to be of type Integer. Its values are classified as follows:
       0: No error.
       Positive integer: Non-fatal error. The particular simulation will be aborted, but the next simulation will be performed.
       Negative integer: Fatal error. No further simulation will be attempted.
   We suggest that the user classify errors into these categories depending on the context.
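As a concrete illustration of points 4 and 5, the following R sketch fills in the censor indicator function whose interface is given in Table O.3. It is a minimal example under stated assumptions: the Bernoulli dropout model and the particular choice of which input problems count as fatal (negative code) or non-fatal (positive code) are assumptions made here, not rules prescribed by East.

```r
GenCensorInd <- function(NumSub, ProbDrop)
{
  Error  <- 0
  retval <- integer(0)

  if (NumSub < 1) {
    # Assumed classification: nothing can be simulated, so treat as fatal.
    Error <- -1
  } else if (ProbDrop < 0 || ProbDrop > 1) {
    # Assumed classification: invalid probability, abort this simulation only.
    Error <- 1
  } else {
    # rbinom() draws 1 with probability ProbDrop (a dropout);
    # the required coding is 0 = dropped out, 1 = did not drop out.
    retval <- 1L - rbinom(NumSub, size = 1, prob = ProbDrop)
  }

  # Type-cast every output element, as advised in point 4.
  return(list(CensorInd = as.integer(retval),
              ErrorCode = as.integer(Error)))
}

set.seed(101)
out <- GenCensorInd(NumSub = 10, ProbDrop = 0.2)
print(out$CensorInd)
print(out$ErrorCode)
```

Returning an empty array together with a non-zero ErrorCode keeps the list structure identical in the error and non-error cases.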
O.3.1 Generating Arrival Times

Table O.2: Function for Generating Arrival Times

Suggested name of the function: GenArrTimes()
Description: Generates arrival times for a specified number of subjects. The start time and the accrual rate (one per period) are provided for each period.
Syntax: GenArrTimes(NumSub, NumPrd, PrdStart, AccrRate)
Arguments (compulsory):
    NumSub: Number of subjects.
    NumPrd: Number of accrual periods.
    PrdStart: Array of start times of the specified periods.
    AccrRate: Array of accrual rates (one rate per period).
Return value type: R list. The required identifiers in this list are:
    ArrivalTime: An array of generated arrival times. Type: Double.
Suggested format:

    GenArrTimes <- function(NumSub, NumPrd, PrdStart, AccrRate)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated accrual times in an array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(ArrivalTime = as.double(retval), ErrorCode = as.integer(Error)))
    }

O.3.2 Generating Censor indicator

Table O.3: Generating Censor Indicator (Normal and Binary)

Suggested name of the function: GenCensorInd()
Description: Generates the censor indicator (subject has dropped out (0) or not (1)) for a specified number of subjects.
Syntax: GenCensorInd(NumSub, ProbDrop)
Arguments (compulsory):
    NumSub: Number of subjects.
    ProbDrop: Probability of dropout.
Return value type: R list. The required identifiers in this list are:
    CensorInd: An array of censor indicator values, 0 (dropout) and 1 (no dropout). Type: Integer.
Suggested format:

    GenCensorInd <- function(NumSub, ProbDrop)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated censor indicator values in an
      # array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(CensorInd = as.integer(retval), ErrorCode = as.integer(Error)))
    }

O.3.3 Generating Dropout Times

Table O.4: Generating Dropout Times (Survival)

Suggested name of the function: GenDropTimes()
Description: Generates dropout times for a specified number of subjects for a survival endpoint.
Syntax for one arm test: GenDropTimes(NumSub, DropMethod, NumPrd, PrdTime, DropParam)
Syntax for more than one arm test: GenDropTimes(NumSub, NumArm, TreatmentID, DropMethod, NumPrd, PrdTime, DropParam)
Arguments:
    NumSub: Number of subjects.
    NumArm: Number of arms in the trial (including placebo/control).
    TreatmentID: Array specifying the indexes of the arms to which subjects are allocated (one arm index per subject). The index for placebo/control is 0. For other arms, indexes are consecutive positive numbers starting with 1. Thus if the trial has 4 arms (1 placebo + 3 treatment arms), the arm indexes will be 0, 1, 2 and 3.
    DropMethod: Input method for specifying dropout parameters. 1 = hazard rates, 2 = probabilities of dropout.
    NumPrd: Number of dropout periods.

Table O.5: Generating Dropout Times (Survival) (Continued)

    PrdTime: Array of times used to specify dropout parameters. If DropMethod is 1, then this array specifies the starting times of the dropout periods. If DropMethod is 2, then this array specifies the times at which the probabilities of dropout are specified.
    DropParam: 2-D array of parameters used to generate dropout times.
        Number of rows = number of dropout periods. Number of columns = number of arms, including control/placebo. If DropMethod is 1, the DropParam array specifies arm-by-arm hazard rates (one rate per arm per piece). Thus DropParam[i, j] specifies the hazard rate in the ith piece for the jth arm. If DropMethod is 2, the DropParam array specifies arm-by-arm probabilities of dropout (one value of the probability of dropout per arm per piece). Thus DropParam[i, j] specifies the probability of dropout in the ith piece for the jth arm.
Return value type: R list. The required identifiers in this list are:
    DropOutTime: An array of generated dropout times. Type: Double.

Table O.6: Generating Dropout Times (Survival) (Continued)

Suggested format:

    GenDropTimes <- function(NumSub, NumArm, TreatmentID, DropMethod, NumPrd, PrdTime, DropParam)
    {
      Error = 0
      if (DropMethod == 1)
      {
        # Write the actual code for method 1 here.
        # Store the generated dropout times in an array called retval.
      }
      if (DropMethod == 2)
      {
        # Write the actual code for method 2 here.
        # Store the generated dropout times in an array called retval.
      }
      # Use appropriate error handling and modify the
      # Error in each of the methods appropriately.
      return(list(DropOutTime = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function.

O.3.4 Randomizing Subjects to Treatments

Table O.7: Treatment Randomization

Suggested name of the function: GenTreatID()
Description: Randomizes subjects to the specified arms. The function should produce 0-based indexes of the arms to which the subjects are allocated. The treatment arms have consecutive positive arm indices starting with 1.
Syntax: GenTreatID(NumSub, NumArm, AllocRatio)
Arguments:
    NumSub: Number of subjects to randomize.
    NumArm: Total number of arms in the trial.
    AllocRatio: Array of size (NumArm-1) specifying the expected allocation ratios for the treatment arms. (Allocation ratios are relative to placebo.)
Return value type: R list. The required identifiers in this list are:
    TreatmentID: An array of generated allocation indices for all subjects. Placebo = 0. Type: Integer.
Suggested format:

    GenTreatID <- function(NumSub, NumArm, AllocRatio)
    {
      Error = 0
      # Write the actual code here. Store the generated treatment indices
      # in an array called retval. Use error handling and modify the
      # Error appropriately.
      return(list(TreatmentID = as.integer(retval), ErrorCode = as.integer(Error)))
    }

O.3.5 Randomizing Subjects to Groups

Table O.8: Group Randomization

Suggested name of the function: GenGroupID()
Description: Randomizes subjects to the specified groups. The function should produce 0-based indexes of the groups to which the subjects are allocated. The first group will have index 0; the remaining groups have consecutive positive group indices starting with 1.
Syntax: GenGroupID(NumSub, NumGrp, AllocRatio)
Arguments:
    NumSub: Number of subjects to randomize.
    NumGrp: Number of groups in the trial.
    AllocRatio: Array of size (NumGrp-1) specifying the expected allocation ratios for the groups. (Allocation ratios are relative to the first group.)
Return value type: R list. The required identifiers in this list are:
    GroupID: An array of generated allocation indices for all subjects. Type: Integer.
Suggested format:

    GenGroupID <- function(NumSub, NumGrp, AllocRatio)
    {
      Error = 0
      # Write the actual code here. Store the generated group indices
      # in an array called retval. Use appropriate error handling
      # and modify the Error appropriately.
      return(list(GroupID = as.integer(retval), ErrorCode = as.integer(Error)))
    }

O.3.6 Randomizing Subjects to Populations

Table O.9: Population Randomization

Suggested name of the function: GenPopulationID()
Description: Randomizes subjects to the specified populations. Used only for the Trend in R Ordered Proportions test. The function should produce 0-based indices of the populations to which the subjects are allocated. The first population will have index 0; the remaining populations have consecutive positive population indices starting with 1.
Syntax: GenPopulationID(NumSub, NumPop, AllocFrac)
Arguments:
    NumSub: Number of subjects to randomize.
    NumPop: Number of populations in the trial.
    AllocFrac: Array of size (NumPop) specifying the expected allocation fractions for the populations.
Return value type: R list. The required identifiers in this list are:
    PopulationID: An array of generated population indices for all subjects. Type: Integer.
Suggested format:

    GenPopulationID <- function(NumSub, NumPop, AllocFrac)
    {
      Error = 0
      # Write the actual code here. Store the generated population
      # indices in an array called retval. Use appropriate error handling
      # and modify the Error appropriately.
      return(list(PopulationID = as.integer(retval), ErrorCode = as.integer(Error)))
    }

O.4 Generating Continuous Response

    O.4.1 Response for Single Mean Test
    O.4.2 Response for Mean of Paired Differences Test
    O.4.3 Response for Difference of Means Test
    O.4.4 Response for Mean of Paired Ratio Test
    O.4.5 Generating Response for Ratio of Means Test
    O.4.6 Generating Binary Response Values
    O.4.7 Generating Categorical Response Values
    O.4.8 Generating Survival Times

In this section we describe various functions for generating continuous response for various tests in East as well as SiZ.

O.4.1 Response for Single Mean Test

Table O.10: Generating response for Single Mean Test

Suggested name of the function: GenRespSingleMean()
Description: Generates response values for the Single Mean Test for a specified number of subjects.
Syntax: GenRespSingleMean(NumSub, Mean, StdDev)
Arguments:
    NumSub: Number of subjects.
    Mean: Array (size 1) specifying the mean response value.
    StdDev: Array (size 1) specifying the standard deviation.
Return value type: R list. The required identifiers in this list are:
    Response: An array of generated responses for all subjects. Type: Double.
Suggested format:

    GenRespSingleMean <- function(NumSub, Mean, StdDev)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated response values in an
      # array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(Response = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function.
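One way the placeholder in Table O.10 might be filled in is sketched below. The normal distribution for the response and the specific input check are assumptions made for this illustration; the table itself only fixes the argument names and the returned identifiers.

```r
GenRespSingleMean <- function(NumSub, Mean, StdDev)
{
  Error  <- 0
  retval <- numeric(0)

  if (NumSub < 1 || StdDev[1] <= 0) {
    # Assumed classification: invalid inputs are a non-fatal error.
    Error <- 1
  } else {
    # Assumption: responses are independent normal draws.
    retval <- rnorm(NumSub, mean = Mean[1], sd = StdDev[1])
  }

  return(list(Response  = as.double(retval),
              ErrorCode = as.integer(Error)))
}

set.seed(2016)
out <- GenRespSingleMean(NumSub = 20, Mean = 1.5, StdDev = 2)
print(length(out$Response))
print(out$ErrorCode)
```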
O.4.2 Response for Mean of Paired Differences Test

Table O.11: Generating response for Mean of Paired Differences test

Suggested name of the function: GenRespPairedDiff()
Description: Generates response values for the Mean of Paired Differences Test for a specified number of subjects.
Syntax: GenRespPairedDiff(NumSub, Mean, SigmaD)
Arguments:
    NumSub: Number of subjects.
    Mean: Array (size 2) specifying the mean response value for the control (first element) and treatment (second element) arms.
    SigmaD: Array (size 1) specifying the standard deviation of the paired difference.
Return value type: R list. The required identifiers in this list are:
    DiffResp: An array of differences of the generated response values on the treatment and control arms. Type: Double.
    OR
    RespC: An array of generated control response values for all subjects. Type: Double.
    RespT: An array of generated treatment response values for all subjects. Type: Double.
    Note: If "DiffResp" is found in the output list, then "RespC" and "RespT" will be optional identifiers; otherwise they will be mandatory identifiers.

Table O.12: Generating response for Mean of Paired Differences test (Contd.)

Suggested format:

Format 1

    GenRespPairedDiff <- function(NumSub, Mean, SigmaD)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated difference of response values in an
      # array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(DiffResp = as.double(retval), ErrorCode = as.integer(Error)))
    }

Format 2

    GenRespPairedDiff <- function(NumSub, Mean, SigmaD)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated responses on the control arm in an
      # array called retval1.
      # Store the generated responses on the treatment arm in an
      # array called retval2.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(RespC = as.double(retval1), RespT = as.double(retval2), ErrorCode = as.integer(Error)))
    }

O.4.3 Response for Difference of Means Test

The following table provides details of the function for generating response for the difference of means test.

Table O.13: Generating Response for Difference of Means Test

Suggested name of the function: GenRespDiffofMeans()
Description: Generates response values for the Difference of Means test for a specified number of subjects.
Syntax: GenRespDiffofMeans(NumSub, TreatmentID, Mean, StdDev)
Arguments:
    NumSub: Number of subjects.
    TreatmentID: Array specifying the indexes of the arms to which subjects are allocated (one arm index per subject). The index for placebo/control is 0.
    Mean: Array (size 2) specifying the mean response values for the control (first element) and treatment (second element) arms.
    StdDev: Array (size 2) specifying the standard deviations for the control (first element) and treatment (second element) arms.
Return value type: R list. The required identifiers in this list are:
    Response: An array of responses for all subjects. Type: Double.
Suggested format:

    GenRespDiffofMeans <- function(NumSub, TreatmentID, Mean, StdDev)
    {
      Error = 0
      # Write the actual code here. Store the generated continuous
      # response values in an array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(Response = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function.
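The following R sketch shows one possible body for GenRespDiffofMeans(). It assumes normally distributed responses and uses the 0-based TreatmentID to pick each subject's mean and standard deviation; the input checks and the error classification are assumptions for illustration only.

```r
GenRespDiffofMeans <- function(NumSub, TreatmentID, Mean, StdDev)
{
  Error  <- 0
  retval <- numeric(0)

  if (length(TreatmentID) != NumSub || any(StdDev <= 0)) {
    # Assumed classification: inconsistent inputs are a non-fatal error.
    Error <- 1
  } else {
    # TreatmentID is 0-based (0 = control); R vectors are 1-based.
    arm <- TreatmentID + 1
    # Assumption: independent normal responses with arm-specific parameters.
    retval <- rnorm(NumSub, mean = Mean[arm], sd = StdDev[arm])
  }

  return(list(Response  = as.double(retval),
              ErrorCode = as.integer(Error)))
}

set.seed(7)
trt <- rep(c(0, 1), times = 5)
out <- GenRespDiffofMeans(NumSub = 10, TreatmentID = trt,
                          Mean = c(0, 0.5), StdDev = c(1, 1.2))
print(round(out$Response, 3))
```

Because rnorm() is vectorized over its mean and sd arguments, no explicit loop over subjects is needed.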
O.4.4 Response for Mean of Paired Ratio Test

Table O.14: Generating Response for Mean of Paired Ratio Test

Suggested name of the function: GenRespPairedRatio()
Description: Generates response values for the Mean of Paired Ratios Test for a specified number of subjects.
Syntax: GenRespPairedRatio(NumSub, Mean, StdDevLogRatio)
Arguments:
    NumSub: Number of subjects.
    Mean: Array (size 2) specifying the mean response values (i.e., means of the corresponding Log Normal distribution) for the control (first element) and treatment (second element) arms.
    StdDevLogRatio: Array (size 1) specifying the standard deviation of the log of the ratio of the responses on treatment and control.
Return value type: R list. The required identifiers in this list are:
    RatioResp: An array of ratios of the generated response values on the treatment and control arms. Type: Double.
    OR
    RespC: An array of generated control response values for all subjects. Type: Double.
    RespT: An array of generated treatment response values for all subjects. Type: Double.
    Note: If "RatioResp" is found in the output list, then "RespC" and "RespT" will be optional identifiers; otherwise they will be mandatory identifiers.
Suggested format:

Format 1

    GenRespPairedRatio <- function(NumSub, Mean, StdDevLogRatio)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated ratio of response values in an
      # array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(RatioResp = as.double(retval), ErrorCode = as.integer(Error)))
    }

Format 2

    GenRespPairedRatio <- function(NumSub, Mean, StdDevLogRatio)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated responses on the control arm in an
      # array called retval1.
      # Store the generated responses on the treatment arm in an
      # array called retval2.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(RespC = as.double(retval1), RespT = as.double(retval2), ErrorCode = as.integer(Error)))
    }

O.4.5 Generating Response for Ratio of Means Test

Table O.15: Generating response for Ratio of Means Test

Suggested name of the function: GenRespRatioofMeans()
Description: Generates response values for the Ratio of Means test for a specified number of subjects.
Syntax: GenRespRatioofMeans(NumSub, TreatmentID, Mean, CV)
Arguments:
    NumSub: Number of subjects.
    TreatmentID: Array specifying the indexes of the arms to which subjects are allocated (one arm index per subject). The index for placebo/control is 0.
    Mean: Array (size 2) specifying the mean response values (i.e., means of the corresponding Log Normal distribution) for the control (first element) and treatment (second element) arms.
    CV: Array (size 2) specifying the coefficient of variation for the control (first element) and treatment (second element) arms.
Return value type: R list. The required identifiers in this list are:
    Response: An array of generated responses (from the Log Normal distribution). Type: Double.

Table O.16: Generating response for Ratio of Means Test (Contd)

Suggested format:

    GenRespRatioofMeans <- function(NumSub, TreatmentID, Mean, CV)
    {
      Error = 0
      # Write the actual code here. Store the generated response
      # values in an array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(Response = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function.
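A possible body for GenRespRatioofMeans() is sketched below. It rests on an assumption about the parameterization: Mean and CV are taken to be the mean and coefficient of variation of the response on its original (log-normal) scale, so that the log-scale parameters are sdlog = sqrt(log(1 + CV^2)) and meanlog = log(Mean) - sdlog^2 / 2. If East interprets Mean differently, the conversion would need to change accordingly.

```r
GenRespRatioofMeans <- function(NumSub, TreatmentID, Mean, CV)
{
  Error  <- 0
  retval <- numeric(0)

  if (length(TreatmentID) != NumSub || any(Mean <= 0) || any(CV <= 0)) {
    # Assumed classification: invalid inputs are a non-fatal error.
    Error <- 1
  } else {
    arm <- TreatmentID + 1                      # 0-based arm index -> 1-based
    # Assumed conversion from original-scale mean and CV to log-scale parameters.
    sdlog   <- sqrt(log(1 + CV[arm]^2))
    meanlog <- log(Mean[arm]) - sdlog^2 / 2
    retval  <- rlnorm(NumSub, meanlog = meanlog, sdlog = sdlog)
  }

  return(list(Response  = as.double(retval),
              ErrorCode = as.integer(Error)))
}

set.seed(42)
trt <- rep(c(0, 1), each = 4)
out <- GenRespRatioofMeans(NumSub = 8, TreatmentID = trt,
                           Mean = c(10, 12), CV = c(0.3, 0.3))
print(round(out$Response, 2))
```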
O.4.6 Generating Binary Response Values

The following table provides details of generating binary response values.

Table O.17: Generating Binary Response Values

Suggested name of the function: GenBinResp()
Description: Generates binary response values (two categories: 0 (Non-Responder) and 1 (Responder)) for a specified number of subjects.
Syntax for one arm test: GenBinResp(NumSub, PropResp)
Syntax for more than one arm test: GenBinResp(NumSub, NumArm, TreatmentID, PropResp)
Syntax only for Trend in R Ordered Proportions: GenTrendResp(NumSub, NumPop, PopulationID, PropResp)

Table O.18: Generating Binary Response Values (Contd)

Arguments:
  NumSub - Number of subjects.
  NumArm - Number of arms in the trial (including placebo / control).
  TreatmentID - Array specifying indexes of arms to which subjects are allocated (one arm index per subject). Index for placebo / control is 0. For other arms, indexes are consecutive positive numbers starting with 1. Thus if the trial has 4 arms (1 placebo + 3 treatment arms), arm indexes will be 0, 1, 2 and 3.
  PopulationID - Array specifying indexes of populations to which subjects are allocated (one population index per subject). Index for the first population is 0. For other populations, indexes are consecutive positive numbers starting with 1. Thus if the trial has 4 populations, their indexes will be 0, 1, 2 and 3.
  PropResp - An array specifying expected proportions of responders on each arm / population.

Return value type: R List. The mandatory identifiers in this list are:
  Response (Double) - An array of generated binary responses for all subjects.

Suggested format:

    GenBinResp <- function(NumSub, NumArm, TreatmentID, PropResp)
    {
      Error = 0
      # Write the actual code here. Store the generated binary response
      # values in an array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(Response = as.double(retval), ErrorCode = as.integer(Error)))
    }

O.4.7 Generating Categorical Response Values

Table O.19: Generating Categorical Response Values

Suggested name of the function: GenCatResp()
Description: Generates categorical response values (0 to (Number of categories - 1)) for a specified number of subjects. Binary response is a special case of this when the number of categories is 2.
Syntax for one group test: GenCatResp(NumSub, NumCat, PropResp)
Syntax for more than one group test: GenCatResp(NumSub, NumGrp, GroupID, NumCat, PropResp)

Arguments:
  NumSub - Number of subjects.
  NumGrp - Number of groups in the trial.
  GroupID - Array specifying indexes of groups to which subjects are allocated.
  NumCat - Number of categories of response.
  PropResp - 2-D array specifying expected proportions of responders in each category and on each group. PropResp[i, j] specifies the expected proportion of responders in the jth category and on the ith group.

Table O.20: Generating Categorical Response Values (Contd)

Return value type: R List. The mandatory identifiers in this list are:
  CatID (Double) - An array of generated categorical responses (0, 1, 2, ..., (NumCat - 1)) for all subjects.

Suggested format:

    GenCatResp <- function(NumSub, NumGrp, GroupID, NumCat, PropResp)
    {
      Error = 0
      # Write the actual code here.
      # Store the generated multinomial response values in an
      # array called retval.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(CatID = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function.
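To make the GenCatResp template concrete, here is a minimal sketch of the multi-group form. It is not the vendor's implementation. It assumes that GroupID is 0-based (by analogy with TreatmentID and PopulationID; the manual does not state this for groups), that PropResp arrives as an R matrix with one row per group, and that -1 is an acceptable fatal error code for invalid input.

```r
# Sketch only: one possible body for the GenCatResp template (multi-group form).
# Assumptions: GroupID is 0-based and PropResp is a NumGrp x NumCat matrix whose
# rows each sum to 1.
GenCatResp <- function(NumSub, NumGrp, GroupID, NumCat, PropResp) {
  Error <- 0
  retval <- rep(NA_real_, NumSub)

  inputs_ok <- is.matrix(PropResp) &&
    all(dim(PropResp) == c(NumGrp, NumCat)) &&
    length(GroupID) == NumSub &&
    all(GroupID %in% 0:(NumGrp - 1)) &&
    all(PropResp >= 0) &&
    all(abs(rowSums(PropResp) - 1) < 1e-8)

  if (!inputs_ok) {
    Error <- -1                      # negative code: treated as a fatal error
  } else {
    for (g in 0:(NumGrp - 1)) {
      idx <- which(GroupID == g)
      if (length(idx) > 0) {
        # Draw category labels 0..(NumCat - 1) with this group's proportions.
        retval[idx] <- sample(0:(NumCat - 1), size = length(idx),
                              replace = TRUE, prob = PropResp[g + 1, ])
      }
    }
  }

  return(list(CatID = as.double(retval), ErrorCode = as.integer(Error)))
}

# Example call: two groups, three categories.
set.seed(7)
props <- rbind(c(0.2, 0.3, 0.5), c(0.6, 0.3, 0.1))
example_out <- GenCatResp(5, 2, c(0, 1, 1, 0, 1), 3, props)
```

With NumCat = 2 the same function produces binary responses, matching the remark in Table O.19 that the binary case is a special case of the categorical one.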
O.4.8 Generating Survival Times

Table O.21: Generating Survival Times (Time to Response)

Suggested name of the function: GenSurvTime()
Description: Generates survival times for a specified number of subjects.
Syntax for one arm test: GenSurvTime(NumSub, SurvMethod, NumPrd, PrdTime, SurvParam)
Syntax for more than one arm test: GenSurvTime(NumSub, NumArm, TreatmentID, SurvMethod, NumPrd, PrdTime, SurvParam)

Arguments:
  NumSub - Number of subjects.
  NumArm - Number of arms in the trial.
  TreatmentID - Array specifying indexes of arms to which subjects are allocated (one arm index per subject). Index for placebo / control is 0. For other arms, indexes are consecutive positive numbers starting with 1.
  SurvMethod - Input method. 1 - Hazard rates. 2 - Cumulative % survival rates. 3 - Median survival times.
  NumPrd - Number of survival periods.
  PrdTime - Array of times used to specify survival parameters. If SurvMethod is 1, this array specifies the starting times of hazard pieces. If SurvMethod is 2, this array specifies the times at which the cumulative % survivals are specified. If SurvMethod is 3, the period time is 0 by default.

Table O.22: Generating Survival Times (Time to Response) (Contd)

Arguments (continued):
  SurvParam - 2-D array of parameters used to generate times of events. If SurvMethod is 1, this array specifies arm by arm hazard rates (one rate per arm per piece). Thus SurvParam[i, j] specifies the hazard rate in the ith period for the jth arm. If SurvMethod is 2, this array specifies arm by arm cumulative % survivals (one value per arm per piece). Thus SurvParam[i, j] specifies the cumulative % survival in the ith period for the jth arm. If SurvMethod is 3, this will be a 1 x 2 array with median survival times on each arm.

Return value type: R List. The mandatory identifiers in this list are:
  SurvivalTime (Double) - An array of generated time to response values for each subject.

Suggested format:

    GenSurvTime <- function(NumSub, NumArm, TreatmentID, SurvMethod, NumPrd, PrdTime, SurvParam)
    {
      Error = 0
      if (SurvMethod == 1)
      {
        # Write the actual code for SurvMethod 1 here.
        # Store the generated survival times in an array called retval.
      }
      if (SurvMethod == 2)
      {
        # Write the actual code for SurvMethod 2 here.
        # Store the generated survival times in an array called retval.
      }
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(SurvivalTime = as.double(retval), ErrorCode = as.integer(Error)))
    }

O.5 Enhanced Simulations

The user provides an R function that computes the test statistic, or performs the test, for the current look in the current simulation. No particular name is required for this R function.

O.5.1 Input Arguments for One Look Test

This section describes the input arguments of the R function that computes the test statistic or performs the test. The two arguments described here are used for One Look as well as Multi Look tests. For a One Look Test, the R function will have the following two mandatory named arguments:

1. SimData - R Data frame which consists of data generated in the current simulation (Case Data). This data frame will have headers indicating the names of the columns. These names will be the same as those used in Data Generation. The user should access the variables using headers, e.g. SimData$ArrivalTime, and not by order. Optional outputs from Data Generation will also be available.
2. DesignParam - R List which consists of Design and Simulation Parameters which the user may need to compute the test statistic and perform the test.
   The user should access the variables using names, e.g. DesignParam$SideType, and not by order. For details of this list, see Tables O.23 to O.27.

Table O.23: Input Table for One Look Test

  Alpha - Type I Error. Applicability: Multi Look Enabled One Sided and Two Sided Symmetric Tests.
  LowerAlpha - Lower Type I Error. Applicability: Multi Look Enabled Two Sided Asymmetric Tests.
  UpperAlpha - Upper Type I Error. Applicability: Multi Look Enabled Two Sided Asymmetric Tests.
  TrialType - Type of the Trial. Applicability: All Tests. Codes: 0 - Superiority; 1 - NonInferiority; 2 - Equivalence.
  TestType - Type of Test. Applicability: All Tests. Codes: 0 - One Sided; 1 - Two Sided; 2 - Two Sided Asymmetric.
  TailType - Nature of Critical Region. Applicability: One Sided Tests. Codes: 0 - Left Tailed; 1 - Right Tailed.
  AllocInfo - Array of the ratios of the treatment group sample sizes to control group sample size (Multi Arms Tests); Population Fractions (Trend Test).

Table O.24: Input Table for One Look Test (Contd)

  CriticalPoint - Critical value. Applicability: Single Look One Sided Tests.
  UpperCriticalPoint - Upper Critical value. Applicability: Single Look Two Sided Tests.
  LowerCriticalPoint - Lower Critical value. Applicability: Single Look Two Sided Tests.
  SampleSize - Sample Size. Applicability: All Tests.
  MaxCompleters - Maximum Number of Completers. Applicability: All Non Survival Tests.
  RespLag - Response Lag. Applicability: All Non Survival Tests.
  LookFixOption - Time/Events Based Flag. Applicability: All Survival Tests. Codes: 0 - Event Based; 1 - Time Based.
  MaxEvents - Maximum Events. Applicability: All Survival Event Based Tests.
  MaxStudyDur - Maximum Study Duration. Applicability: All Survival Time Based Tests.
  FollowUpType - Follow Up Type. Applicability: All Survival Tests. Codes: 0 - Until End of the Study; 1 - For Fixed Period.
  FollowUpDur - Follow Up Duration. Applicability: All Survival Tests.
  TestStatType - Test Statistic Type.
    All Normal Tests. Codes: 3 - Z test; 4 - t Test.
    All Survival Tests. Codes: 0 - Log Rank; 1 - Wilcoxon Gehan; 2 - Harrington Fleming.
    Ratio of Proportions NonInferiority. Codes: 5 - Wald; 6 - Score.

Table O.25: Input Table for One Look Test (Contd)

  HFParam - Harrington Fleming Parameter. Applicability: Survival Tests.
  VarType - Variance Type.
    t Test. Codes: 4 - Equal; 5 - UnEqual.
    Diff of Prop, Ratio of Prop. Codes: 0 - Pooled; 1 - UnPooled.
    Single Prop, Ratio of Proportions Score. Codes: 2 - Null; 3 - Empirical.
  SigmaD - Standard Deviation of Paired Difference. Applicability: Mean of Paired Difference Z test.
  SigmaLogRatio - Standard Deviation of Log Ratio. Applicability: Mean of Paired Ratios Z test.
  CoeffVar - Coefficient of Variation. Applicability: Ratio of Means Z test.
  Sigma - Standard Deviation. Applicability: All other Normal Z tests.
  TrtEffNull - Treatment Effect under Null on natural scale. Applicability: All Single Arm Tests and Non-Inferiority Trials in Two Arms Tests.
  UpperEquiLimit - Upper Equivalence Limit on Natural Scale. Applicability: All Continuous Tests with Equivalence Trial Type.
  LowerEquiLimit - Lower Equivalence Limit on Natural Scale. Applicability: All Continuous Tests with Equivalence Trial Type.
  EquiMargin - Equivalence Margin. Applicability: Difference of Proportions Test with Equivalence Trial Type.
  MuC - Mean for the Control Arm. Applicability: Multi Look Enabled Normal Two Arms Tests.

Table O.26: Input Table for One Look Test (Contd)

  PiC - Proportion for Control Arm. Applicability: Multi Look Enabled Binary Two Arms Tests.
  NumHzrdPrd - Number of Hazard Pieces. Applicability: All Survival Tests.
  PrdAt - Array of Starting Value of Each Period. Applicability: All Survival Tests.
  LambdaC - Array of Control Hazard Rates for each period. Applicability: All Survival Tests.
  TestID - Test ID. Codes:
    101 - Single Mean
    102 - Diff. of Means
    103 - Ratio of Means test
    105 - Mean of Paired Diff.
    106 - Mean of Paired Ratios
    201 - Chi-square test for specified proportions in C categories
    202 - Two Group Chi-square for Proportions in C Categories
    203 - Chi-square for Proportions in RxC tables
    301 - Single Prop.
    303 - Diff. of Prop.
    304 - Ratio of Prop.
    305 - Ratio of Prop. FM
    306 - Odds Ratio
    309 - Diff of Prop Equivalence
    310 - Trend in R ordered Proportions
    314 - Chi-square for Proportions in Rx2 tables
    401 - Survival Given Study Durn.
    410 - Survival Given Accrual Rates

Table O.27: Input Table for One Look Test (Contd)

  Scores - Array of Scores. Applicability: Trend in R proportions Test.
  NumPop - Number of Populations. Applicability: Trend in R proportions Test.
  NumGrp - Number of Groups. Applicability: Chi Square Tests.
  NumCat - Number of Categories. Applicability: Chi Square Tests.
  CatPropNull - Array of category wise Proportions under Null Hypothesis. Applicability: Chi-Square for Specified Proportions in C Categories Test.

O.5.2 Input Arguments for Multi Look Test

For a Multi Look Test, the R function will have the following three mandatory named arguments:

1. SimData - Same as for One Look Test.
2. DesignParam - Same as for One Look Test.
3. LookInfo - R List which consists of Design and Simulation Parameters related to multiple looks which the user may need to compute the test statistic and perform the test. The user should access the variables using names, e.g. LookInfo$SideType, and not by order. For details of this list, see Tables O.28 and O.29.
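The way these three arguments fit together can be sketched as follows for a difference of means Z statistic. This is an illustration under stated assumptions, not East's own computation: it assumes SimData carries columns named TreatmentID (0 for control, 1 for treatment) and Response, that the rows are ordered so that the first LookInfo$CumCompleters[LookInfo$CurrLookIndex] rows are the completers available at the current look, and that DesignParam$Sigma is a known common standard deviation. The non-fatal code 1 for an empty arm is likewise an assumption.

```r
# Sketch only: a multi-look test statistic function that reads its inputs by
# name from SimData, DesignParam and LookInfo.
# Assumptions: SimData has columns TreatmentID (0/1) and Response, and its
# first CumCompleters[CurrLookIndex] rows are the data available at this look.
ComputeTestStat <- function(SimData, DesignParam, LookInfo) {
  Error <- 0
  retval <- NA_real_

  k <- LookInfo$CurrLookIndex            # 1-based index of the current look
  n <- LookInfo$CumCompleters[k]         # cumulative completers at this look
  looked <- SimData[seq_len(n), , drop = FALSE]

  x_c <- looked$Response[looked$TreatmentID == 0]
  x_t <- looked$Response[looked$TreatmentID == 1]

  if (length(x_c) == 0 || length(x_t) == 0) {
    Error <- 1                           # positive code: non-fatal, skip this simulation
  } else {
    se <- DesignParam$Sigma * sqrt(1 / length(x_c) + 1 / length(x_t))
    retval <- (mean(x_t) - mean(x_c)) / se   # Wald (Z) scale
  }

  return(list(TestStat = as.double(retval), ErrorCode = as.integer(Error)))
}

# Example call with hand-made inputs (all values are illustrative).
sim <- data.frame(TreatmentID = c(0, 1, 0, 1, 0, 1),
                  Response    = c(1, 3, 1, 3, 9, 9))
look1 <- ComputeTestStat(sim,
                         DesignParam = list(Sigma = 1),
                         LookInfo = list(CurrLookIndex = 1, CumCompleters = c(4, 6)))
```

Accessing every input by name, rather than by position in the list or data frame, follows the recommendation in the argument descriptions above.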
Table O.28: Input Table for Multi Look Tests

  NumLooks - Number of Looks. Applicability: All Tests.
  CurrLookIndex - Current Look Index (1-based). Applicability: All Tests.
  InfoFrac - Array of Information Fractions. Applicability: All Tests.
  CumAlpha - Array of cumulative alpha spent. Applicability: One Sided Tests.
  CumAlphaUpper - Array of Upper cumulative alpha spent. Applicability: Two Sided Tests.
  CumAlphaLower - Array of Lower cumulative alpha spent. Applicability: Two Sided Tests.
  CumCompleters - Array of Cumulative Completers. Applicability: All Non Survival Tests.
  CumEvents - Array of Cumulative Events. Applicability: All Survival Event Based Tests.
  LookTime - Array of Look Times on Calendar Scale. Applicability: All Survival Time Based Tests.
  RejType - Rejection Type. Applicability: All Tests. Codes:
    0 - 1 Sided Efficacy Upper
    1 - 1 Sided Futility Upper
    2 - 1 Sided Efficacy Lower
    3 - 1 Sided Futility Lower
    4 - 1 Sided Efficacy Upper Futility Lower
    5 - 1 Sided Efficacy Lower Futility Upper
    6 - 2 Sided Efficacy Only
    7 - 2 Sided Futility Only
    8 - 2 Sided Efficacy Futility
    9 - Equivalence

Table O.29: Input Table for Multi Look Tests (Contd)

  EffBdryScale - Efficacy Boundary Scale. Applicability: All Tests. Codes: 0 - Z Scale; 1 - p value scale.
  EffBdry - Array of Efficacy Boundaries. Applicability: One Sided Tests.
  EffBdryUpper - Array of Upper Efficacy Boundary. Applicability: Two Sided Tests.
  EffBdryLower - Array of Lower Efficacy Boundary. Applicability: Two Sided Tests.
  FutBdryScale - Futility Boundary Scale. Applicability: All Tests. Codes: 0 - Z scale; 1 - p Value scale; 2 - Delta Scale; 3 - CP Scale.
  CPDeltaOption - Option of using Design or Estimated Delta for CP Computation. Applicability: Tests with Futility Boundary on CP Scale. Codes: 0 - Design Delta Option; 1 - Estimated Delta Option.
  FutBdry - Array of Futility Boundaries. Applicability: One Sided Tests.
  FutBdryUpper - Array of Upper Futility Boundary. Applicability: Two Sided Tests.
  FutBdryLower - Array of Lower Futility Boundary. Applicability: Two Sided Tests.
  BindingType - Binding Type. Applicability: All Tests. Codes: 0 - Non Binding; 1 - Binding.

O.5.3 Output from R function

The R function will return a list. The identifier names (case insensitive) and types mentioned for the outputs are compulsory, while their order in the list is not. We suggest that the user type casts the output. The user can have additional outputs (scalars) in the list. To print scalars in the Simulation CSV file, the user has to provide identifiers for those scalars. These identifiers will be the column names in the output. Any repeated identifiers (column names) will be ignored.

The user can return the identifier 'Decision', in which case the other identifiers become optional. If 'Decision' is not returned, then the other identifiers become mandatory.

We suggest that the return list contain an identifier "ErrorCode". If specified, it has to be of type Integer. Its values are classified as follows:

1. 0: No Error.
2. Positive Integer: Non Fatal Error - The particular simulation will be terminated but the next simulation will be performed.
3. Negative Integer: Fatal Error - No further simulation will be attempted.

We suggest that the user classify errors into these categories depending on the context.

Table O.30: Output from R function (Decision Only)

  Decision (Integer) - Decision Code. Applicability: All Tests. Codes:
    0 - No Boundary Crossed
    1 - Lower Efficacy Boundary Crossed
    2 - Upper Efficacy Boundary Crossed
    3 - Futility Boundary Crossed
    4 - Equivalence Boundary Crossed

Table O.31: Output from R function (without 'Decision')

  TestStat (Double) - Value of the appropriate test statistic. Regardless of the Efficacy or Futility Boundary Scale (e.g. Delta, p Value or CP Scale), the R function should return the test statistic on the Wald (Z) scale. Applicability: All Tests except Equivalence Trial Type.
  TestStatLeft, TestStatRight (Double) - Left and Right Test Statistics on the Wald scale, corresponding to the two hypotheses. Applicability: All tests with Equivalence Trial Type.
  Delta (Double) - Estimate of Delta. Applicability: Futility Boundary Scale is Delta or CP.
  CtrlCompleters (Integer) - Number of Completers on Control Arm. Applicability: Endpoint is Binomial, FutBdryScale is CP and the Delta option is estimated.
  TrmtCompleters (Integer) - Number of Completers on Treatment Arm. Applicability: Endpoint is Binomial, FutBdryScale is CP and the Delta option is estimated.
  CtrlPi (Double) - Proportion on Control Arm. Applicability: Endpoint is Binomial, FutBdryScale is CP and the Delta option is estimated.

O.6 Suggested Formats

O.6.1 Test Stat for One Look

Suggested format for computing the test statistic for one look tests is

    ComputeTestStat <- function(SimData, DesignParam)
    {
      Error = 0
      # Write the actual code here.
      # Store the computed test statistic value in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(TestStat = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.6.2 Performing Test for One Look Tests

Suggested format for performing the test for one look tests is

    PerformDecision <- function(SimData, DesignParam)
    {
      Error = 0
      # Write the actual code here.
      # Compute the test statistic value and store the decision in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(Decision = as.integer(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.6.3 Computing Test Statistic for Multi Look Tests

    ComputeTestStat <- function(SimData, DesignParam, LookInfo)
    {
      Error = 0
      # Write the actual code here.
      # Store the computed test statistic value in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(TestStat = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.6.4 Performing Test for Multi Look Tests

    PerformDecision <- function(SimData, DesignParam, LookInfo)
    {
      Error = 0
      # Write the actual code here.
      # Compute the test statistic value and store the decision
      # value (appropriate code) in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(Decision = as.integer(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.7 Basic Simulation

The user can perform basic simulation in East6.1 using an R function. This option is available if the user performs simulation for the Difference of Means Z Test and generates data using the Difference of Means option. In this case the R function directly generates the test statistic.

O.7.1 Input Arguments for One Look Test

For a One Look Test, the R function for basic simulation will have only one mandatory named argument:

DesignParam - R List which consists of Design and Simulation Parameters which the user may need to compute the test statistic and perform the test. The user should access the variables using names, e.g. DesignParam$SideType, and not by order.

O.7.2 Input Arguments for Multi Look Tests

For a Multi Look Test, the R function will have the following two mandatory named arguments:

1. DesignParam - Same as for One Look Test.
2. LookInfo - R List which consists of Design and Simulation Parameters related to multiple looks which the user may need to compute the test statistic and perform the test. The user should access the variables using names, e.g. LookInfo$SideType, and not by order.

O.8 Output from R function

The R function for basic simulation will return a list. The identifier names (case insensitive) and types mentioned for the outputs are compulsory, while their order in the list is not. We suggest that the user type casts the output.
The user can have additional outputs (scalars) in the list. To print scalars in the Simulation CSV file, the user has to provide identifiers for those scalars. These identifiers will be the column names in the output. Any repeated identifiers (column names) will be ignored.

The mandatory identifier(s) in this list are:

  Decision (Integer) - Decision Code:
    0 - No Boundary Crossed
    1 - Lower Efficacy Boundary Crossed
    2 - Upper Efficacy Boundary Crossed
    3 - Futility Boundary Crossed
    4 - Equivalence Boundary Crossed
  OR
  TestStat (Double) - Test Statistic Value.

We suggest that the return list contain an identifier "ErrorCode". If specified, it has to be of type Integer. Its values are classified as follows:

1. 0: No Error.
2. Positive Integer: Non Fatal Error - The particular simulation will be aborted but the next simulation will be performed.
3. Negative Integer: Fatal Error - No further simulation will be attempted.

We suggest that the user classify errors into these categories depending on the context.

O.9 Suggested Formats

O.9.1 Test Stat for One Look

Suggested format for computing the test statistic for one look tests is

    ComputeBasicTestStat <- function(DesignParam)
    {
      Error = 0
      # Write the actual code here.
      # Store the computed test statistic value in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(TestStat = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.9.2 Performing Test for One Look Tests

Suggested format for performing the test for one look tests is

    PerformDecision <- function(DesignParam)
    {
      Error = 0
      # Write the actual code here.
      # Compute the test statistic value and store the decision in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(Decision = as.integer(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.9.3 Test Statistic for Multi Look Tests

    ComputeBasicTestStat <- function(DesignParam, LookInfo)
    {
      Error = 0
      # Write the actual code here.
      # Store the computed test statistic value in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(TestStat = as.double(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list. Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.9.4 Performing Test for Multi Look Tests

    PerformDecision <- function(DesignParam, LookInfo)
    {
      Error = 0
      # Write the actual code here.
      # Compute the test statistic value and store the decision
      # value (appropriate code) in retval.
      # Use appropriate error handling and modify the Error appropriately.
      return(list(Decision = as.integer(retval), ErrorCode = as.integer(Error)))
    }

Please note that ErrorCode is optional for this function. You can also return scalar quantities of interest (such as estimates) in the output list.
Provide identifiers for such outputs and they will be displayed in the output of East6.1.

O.10 Treatment Selection Function

Treatment selection can be performed using R in combining p-values designs for difference of means and difference of proportions. This section provides details on this functionality. The function is called once in each simulation, after the first look, if the trial is not terminated. The use of error codes in this R function is similar to that explained for the other R functions. This function has the following inputs:

1. SimData - R Data frame which consists of data generated in the current simulation (Case Data). This data frame will have headers indicating the names of the columns. These names will be the same as those used in Data Generation. The user should access the variables using headers, e.g. SimData$TreatmentID, and not by order.
2. DesignParam - R List which consists of Design parameters which the user may need to perform treatment selection. The user should access the variables using names, e.g. DesignParam$SideType, and not by order. For details of this list, see Table O.33.
3. LookInfo - R List which consists of Design Parameters related to the two looks which the user may need to perform treatment selection. The user should access the variables using names, e.g. LookInfo$NumLooks, and not by order. For details of this list, see Table O.34.

Table O.32: Function for treatment selection

Suggested name of the function: TreatmentSelection()
Description: Performs treatment selection for combining p-values designs. This function is called once in each simulation after the first look.
Syntax: TreatmentSelection(SimData, DesignParam, LookInfo)

Compulsory arguments:
  SimData - Simulated data.
  DesignParam - Parameters of design.
  LookInfo - Look-wise information.

Return value type: R List. The mandatory identifiers in this list are:
  TreatmentID (Integer) - An array of treatment identifiers.
  AllocRatio (Double) - An array of allocation ratios.

Suggested format and additional information:

    TreatmentSelection <- function(SimData, DesignParam, LookInfo)
    {
      Error = 0
      # Write the actual code here.
      # TreatmentID must contain values 1, 2, ..., (No. of Treatments - 1)
      # Allocation ratios are with respect to control
      # East expects TreatmentIDs sorted according to preference of treatment selection
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(TreatmentID = as.integer(retval1), AllocRatio = as.double(retval2), ErrorCode = as.integer(Error)))
    }

Table O.33: DesignParam for Treatment Selection

  Alpha - Type I Error.
  TrialType - Type of the trial. Codes: 0 - Superiority; 1 - Non-inferiority; 2 - Equivalence.
  TailType - Nature of critical region. Codes: 0 - Left Tailed; 1 - Right Tailed.
  SampleSize - Total Sample Size.
  TestStatType - Test Statistic Type. Codes: 3 - Z-stat; 4 - t-stat.
  VarType - Variance Type. Codes: 4 - Equal; 5 - Un-equal; 0 - Pooled; 1 - Un-pooled.
  MultAdjMethod - Multiplicity adjustment method. Codes: 0 - Bonferroni; 1 - Sidak; 2 - Simes; 3 - Dunnett.
  PValCombMethod - P-Value Combination Method. Codes: 0 - Inverse Normal.
  Sigma - Common standard deviation or standard deviation array.
  w1 - Weight for stage 1.
  w2 - Weight for stage 2.
  TestID - Test ID. Codes: 418 - DOM; 419 - DOP.
  NumTreatments - Number of treatments including control.

Table O.34: LookInfo for Treatment Selection

  NumLooks - Number of Looks.
  CurrLookIndex - Current Look Index (1-based).
  InfoFrac - Array of Information fractions.
  EffBdry - Array of Efficacy Boundaries.
  FutBdryScale - Futility Boundary Scale. Codes: 1 - p-value scale; 2 - Delta/Sigma Scale (DOM) or Delta Scale (DOP).

O.11 Functions for Adaptive Simulations

This section describes the R functions available for adaptive simulations. An R function can be used for performing sample size re-estimation. It can be used with CHW or CDL simulation but not with Muller and Schafer simulation. The R function assumes that the Promising Zone scale is 'Conditional Power'. For a Survival endpoint, the R function can be used to re-estimate events, whereas for Normal and Binary endpoints it can be used to re-estimate completers. Even with an R function, East will not allow a reduction of the planned events or completers and will not allow the maximum feasible number of events or completers to be exceeded. An R function can also be used for computing the cumulative Wald statistic in adaptive survival simulations.

Table O.35: Function for Re-estimating events

Suggested name of the function: PerformSSR()
Description: Performs re-estimation of events at the adapt look in survival simulation.
Syntax: PerformSSR(OrigCP, CPmin, CPmax, DesEvents)

Compulsory arguments:
  OrigCP - CP computed with design number of events.
  CPmin - Minimum CP threshold for promising zone.
  CPmax - Maximum CP threshold for promising zone.
  DesEvents - Design Number of Events.

Return value type: R List. The mandatory identifiers in this list are:
  ReEstEvents (Integer) - Re-estimated events.

Suggested format and additional information:

    PerformSSR <- function(OrigCP, CPmin, CPmax, DesEvents)
    {
      Error = 0
      # Write the actual code here.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(ReEstEvents = as.integer(retval1), ErrorCode = as.integer(Error)))
    }

Table O.36: Function for Re-estimating Completers

Suggested name of the function: PerformSSR()
Description: Performs re-estimation of completers at the adapt look in Normal and Binary adaptive simulations.
Syntax: PerformSSR(OrigCP, CPmin, CPmax, DesCompleters)

Compulsory arguments:
  OrigCP - CP computed with design number of completers.
  CPmin - Minimum CP threshold for promising zone.
  CPmax - Maximum CP threshold for promising zone.
  DesCompleters - Design Number of Completers.

Return value type: R List. The mandatory identifiers in this list are:
  ReEstCompleters (Integer) - Re-estimated completers.

Suggested format and additional information:

    PerformSSR <- function(OrigCP, CPmin, CPmax, DesCompleters)
    {
      Error = 0
      # Write the actual code here.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(ReEstCompleters = as.integer(retval1), ErrorCode = as.integer(Error)))
    }

Table O.37: Computing cumulative Wald Statistic in Survival adaptive Simulations

Suggested name of the function: CumWaldAdapt()
Description: Computes cumulative Wald statistics at each look in CHW or CDL survival simulations.
Syntax: CumWaldAdapt(SimData, DesignParam, LookInfo, AdaptParam)

Compulsory arguments:
  SimData - Simulated Data.
  DesignParam - Design Parameters.
  LookInfo - Look-wise Information.
  AdaptParam - Adaptive Parameters.

Return value type: R List. The mandatory identifiers in this list are:
  CumWaldStatistic (Double) - Cumulative Wald Statistic.
  CumEvents (Integer) - Cumulative Events.
Optional identifiers in this list are
    Identifier       Description                                        Type
    LookTime         Look Time for each Look                            Double
    CumSampleSize    Cumulative sample size at each Look                Integer
    CumEventsCtrl    Cumulative Events on Control Arm at each Look      Integer
    CumEventsTrmt    Cumulative Events on Treatment Arm at each Look    Integer
    AvgFollowUp      Average Follow up at each Look                     Double
    AccrualDuration  Accrual Duration at each Look                      Double

Table O.38: Computing Cumulative Wald Statistic in Survival Adaptive Simulations (Continued)

Suggested format and additional information:

    CumWaldAdapt <- function(SimData, DesignParam, LookInfo, AdaptParam)
    {
      Error = 0
      # Write the actual code here.
      # It must assign the Wald statistics to retval1 and the events to retval2.
      # Use appropriate error handling and modify the
      # Error appropriately.
      return(list(CumWaldStatistic = as.double(retval1),
                  CumEvents = as.integer(retval2),
                  ErrorCode = as.integer(Error)))
    }

O.12 Use of Initialization Function

This section provides more information on the Init(Seed) function. This function is optional. If provided, it is executed before any of the other user-defined functions. It can be used for several purposes, some of which are listed below.

O.12.1 Setting Seed

If repeatable results are wanted for a run of simulations, the seed can be set with the set.seed command inside this function. The random number generator and the method used for Normal generation can also be chosen. The default random number generator in R is "Mersenne-Twister".

Example 1
The default random number generator will be used.

    Init <- function(Seed)
    {
      Error = 0
      set.seed(seed = Seed)
      return(as.integer(Error))
    }

Example 2
The Wichmann-Hill random number generator will be used.

    Init <- function(Seed)
    {
      Error = 0
      set.seed(seed = Seed, kind = "Wichmann-Hill")
      return(as.integer(Error))
    }

O.12.2 Setting Working Directory

The user can set the working directory, and may also want to source the files that will be used.

Example 1

    Init <- function(Seed)
    {
      Error = 0
      setwd("E:\\Work\\East6.1")
      source('ConstantsFile.R')
      return(as.integer(Error))
    }

O.12.3 Initialize Global Variable

The user can initialize global variables that may be used by the other R functions.

Example 1

    Init <- function(Seed)
    {
      Error = 0
      Tolerance <<- 1e-6
      NoIntervals <<- 3
      return(as.integer(Error))
    }

O.13 Additional Arguments

Suppose that for a user-defined function f, the mandatory named arguments are Arg1 and Arg2. This function will be called as follows

    f(Arg1 = Val1, Arg2 = Val2)

where Val1 and Val2 will be passed appropriately. The user can give f additional arguments, for example Arg3 and Arg4. The syntax for this function is

    f <- function(Arg1, Arg3, Arg2, Arg4)
    {
      # Body of the function
    }

Note that the call to this function passes values only to the mandatory named arguments, so it is important that the user initializes the other arguments. Some of the ways to do this are:

Initialize in the definition

    f <- function(Arg1, Arg3 = 2, Arg2, Arg4 = 5)
    {
      # Body of the function
    }

Initialize using global variables initialized in the Init function

    f <- function(Arg1, Arg3 = Tolerance, Arg2, Arg4 = NoIntervals)
    {
      # Body of the function
    }

P East 5.x to East 6.4 Import Utility

P.1 Import capabilities

This document describes the scope of the East 5.x to East 6.4 Import Utility, which Cytel provides to East 6.4 users, and gives a step-by-step procedure for using it.
The Utility was developed to import and convert workbooks created in earlier versions of East, namely the Microsoft Excel based East 5.x, to the new Architect based version, East 6.4. With this Utility, which is available in the All Programs menu, the East 6.4 user can import older workbooks and continue working on the imported designs. For example, the user can monitor a design at subsequent interim looks or simulate it within the East 6.4 environment.

To open a workbook with the .es5 extension used by East 5.x, it must first be converted to a file with the .cywx extension that East 6.4 recognizes. From the Start Menu select:

All Programs → Cytel Architect → East 6.4 → Convert Old Workbook

The following window appears. It accepts an East 5.x workbook as input and outputs an East 6 workbook. Click the Browse buttons to choose the East 5.x file to be converted and the East 6 file to be saved with the .cywx extension. To start the conversion click Convert Workbook:

The default location for the converted East 6 workbook is the same as that of the old workbook. You may select a different location for saving it. While the conversion is in progress, a detailed log of the workbook creation is displayed. After the conversion completes, you can save the log at a location of your choice. The converted file can then be opened as a workbook in East 6.4 as usual, as shown below:

When the user imports an East 5.x workbook into East 6.4, East 6.4 will retain the input parameters and re-compute all output and make it available in the 6.4 (.cywx) format.
Since there have been major computational improvements from earlier versions of East to this version, some results may not match those computed in East 5.x. In some rare situations, East 6.4 will report that the input parameters are too extreme and will not import the workbook.

In general, the user should be able to import into East 6.4 any workbook created in East 5.x using any supported version of Excel. This includes workbooks containing single look designs, group sequential designs, interim monitoring sheets, simulations etc. All supported locales will work, including English (US/UK), French, Spanish, Japanese etc.

However, there are some exceptions to the Convert Old Workbook functionality. These are described below:

1. East 6.4 will not support importing of the following: Direct monitoring, Basic simulations, Enhanced simulations with information scale, Adaptive worksheets for two-sided tests, expected sample size under H1/2, graph sheets and scratch sheets, interim monitoring sheets for single look designs.

2. Adaptive worksheets (like CHW simulations) for the odds ratio (OR) test from East 5.0 will not be imported into East 6.4, as East 6.4 does not have adaptive features for this test yet. If the user tries to import one, East 6.4 will display the following message:
"CHW simulations are not available for this test in this version of East."

3. East 5.x allowed the user to enter a floating point sample size / events value while computing the power of a design. For a group sequential design, East 6.4 uses the option "Do not round sample size/events" to deal with the specified floating point value. However, for designs that are necessarily fixed look designs, such as ratio of means, crossover designs, and difference of means designs using the t statistic, East 6.4 does not accept floating point input. For such designs, East 6.4 will round down the sample size to the nearest integer for computing the power.

4. East 6.4 won't import group sequential designs from East 5.x for the following tests:
   Linear Regression, Single Slope
   Linear Regression for Comparing Two Slopes
   Repeated Measures for Comparing Two Slopes
If the user tries to import one, East will display the following message:
"Group sequential option is not available for this test in this version of East."
East 6.4 supports only fixed sample (single look) designs for these tests.

5. East 6.4 will not import East 5.x designs of the following types, as these are not available in East 6.4:
   Logistic regression
   Cox proportional hazards regression
If the user tries to import one, East will display the following message:
"Unable to convert workbook as this test is not implemented in this version."

6. East 6.4 will not import East 5.x designs with spending functions of type "Power Family", as these spending functions are not available in East 6.4. If the user tries to import one, East will display the following message:
"Power family spending function is not supported in this version."

7. The definitions of treatment effect and effect size have changed from East 5.x to East 6.4 in the cases listed in Tables P.1 and P.2. In these cases, corresponding changes will be observed in the workbook after importing.

Table P.1: Treatment effect in non-inferiority trials

    Test                                            East 5.x                                    East 6.4
    Difference of Means for Independent Data        $\delta = \mu_c - \mu_t$                    $\delta = \mu_t - \mu_c$
    Difference of Proportion for Independent Data   $\delta = \pi_c - \pi_t$                    $\delta = \pi_t - \pi_c$
    Odds Ratio of Proportion for Independent Data   $\psi = \frac{\pi_c(1-\pi_t)}{\pi_t(1-\pi_c)}$   $\psi = \frac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)}$

Table P.2: Logrank Test

    Test                           East 5.x                                   East 6.4
    Effect Size in Logrank Test    $\delta = -\ln\frac{\lambda_t}{\lambda_c}$   $\delta = \ln\frac{\lambda_t}{\lambda_c}$

8. Muller and Schafer adaptive simulations performed with the SWACI method in East 5.x workbooks will be run with the BWCI method of estimation instead of SWACI when imported into East 6.4. This is because East 6.4 has replaced the SWACI method with the BWCI method, as the latter is more advanced.

9. East 6.4 will not import the exact paired difference design from East 5.x, as this design is not yet available in East 6.4. The East 5.x design is for the exact unconditional test for matched pairs, whereas the design in East 6.4 is for the exact McNemar's test, which is a conditional test. If the user tries to import the East 5.x design, East 6.4 will display the following message:
"The exact unconditional test for matched pairs is not available in the current version of East. This workbook cannot be imported."

10. While importing survival designs from East 5.x, East 6.4 will convert the input method to hazard rates if the East 5.x design was created with any other input method.

11. In the case of the Logrank test with accrual rates and accrual duration, East first computes a range for the target accrual, and when the user specifies the committed accrual, East computes the study duration and other outputs. Because of computational improvements from East 5.x to East 6.4, the target accrual range in East 6.4 could be a little different from that in East 5.x for the same design. If the committed accrual in an East 5.x workbook is equal or very close to the minimum, the workbook may not be imported into East 6.4, because the specified committed accrual may be less than the minimum accrual computed by East 6.4.

12. East 6.4 will not import an East 5.x workbook if its file name contains the single quote (') character.

For technical support, please call us at 617-661-2011, send a fax to 617-661-4405, or send email to support@cytel.com. Visit our website www.cytel.com for more information.

Q Technical Reference and Formulas: Single Look Designs

Q.1 Introduction

In this Appendix, we provide the theory used in the computation of single look designs in East and the formulas used for computing the sample size N. Here N is the total number of subjects on the treatment arm for single arm studies, the total number of pairs of subjects included in the study for paired designs, and the total number of subjects on the treatment and control arms combined for two sample studies. We begin by introducing common notation.

The general method of computing sample size is to solve the power equation for N given other parameters such as δ, α, σ². In a few cases there is a closed form formula for the sample size. In the remaining cases no closed form expression is possible, and the sample size for a given power is computed with an iterative method, starting from a sensible initial solution for N. In this Appendix, we describe the closed form solution wherever possible, and in other cases we state the initial solution for N along with the power equation that is solved for N.

Q.2 Common Notation

Below we give notation which will be used throughout this chapter.
$\mu$: Unknown mean of a single population
$\mu_0$: Mean response under Null hypothesis
$S$: Sample standard deviation
$\bar{X}$: Sample Mean
$D$: Difference variable of treatment and control when the response is continuous
$\bar{D}$: Sample Mean of D
$\sigma_D$: Population standard deviation of D
$S_D$: Sample standard deviation of D
$\lambda$: Median of the difference variable
$\mu_t$: Unknown mean of treatment group
$\mu_c$: Unknown mean of control group
$\sigma$: Population standard deviation
$\delta$: Effect size, for example, difference of means, difference of proportions, log hazard ratio etc.
SE: Standard Error
$\delta_0$: Non-inferiority margin for difference
$\rho_0$: Non-inferiority margin for ratio
$\delta_L$: Lower equivalence limit for difference
$\delta_U$: Upper equivalence limit for difference
$\rho_L$: Lower equivalence limit for ratio
$\rho_U$: Upper equivalence limit for ratio
$\phi(x)$: Density function of standard normal variable, evaluated at x
$\Phi(x)$: Distribution function of standard normal variable, evaluated at x
$Z_\alpha$: Upper α percent point of standard normal distribution
$\tau_\nu(x)$: Distribution function of a Student's t distribution with ν degrees of freedom, evaluated at x
$\tau_\nu(x \mid \Omega)$: Distribution function of a non-central t distribution with ν degrees of freedom and non-centrality parameter Ω, evaluated at x
$t_{\alpha,\nu}$: Upper α percent point of a Student's t-distribution with ν degrees of freedom

Q.3 Sample Size: Continuous

Q.3.1 Single Arm Design: Single Mean: Superiority: Test Statistic Distribution: Normal

$$\delta = \mu - \mu_0$$

One sided (for both δ > 0 and δ < 0)

$$N = \frac{\sigma^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N}}{\sigma}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2}{\delta^2}\,(Z_{\alpha/2} + Z_\beta)^2$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2}{\delta^2}\,(Z_{\alpha/2} + Z_\beta)^2$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{\delta\sqrt{N}}{\sigma}\right) + \Phi\left(-Z_{\alpha_l} - \frac{\delta\sqrt{N}}{\sigma}\right)$$

Q.3.2 Single Arm Design: Single Mean: Superiority: Test Statistic Distribution: t

$$\delta = \mu - \mu_0$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{|\delta|\sqrt{N}}{\sigma}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha/2,N-1} \,\middle|\, \frac{\delta\sqrt{N}}{\sigma}\right) + \tau_{N-1}\left(-t_{\alpha/2,N-1} \,\middle|\, \frac{\delta\sqrt{N}}{\sigma}\right)$$

Q.3.3 Paired Design: Superiority: Test Statistic Distribution: Normal: Mean of paired differences

$$\delta = \mu_t - \mu_c$$

One sided (for both δ > 0 and δ < 0)

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N}}{\sigma_D}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2\,(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma_D}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma_D}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2\,(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{\delta\sqrt{N}}{\sigma_D}\right) + \Phi\left(-Z_{\alpha_l} - \frac{\delta\sqrt{N}}{\sigma_D}\right)$$

Q.3.4 Paired Design: Superiority: Test Statistic Distribution: t

$$\delta = \mu_t - \mu_c$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{|\delta|\sqrt{N}}{\sigma_D}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha/2,N-1} \,\middle|\, \frac{\delta\sqrt{N}}{\sigma_D}\right) + \tau_{N-1}\left(-t_{\alpha/2,N-1} \,\middle|\, \frac{\delta\sqrt{N}}{\sigma_D}\right)$$

Q.3.5 Paired Design: Non-inferiority: Test Statistic Distribution: Normal

$$\delta = \mu_t - \mu_c$$

One sided (for both δ > δ0 and δ < δ0)

$$N = \frac{\sigma_D^2\,(Z_\alpha + Z_\beta)^2}{(\delta - \delta_0)^2}$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta - \delta_0|\sqrt{N}}{\sigma_D}\right)$$

Two sided symmetric (for both δ > δ0 and δ < δ0)

Start with the initial solution

$$N = \frac{\sigma_D^2\,(Z_{\alpha/2} + Z_\beta)^2}{(\delta - \delta_0)^2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{(\delta - \delta_0)\sqrt{N}}{\sigma_D}\right) + \Phi\left(-Z_{\alpha/2} - \frac{(\delta - \delta_0)\sqrt{N}}{\sigma_D}\right)$$

Two sided asymmetric (for both δ > δ0 and δ < δ0)

Start with the initial solution

$$N = \frac{\sigma_D^2\,(Z_{\alpha/2} + Z_\beta)^2}{(\delta - \delta_0)^2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{(\delta - \delta_0)\sqrt{N}}{\sigma_D}\right) + \Phi\left(-Z_{\alpha_l} - \frac{(\delta - \delta_0)\sqrt{N}}{\sigma_D}\right)$$

Q.3.6 Paired Design: Non-inferiority: Test Statistic Distribution: t

$$\delta = \mu_t - \mu_c$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{(\delta - \delta_0)^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{|\delta - \delta_0|\sqrt{N}}{\sigma_D}\right)$$

Q.3.7 Paired Design: Equivalence: Test Statistic Distribution: t

$$\delta = \mu_t - \mu_c$$

Solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{(\delta - \delta_L)\sqrt{N}}{\sigma_D}\right) + \tau_{N-1}\left(-t_{\alpha,N-1} \,\middle|\, \frac{(\delta - \delta_U)\sqrt{N}}{\sigma_D}\right)$$

Q.3.8 Paired Design: Superiority: Test Statistic Distribution: Normal: Mean of Paired Ratios

$$\delta = \ln\frac{\mu_t}{\mu_c}$$

One sided (for both δ > 0 and δ < 0)

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N}}{\sigma_D}\right)$$

where $\sigma_D$ = standard deviation of log ratios.

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_{\alpha/2} + Z_\beta)^2$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma_D}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma_D}\right)$$

where $\sigma_D$ = standard deviation of log ratios.

Two sided asymmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_{\alpha/2} + Z_\beta)^2$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{\delta\sqrt{N}}{\sigma_D}\right) + \Phi\left(-Z_{\alpha_l} - \frac{\delta\sqrt{N}}{\sigma_D}\right)$$

where $\sigma_D$ = standard deviation of log ratios.

Q.3.9 Paired Design: Superiority: Test Statistic Distribution: t: Mean of Paired Ratios

$$\delta = \ln\frac{\mu_t}{\mu_c}$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{|\delta|\sqrt{N}}{\sigma_D}\right)$$

where $\sigma_D$ = standard deviation of log ratios.

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha/2,N-1} \,\middle|\, \frac{\delta\sqrt{N}}{\sigma_D}\right) + \tau_{N-1}\left(-t_{\alpha/2,N-1} \,\middle|\, \frac{\delta\sqrt{N}}{\sigma_D}\right)$$

where $\sigma_D$ = standard deviation of log ratios.

Q.3.10 Paired Design: Non-inferiority: Test Statistic Distribution: Normal

$$\delta = \ln\frac{\mu_t}{\mu_c}$$

One sided (for both δ > δ0 and δ < δ0)

$$N = \frac{\sigma_D^2\,(Z_\alpha + Z_\beta)^2}{(\delta - \delta_0)^2}$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta - \delta_0|\sqrt{N}}{\sigma_D}\right)$$

where $\delta_0 = \ln(\rho_0)$ and $\sigma_D$ = standard deviation of log ratios.

Q.3.11 Paired Design: Non-inferiority: Test Statistic Distribution: t

$$\delta = \ln\frac{\mu_t}{\mu_c}$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma_D^2}{(\delta - \delta_0)^2}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{|\delta - \delta_0|\sqrt{N}}{\sigma_D}\right)$$

where $\delta_0 = \ln(\rho_0)$ and $\sigma_D$ = standard deviation of log ratios.

Q.3.12 Paired Design: Equivalence: Test Statistic Distribution: t

$$\delta = \ln\frac{\mu_t}{\mu_c}$$

Solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{N-1}\left(t_{\alpha,N-1} \,\middle|\, \frac{(\delta - \delta_U)\sqrt{N}}{\sigma_D}\right) + \tau_{N-1}\left(-t_{\alpha,N-1} \,\middle|\, \frac{(\delta - \delta_L)\sqrt{N}}{\sigma_D}\right)$$

where $\delta_0 = \ln(\rho_0)$ and $\sigma_D$ = standard deviation of log ratios.

Q.4 Sample Size: Continuous: Two Samples

Q.4.1 Two Independent Samples: Superiority: Test Statistic Distribution: Normal

$$\delta = \mu_t - \mu_c, \quad TF = \frac{N_t}{N}, \quad \sigma = \text{common s.d.}$$

One sided (for both δ > 0 and δ < 0)

$$N = \frac{\sigma^2\,(Z_\alpha + Z_\beta)^2}{\delta^2 \cdot TF \cdot (1 - TF)}$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,(Z_{\alpha/2} + Z_\beta)^2}{\delta^2 \cdot TF \cdot (1 - TF)}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,(Z_{\alpha/2} + Z_\beta)^2}{\delta^2 \cdot TF \cdot (1 - TF)}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right) + \Phi\left(-Z_{\alpha_l} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right)$$

Q.4.2 Two Independent Samples: Superiority: Test Statistic Distribution: t: Variance: Equal

$$\delta = \mu_t - \mu_c, \quad TF = \frac{N_t}{N}, \quad \sigma = \text{common s.d.}$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,TF^2}{\delta^2\,(TF - 1)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right)$$

Two sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,TF^2}{\delta^2\,(TF - 1)}\,(Z_{\alpha/2} + Z_\beta)^2 + \frac{Z_{\alpha/2}^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha/2,\,n_t+n_c-2} \,\middle|\, \frac{|\delta|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha/2,\,n_t+n_c-2} \,\middle|\, \frac{|\delta|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right)$$
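The recurring recipe in this appendix (start from a closed form initial solution, then solve the power equation iteratively) can be sketched in R for the two sided symmetric Normal case of Section Q.4.1. This is only an illustration under stated assumptions, not East's actual algorithm: the helper names `z_upper`, `power_two_sided` and `solve_n_two_sided` are hypothetical, the root finder `uniroot` is a stand-in for whatever iterative procedure East uses, and the input values are arbitrary.

```r
# Upper-alpha percent point of the standard normal (Z_alpha in the notation above).
z_upper <- function(a) qnorm(1 - a)

# Two sided symmetric power of Section Q.4.1 for a total sample size N.
power_two_sided <- function(N, delta, sigma, TF, alpha) {
  shift <- delta * sqrt(N * TF * (1 - TF)) / sigma
  1 - pnorm(z_upper(alpha / 2) - shift) + pnorm(-z_upper(alpha / 2) - shift)
}

# Closed form initial solution, then refine N until power matches the target.
# Assumption: uniroot() on the bracket [n0/2, 2*n0] replaces East's own iteration.
solve_n_two_sided <- function(delta, sigma, TF, alpha, power) {
  n0 <- sigma^2 * (z_upper(alpha / 2) + z_upper(1 - power))^2 /
    (delta^2 * TF * (1 - TF))
  f <- function(N) power_two_sided(N, delta, sigma, TF, alpha) - power
  root <- uniroot(f, lower = n0 / 2, upper = 2 * n0, tol = 1e-10)$root
  list(initial = n0, solution = root)
}

# Illustrative inputs only.
res <- solve_n_two_sided(delta = 0.5, sigma = 1, TF = 0.5, alpha = 0.05, power = 0.9)
```

The refined solution is marginally smaller than the initial one because the initial formula ignores the second Φ term of the two sided power equation. How East rounds the resulting N is not covered by this sketch.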
Q.4.3 Two Independent Samples: Superiority: Test Statistic Distribution: t: Variance: Unequal

$$\delta = \mu_t - \mu_c, \quad TF = \frac{N_t}{N}$$

$\sigma_t$ and $\sigma_c$ are the s.d.s for treatment and control respectively.

One sided (for both δ > 0 and δ < 0)

Start with a relevant initial solution and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_\nu\left(t_{\alpha,\nu} \,\middle|\, \frac{|\delta|}{\sqrt{\dfrac{\sigma_t^2}{n_t} + \dfrac{\sigma_c^2}{n_c}}}\right)$$

where the d.f. ν is given by:

$$\nu = \frac{\left(\dfrac{\sigma_t^2}{n_t} + \dfrac{\sigma_c^2}{n_c}\right)^2}{\dfrac{\left(\sigma_t^2/n_t\right)^2}{n_t - 1} + \dfrac{\left(\sigma_c^2/n_c\right)^2}{n_c - 1}}$$

Q.4.4 Two Independent Samples: Non-inferiority: Test Statistic Distribution: Normal

One sided (for both δ > δ0 and δ < δ0)

$$N = \frac{\sigma^2\,(Z_\alpha + Z_\beta)^2}{(\delta - \delta_0)^2 \cdot TF \cdot (1 - TF)}$$

Q.4.5 Two Independent Samples: Non-inferiority: Test Statistic Distribution: t: Variance: Equal

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,TF^2}{(\delta - \delta_0)^2\,(TF - 1)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta - \delta_0|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right)$$

Q.4.6 Two Independent Samples: Non-inferiority: Test Statistic Distribution: t: Variance: Unequal

One sided (for both δ > 0 and δ < 0)

Start with a relevant initial solution and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_\nu\left(t_{\alpha,\nu} \,\middle|\, \frac{|\delta - \delta_0|}{\sqrt{\dfrac{\sigma_t^2}{n_t} + \dfrac{\sigma_c^2}{n_c}}}\right)$$

where the d.f. ν is given by:

$$\nu = \frac{\left(\dfrac{\sigma_t^2}{n_t} + \dfrac{\sigma_c^2}{n_c}\right)^2}{\dfrac{\left(\sigma_t^2/n_t\right)^2}{n_t - 1} + \dfrac{\left(\sigma_c^2/n_c\right)^2}{n_c - 1}}$$

Q.4.7 Two Independent Samples: Equivalence: Test Statistic Distribution: t

$$\delta = \mu_t - \mu_c$$

Solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta - \delta_L|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta - \delta_U|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right)$$

Q.4.8 Two Independent Samples: Superiority: Test Statistic Distribution: Normal: Variance: Equal

$$\delta = \ln(\mu_t/\mu_c), \quad TF = \frac{N_t}{N}$$

CV = Coefficient of variation of the original data is the input.

$$\sigma = \text{common standard deviation of log ratios} = \sqrt{\ln(CV^2 + 1)}$$

One sided (for both δ > 0 and δ < 0)

$$N = \frac{\sigma^2\,(Z_\alpha + Z_\beta)^2}{\delta^2 \cdot TF \cdot (1 - TF)}$$

$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,(Z_{\alpha/2} + Z_\beta)^2}{\delta^2 \cdot TF \cdot (1 - TF)}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,(Z_{\alpha/2} + Z_\beta)^2}{\delta^2 \cdot TF \cdot (1 - TF)}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right) + \Phi\left(-Z_{\alpha_l} - \frac{\delta\sqrt{N \cdot TF \cdot (1 - TF)}}{\sigma}\right)$$
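As a worked illustration of the one sided case in Section Q.4.8, the R sketch below converts a coefficient of variation into the log-scale standard deviation and evaluates the closed form sample size and its power. It assumes the usual log-normal relation σ = sqrt(ln(CV² + 1)) between CV and σ. The function names `sigma_from_cv` and `ratio_sup_one_sided` and the input values are hypothetical and chosen only for illustration.

```r
# Log-scale standard deviation implied by a coefficient of variation.
# Assumption: data are log-normal, so sigma^2 = ln(CV^2 + 1).
sigma_from_cv <- function(cv) sqrt(log(cv^2 + 1))

# One sided ratio-of-means superiority design, Normal test statistic.
ratio_sup_one_sided <- function(mu_t, mu_c, cv, TF, alpha, power) {
  delta <- log(mu_t / mu_c)        # effect size on the log scale
  sigma <- sigma_from_cv(cv)
  z_a <- qnorm(1 - alpha)          # Z_alpha
  z_b <- qnorm(power)              # Z_beta, with beta = 1 - power
  N <- sigma^2 * (z_a + z_b)^2 / (delta^2 * TF * (1 - TF))
  # Plug N back into the one sided power equation as a consistency check.
  achieved <- 1 - pnorm(z_a - abs(delta) * sqrt(N * TF * (1 - TF)) / sigma)
  list(delta = delta, sigma = sigma, N = N, power = achieved)
}

# Illustrative inputs only.
out <- ratio_sup_one_sided(mu_t = 1.25, mu_c = 1, cv = 0.3,
                           TF = 0.5, alpha = 0.025, power = 0.9)
```

Because the one sided formula is in closed form, substituting the unrounded N back into the power equation reproduces the requested power exactly, which makes this a convenient check on an implementation.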
Q.4.9 Two Independent Samples: Superiority: Test Statistic Distribution: t: Variance: Equal

$$\delta = \ln(\mu_t/\mu_c), \quad TF = \frac{N_t}{N}$$

CV = Coefficient of variation of the original data is the input.

$$\sigma = \text{common standard deviation of log ratios} = \sqrt{\ln(CV^2 + 1)}$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,TF^2}{\delta^2\,(TF - 1)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta|\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right)$$

Two sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,TF^2}{\delta^2\,(TF - 1)}\,(Z_{\alpha/2} + Z_\beta)^2 + \frac{Z_{\alpha/2}^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha/2,\,n_t+n_c-2} \,\middle|\, \frac{\delta\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha/2,\,n_t+n_c-2} \,\middle|\, \frac{\delta\sqrt{N\,(TF - 1)}}{\sigma \cdot TF}\right)$$

Q.4.10 Two Independent Samples: Non-inferiority: Test Statistic Distribution: Normal

$$\delta = \ln(\mu_t/\mu_c), \quad TF = \frac{N_t}{N}$$

CV = Coefficient of variation of the original data is the input.

$$\sigma = \text{common standard deviation of log ratios} = \sqrt{\ln(CV^2 + 1)}$$

One sided (for both δ > δ0 and δ < δ0)

$$N = \frac{\sigma^2\,(Z_\alpha + Z_\beta)^2}{(\delta - \delta_0)^2 \cdot TF \cdot (1 - TF)}$$

where $\delta_0 = \ln(\rho_0)$ and σ = standard deviation of log ratios.

Q.4.11 Two Independent Samples: Non-inferiority: Test Statistic Distribution: t

$$\delta = \ln(\mu_t/\mu_c), \quad TF = \frac{N_t}{N}$$

CV = Coefficient of variation of the original data is the input.

$$\sigma = \text{common standard deviation of log ratios} = \sqrt{\ln(CV^2 + 1)}$$

One sided (for both δ > 0 and δ < 0)

Start with the initial solution

$$N = \frac{\sigma^2\,TF^2}{(\delta - \delta_0)^2\,(1 - TF)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$

and solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta - \delta_0|\sqrt{N\,(1 - TF)}}{\sigma \cdot TF}\right)$$

where $\delta_0 = \ln(\rho_0)$.

Q.4.12 Two Independent Samples: Equivalence: Test Statistic Distribution: t

$$\delta = \ln(\mu_t/\mu_c), \quad TF = \frac{N_t}{N}$$

CV = Coefficient of variation of the original data is the input.

$$\sigma = \text{common standard deviation of log ratios} = \sqrt{\ln(CV^2 + 1)}$$

Solve the following equation iteratively so that the computed power matches the desired power to a precision of 1e-6.

$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta - \delta_L|\sqrt{N\,(1 - TF)}}{\sigma \cdot TF}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha,\,n_t+n_c-2} \,\middle|\, \frac{|\delta - \delta_U|\sqrt{N\,(1 - TF)}}{\sigma \cdot TF}\right)$$

Q.4.13 Two Independent Samples: Wilcoxon Mann Whitney Test

$x_1, x_2, \ldots, x_{n_c}$: observations from Control
$y_1, y_2, \ldots, y_{n_t}$: observations from Treatment
$r = n_t / N$
θ = treatment effect

Test Statistic

$$U_1 = R_1 - \frac{n_c(n_c + 1)}{2} \sim AN(\mu_U, \sigma_U^2)$$

where $R_1$ = sum of ranks of the control population in the combined sample,

$$\mu_U = \frac{n_c n_t}{2} \quad \text{and} \quad \sigma_U^2 = \frac{n_c n_t (n_c + n_t + 1)}{12}$$

One sided

$H_0: \theta = 0$ against $H_1: \theta > 0$; Y observations tend to be larger than X observations.

Sample Size

$$N = \frac{(Z_\alpha + Z_\beta)^2}{12\,r(1 - r)\,(p - 0.5)^2}$$

where

$$p = P(X < Y) = \Phi\left(\frac{\mu_t - \mu_c}{\sqrt{2}\,\sigma}\right)$$

assuming that the observations come from Normal distributions with common standard deviation σ.
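The one sided Wilcoxon Mann Whitney sample size above is a direct calculation, sketched here in R. The function name `wmw_n_one_sided` and the input values are hypothetical, and rounding N up to the next integer is an assumption of this sketch rather than a documented East rule.

```r
# One sided Wilcoxon-Mann-Whitney sample size under the Normal shift assumption.
wmw_n_one_sided <- function(mu_t, mu_c, sigma, r, alpha, power) {
  # p = P(X < Y) for Normal observations with common standard deviation sigma.
  p <- pnorm((mu_t - mu_c) / (sqrt(2) * sigma))
  z_a <- qnorm(1 - alpha)   # Z_alpha
  z_b <- qnorm(power)       # Z_beta, with beta = 1 - power
  N <- (z_a + z_b)^2 / (12 * r * (1 - r) * (p - 0.5)^2)
  # Assumption: round up to the next whole subject.
  list(p = p, N = N, N_rounded = ceiling(N))
}

# Illustrative inputs only: half a standard deviation shift, equal allocation.
w <- wmw_n_one_sided(mu_t = 0.5, mu_c = 0, sigma = 1,
                     r = 0.5, alpha = 0.025, power = 0.9)
```

The required N grows rapidly as p approaches 0.5, since (p − 0.5)² appears in the denominator; with no shift at all the formula is undefined.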
Two sided
H0: θ = 0 against H1: θ ≠ 0
Sample Size
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{12\,r(1-r)\,(p-0.5)^2}$$

Q.5 Sample Size : Continuous : Crossover Designs : Two Samples

Q.5.1 Crossover:Sup:t
Q.5.2 Crossover:Noninf:t
Q.5.3 Crossover:Equiv:t
Q.5.4 Crossover:Sup:t
Q.5.5 Crossover:Noninf:t
Q.5.6 Crossover:Equiv:t

Q.5.1 Crossover Designs: Superiority: Test Statistic Distribution: t

$\delta = \mu_t - \mu_c$, $TF = N_t/N$, $\sigma = \sqrt{MSE}$
$\sigma_D = \sqrt{2}\,\sqrt{MSE}$ = s.d. of difference of treatment effects

One sided (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{2\delta^2\,TF\,(1-TF)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$

Two sided (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{2\delta^2\,TF\,(1-TF)}\,(Z_{\alpha/2} + Z_\beta)^2 + \frac{Z_{\alpha/2}^2}{2}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha/2,\,n_t+n_c-2} \,\Big|\, \frac{|\delta|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha/2,\,n_t+n_c-2} \,\Big|\, \frac{|\delta|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$

Q.5.2 Crossover Designs: Noninferiority: Test Statistic Distribution: t

$\delta = \mu_t - \mu_c$, $TF = N_t/N$, $\sigma = \sqrt{MSE}$
$\sigma_D = \sqrt{2}\,\sqrt{MSE}$ = s.d. of difference of treatment effects

One sided (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{2(\delta-\delta_0)^2\,TF\,(1-TF)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta-\delta_0|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$

Q.5.3 Crossover Designs: Equivalence: Test Statistic Distribution: t

$\delta = \mu_t - \mu_c$, $TF = N_t/N$, $\sigma = \sqrt{MSE}$
$\sigma_D = \sqrt{2}\,\sqrt{MSE}$ = s.d. of difference of treatment effects.
Solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta-\delta_L|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta-\delta_U|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$

Q.5.4 Crossover Designs: Superiority: Test Statistic Distribution: t

$\delta = \ln(\mu_t/\mu_c)$, $TF = N_t/N$, $\sigma = \sqrt{MSE_{\log}}$

One sided (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{2\delta^2\,TF\,(1-TF)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$

Two sided (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{2\delta^2\,TF\,(1-TF)}\,(Z_{\alpha/2} + Z_\beta)^2 + \frac{Z_{\alpha/2}^2}{2}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha/2,\,n_t+n_c-2} \,\Big|\, \frac{\delta\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right) + \tau_{n_t+n_c-2}\left(-t_{\alpha/2,\,n_t+n_c-2} \,\Big|\, \frac{\delta\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$

Q.5.5 Crossover Designs: Noninferiority: Test Statistic Distribution: t

$\delta = \ln(\mu_t/\mu_c)$, $TF = N_t/N$, $\sigma = \sqrt{MSE_{\log}}$

One sided (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{2(\delta-\delta_0)^2\,TF\,(1-TF)}\,(Z_\alpha + Z_\beta)^2 + \frac{Z_\alpha^2}{2}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta-\delta_0|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$
where $\delta_0 = \ln(\rho_0)$.

Q.5.6 Crossover Designs: Equivalence: Test Statistic Distribution: t

$\delta = \ln(\mu_t/\mu_c)$, $TF = N_t/N$, $\sigma = \sqrt{MSE_{\log}}$

Solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \tau_{n_t+n_c-2}\left(t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta-\delta_L|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right) + \tau_{n_t+n_c-2-1}\left(-t_{\alpha,\,n_t+n_c-2} \,\Big|\, \frac{|\delta-\delta_U|\sqrt{2N\,TF\,(1-TF)}}{\sigma}\right)$$
where $\delta_L = \ln(\rho_L)$ and $\delta_U = \ln(\rho_U)$.

Q.6 Sample Size : Continuous : Many Samples

Q.6.1 One Way ANOVA
Q.6.2 One Way Contrast:t
Q.6.3 One Way Repeated : ANOVA
Q.6.4 One Way Repeated Measures Contrast
Q.6.5 Two Way ANOVA
Q.6.6 Linear regression single slope
Q.6.7 Linear Regression: Diff. of slopes
Q.6.8 Repeated measures: Diff. of slopes

Q.6.1 One Way ANOVA: Superiority: Test Statistic Distribution: F

$\sigma$ = Common standard deviation
$\sigma_m^2$ = Variance of means
r = Number of groups.

Solve the following equation iteratively for n so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = P_\lambda\left(F > F_{r-1,\,n-r,\,\alpha}\right)$$
with non-centrality parameter
$$\lambda = \frac{n\,\sigma_m^2}{\sigma^2}$$

Q.6.2 One Way ANOVA: Single One Way Contrast: t

$\sigma$ = Common standard deviation
$\sigma_{mc}^2$ = Variance of means
r = Number of groups

One sided
Solve the following equation iteratively for n so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = P_{\lambda_1}\left(t > t_{n-r,\,\alpha}\right)$$
with non-centrality parameter
$$\lambda_1 = \frac{\sqrt{n}\,\sigma_{mc}}{\sigma}$$

Two Sided
Solve the following equation iteratively for n so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = P_\lambda\left(F > F_{1,\,n-r,\,\alpha}\right)$$
with non-centrality parameter
$$\lambda = \frac{n\,\sigma_{mc}^2}{\sigma^2}$$

Q.6.3 One Way Repeated Measures: ANOVA: Superiority: Constant Correlation

M = number of levels
$\mu_i$ = mean at level i
$\sigma$ = standard deviation at each level
$\rho$ = between level correlation
$\sigma_m^2 = \sum_i (\mu_i - \mu)^2 / M$ = variance of means
Effect size $= \Delta = \dfrac{\sigma_m^2}{\sigma^2 (1-\rho)}$
$$\text{Power} = P_\lambda\left(F > F_{(M-1),\,(M-1)(n-1),\,\alpha}\right)$$
with noncentrality parameter $\lambda = nM\Delta$.

Q.6.4 One Way Repeated Measures Contrast

M = number of levels
$\mu_i$ = mean at level i
$\sigma$ = standard deviation at each level
$\rho$ = between level correlation
Contrast $C = \sum_i C_i \mu_i$ such that $\sum_i C_i = 0$
$D = \sqrt{\sum_i C_i^2}$
Effect size $= \Delta = \dfrac{|C|}{\sigma D \sqrt{1-\rho}}$
$$\text{Power} = P_\lambda\left(F > F_{1,\,(M-1)(n-1),\,\alpha}\right)$$
with noncentrality parameter $\lambda = n\Delta^2$.

Q.6.5 Two Way ANOVA

r = number of factor A levels,
s = number of factor B levels,
$\mu$ = overall mean
$\sigma$ = common s.d. in each of the groups
$\mu_i$ = mean across factor B levels for factor A level i
$\mu_j$ = mean across factor A levels for factor B level j
$\mu_{ij}$ = mean for factor A level i and factor B level j
$V_A$ = Variance of the marginal means for factor A
$$V_A = \frac{\sum_i (\mu_i - \mu)^2}{r}$$
$V_B$ = Variance of the marginal means for factor B
$$V_B = \frac{\sum_j (\mu_j - \mu)^2}{s}$$
$V_{AB}$ = Variance of cell means for factor A and B
$$V_{AB} = \frac{\sum_i \sum_j (\mu_{ij} - \mu_i - \mu_j + \mu)^2}{rs}$$
$\text{Power}_A = P\left(F > F_{(r-1),\,rs(n-1),\,\alpha}\right)$ with non-centrality parameter $\lambda = nrs\,V_A/\sigma^2$
$\text{Power}_B = P\left(F > F_{(s-1),\,rs(n-1),\,\alpha}\right)$ with non-centrality parameter $\lambda = nrs\,V_B/\sigma^2$
$\text{Power}_{AB} = P\left(F > F_{(r-1)(s-1),\,rs(n-1),\,\alpha}\right)$ with non-centrality parameter $\lambda = nrs\,V_{AB}/\sigma^2$

Q.6.6 Linear regression single slope

Use $\delta = \theta - \theta_0$ and $\sigma = \dfrac{\text{s.d. of residual}}{\text{s.d. of } X} = \dfrac{\sigma_\xi}{\sigma_x}$ throughout the computation of power and sample size.

One sided (for both δ > 0 and δ < 0)
$$N = \frac{\sigma^2}{\delta^2}\,(Z_\alpha + Z_\beta)^2$$
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{\delta\sqrt{N}}{\sigma}\right)$$
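The one-sided single-slope formulas can be checked numerically with a short Python sketch. The function names `slope_sample_size` and `slope_power` are hypothetical, and the sketch assumes $\delta > 0$ and the upper-quantile convention $Z_\alpha = \Phi^{-1}(1-\alpha)$, $Z_\beta = \Phi^{-1}(1-\beta)$; it is not East's implementation.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def slope_sample_size(delta, sigma, alpha, power):
    """Unrounded N = sigma^2 (Z_alpha + Z_beta)^2 / delta^2 (one sided)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)  # assumed quantile convention
    z_beta = _STD_NORMAL.inv_cdf(power)         # power = 1 - beta
    return sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


def slope_power(delta, sigma, alpha, n):
    """Power = 1 - Phi(Z_alpha - delta * sqrt(N) / sigma) (one sided)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STD_NORMAL.cdf(z_alpha - delta * sqrt(n) / sigma)


# sigma is the ratio (s.d. of residual) / (s.d. of X), as defined in the text.
n = slope_sample_size(delta=0.5, sigma=2.0, alpha=0.025, power=0.9)
print(round(n, 2), round(slope_power(0.5, 2.0, 0.025, n), 6))
```

Substituting the unrounded N back into the power formula returns the requested power, which is why the one-sided case needs no iteration.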
Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2}{\delta^2}\,(Z_{\alpha/2} + Z_\beta)^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N}}{\sigma}\right)$$

Q.6.7 Linear Regression: Difference of slopes

Use $\delta = \theta_t - \theta_c$, $TF = N_t/N$,
$$\sigma = \sigma_e \sqrt{\frac{(1-TF)\,\sigma_{xc}^2 + TF\,\sigma_{xt}^2}{\sigma_{xc}^2\,\sigma_{xt}^2}}$$
where
$\sigma_{xc}$ = Std dev of X under control
$\sigma_{xt}$ = Std dev of X under treatment
$\sigma_e$ = Std dev of residuals

One sided (for both δ > 0 and δ < 0)
$$N = \frac{\sigma^2 (Z_\alpha + Z_\beta)^2}{\delta^2\,TF\,(1-TF)}$$
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N\,TF\,(1-TF)}}{\sigma}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2 (Z_{\alpha/2} + Z_\beta)^2}{\delta^2\,TF\,(1-TF)}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N\,TF\,(1-TF)}}{\sigma}\right) + \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N\,TF\,(1-TF)}}{\sigma}\right)$$

Q.6.8 Repeated measures: Difference of slopes

Use $\delta = \theta_t - \theta_c$, $TF = N_t/N$,
$$\sigma = \sqrt{\sigma_b^2 + \frac{12(M-1)\,\sigma_w^2}{M(M+1)\,S^2}}$$
throughout the computation of power, sample size, alpha and delta, where
M = Number of measurements
S = Duration of follow up
$\sigma_w$ = Within subject std. dev
$\sigma_b$ = Between subject std. dev
$\sigma_e$ = Std dev of residuals

One sided (for both δ > 0 and δ < 0)
$$N = \frac{\sigma^2 (Z_\alpha + Z_\beta)^2}{\delta^2\,TF\,(1-TF)}$$
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N\,TF\,(1-TF)}}{\sigma}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\sigma^2 (Z_{\alpha/2} + Z_\beta)^2}{\delta^2\,TF\,(1-TF)}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{N\,TF\,(1-TF)}}{\sigma}\right)$$
$$\qquad +\ \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{N\,TF\,(1-TF)}}{\sigma}\right)$$

Q.7 Sample Size : Discrete

Q.7.1 Single Prop: Sup:Null
Q.7.2 Single Prop: Sup:Empirical
Q.7.3 Paired:Sup: McNemar

Q.7.1 Single Arm Design: Single Proportion: Superiority: Test Statistic Distribution: Normal: Variance: Under Null hypothesis

$$\delta = \pi_1 - \pi_0, \qquad \Delta = \sqrt{\frac{\pi_0(1-\pi_0)}{\pi_1(1-\pi_1)}}$$

One sided (for both δ > 0 and δ < 0)
$$N = \frac{\pi_1(1-\pi_1)}{\delta^2}\,(Z_\beta + \Delta Z_\alpha)^2$$
$$\text{Power} = 1 - \Phi\left(\left(Z_\alpha - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_0(1-\pi_0)}}\right)\Delta\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\pi_1(1-\pi_1)}{\delta^2}\,(Z_\beta + \Delta Z_{\alpha/2})^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(\left(Z_{\alpha/2} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_0(1-\pi_0)}}\right)\Delta\right) + \Phi\left(\left(-Z_{\alpha/2} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_0(1-\pi_0)}}\right)\Delta\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\pi_1(1-\pi_1)}{\delta^2}\,(Z_\beta + \Delta Z_{\alpha/2})^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(\left(Z_{\alpha_u} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_0(1-\pi_0)}}\right)\Delta\right) + \Phi\left(\left(-Z_{\alpha_l} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_0(1-\pi_0)}}\right)\Delta\right)$$

Q.7.2 Single Arm Design: Single Proportion: Superiority: Test Statistic Distribution: Normal: Variance: Empirical

$$\delta = \pi_1 - \pi_0, \qquad \Delta = \sqrt{\frac{\pi_0(1-\pi_0)}{\pi_1(1-\pi_1)}}$$

One sided (for both δ > 0 and δ < 0)
$$N = \frac{\pi_1(1-\pi_1)}{\delta^2}\,(Z_\beta + Z_\alpha)^2$$
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_1(1-\pi_1)}}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\pi_1(1-\pi_1)}{\delta^2}\,(Z_\beta + Z_{\alpha/2})^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_1(1-\pi_1)}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_1(1-\pi_1)}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\pi_1(1-\pi_1)}{\delta^2}\,(Z_\beta + Z_{\alpha/2})^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_1(1-\pi_1)}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|\sqrt{N}}{\sqrt{\pi_1(1-\pi_1)}}\right)$$

Q.7.3 Paired Design: McNemar's Test: Superiority: Test Statistic Distribution: Normal

$\delta = \pi_t - \pi_c$

                     Experimental
Control          No Response    Response    Total Prob
No Response      π00            π01         1 − πc
Response         π10            π11         πc
Total Prob       1 − πt         πt          1

$\hat{\xi} = \hat{\pi}_{01} + \hat{\pi}_{10}$

One sided (for both δ > 0 and δ < 0)
$$N = \frac{\left[\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2\right]}{(\hat{\pi}_{01} - \hat{\pi}_{10})^2}\,(Z_\beta + Z_\alpha)^2$$
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\hat{\pi}_{01} - \hat{\pi}_{10}|\sqrt{N}}{\sqrt{\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2}}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\left[\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2\right]}{(\hat{\pi}_{01} - \hat{\pi}_{10})^2}\,(Z_\beta + Z_{\alpha/2})^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\hat{\pi}_{01} - \hat{\pi}_{10}|\sqrt{N}}{\sqrt{\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\hat{\pi}_{01} - \hat{\pi}_{10}|\sqrt{N}}{\sqrt{\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{\left[\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2\right]}{(\hat{\pi}_{01} - \hat{\pi}_{10})^2}\,(Z_\beta + Z_{\alpha/2})^2$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\hat{\pi}_{01} - \hat{\pi}_{10}|\sqrt{N}}{\sqrt{\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\hat{\pi}_{01} - \hat{\pi}_{10}|\sqrt{N}}{\sqrt{\hat{\xi} - (\hat{\pi}_{01} - \hat{\pi}_{10})^2}}\right)$$

Q.8 Sample Size : Discrete : Two Samples

Q.8.1 Diff:Sup:Unpooled
Q.8.2 Diff:Sup:Pooled
Q.8.3 Diff:Noninf
Q.8.4 Diff:Equiv
Q.8.5 Ratios:Sup:Unpooled
Q.8.6 Ratios:Sup:Pooled
Q.8.7 Ratios:Noninf:FM
Q.8.8 Ratios:Noninf:Wald
Q.8.9 Odds Ratio:Sup
Q.8.10 Odds Ratio:noninf
Q.8.11 Common Odds Ratio:Sup

Q.8.1 Two Independent Samples: Difference of Proportions: Superiority: Test Statistic Distribution: Normal: Variance: Unpooled estimate

$\delta = \pi_t - \pi_c$
$$Z = \frac{\hat{\pi}_t - \hat{\pi}_c}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2\left[(1-TF)\,\hat{\pi}_t(1-\hat{\pi}_t) + TF\,\hat{\pi}_c(1-\hat{\pi}_c)\right]}{\delta^2\,TF\,(1-TF)}$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2\left[(1-TF)\,\hat{\pi}_t(1-\hat{\pi}_t) + TF\,\hat{\pi}_c(1-\hat{\pi}_c)\right]}{\delta^2\,TF\,(1-TF)}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2\left[(1-TF)\,\hat{\pi}_t(1-\hat{\pi}_t) + TF\,\hat{\pi}_c(1-\hat{\pi}_c)\right]}{\delta^2\,TF\,(1-TF)}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}\right)$$

Q.8.2 Two Independent Samples: Difference of Proportions: Superiority: Test Statistic Distribution: Normal: Variance: Pooled estimate

$\delta = \pi_t - \pi_c$
$$\hat{\pi} = \frac{n_t\hat{\pi}_t + n_c\hat{\pi}_c}{N}$$
$$Z = \frac{\hat{\pi}_t - \hat{\pi}_c}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2\,\hat{\pi}(1-\hat{\pi})}{\delta^2\,TF\,(1-TF)}$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2\,\hat{\pi}(1-\hat{\pi})}{\delta^2\,TF\,(1-TF)}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2\,\hat{\pi}(1-\hat{\pi})}{\delta^2\,TF\,(1-TF)}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right)$$

Casagrande-Pike-Smith Correction

The Casagrande-Pike-Smith correction is applicable to Difference of Proportions Superiority and Noninferiority. The correction is applicable in the case of equal allocation ratio only.
For the alternative hypothesis H1: πt > πc, the corrected formula for sample size is
$$n_t = n_c = \frac{A\left[1 + \sqrt{1 + \dfrac{4(\pi_t - \pi_c)}{A}}\right]^2}{4(\pi_t - \pi_c)^2}$$
where
$$A = \left[Z_{1-\alpha}\sqrt{2\pi(1-\pi)} + Z_\beta\sqrt{\pi_t(1-\pi_t) + \pi_c(1-\pi_c)}\right]^2$$
and
$$\pi = \frac{\pi_t + \pi_c}{2}$$

Q.8.3 Two Independent Samples: Difference of Proportions: Noninferiority: Test Statistic Distribution: Normal

$\delta = \pi_t - \pi_c$
$$Z = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta - \delta_0|}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2\left[(1-TF)\,\hat{\pi}_t(1-\hat{\pi}_t) + TF\,\hat{\pi}_c(1-\hat{\pi}_c)\right]}{(\delta - \delta_0)^2\,TF\,(1-TF)}$$

Q.8.4 Two Independent Samples: Difference of Proportions: Equivalence: Test Statistic Distribution: Z

Effect Size: $\delta = \pi_t - \pi_c$, $\delta_1$ = Expected effect size, $\delta_0$ = Equivalence Margin, $r = n_t/N$
H0: |πt − πc| = δ0 against H1: |πt − πc| < δ0, where δ0 > 0

Compute Sample Size
$$N = \frac{(Z_\alpha + Z_\beta)^2}{(\delta_0 - \delta_1)^2}\left[\frac{\pi_c(1-\pi_c)}{1-r} + \frac{(\pi_c + \delta_1)\left(1 - (\pi_c + \delta_1)\right)}{r}\right]$$

Q.8.5 Two Independent Samples: Ratio of Proportions: Superiority: Test Statistic Distribution: Normal: Variance: Unpooled

$\delta = \ln(\pi_t/\pi_c)$, $TF = N_t/N$
$$Z = \frac{\ln(\hat{\pi}_t) - \ln(\hat{\pi}_c)}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2}{\delta^2}\left[\frac{1-\hat{\pi}_t}{TF\,\hat{\pi}_t} + \frac{1-\hat{\pi}_c}{(1-TF)\,\hat{\pi}_c}\right]$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}\left[\frac{1-\hat{\pi}_t}{TF\,\hat{\pi}_t} + \frac{1-\hat{\pi}_c}{(1-TF)\,\hat{\pi}_c}\right]$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}\left[\frac{1-\hat{\pi}_t}{TF\,\hat{\pi}_t} + \frac{1-\hat{\pi}_c}{(1-TF)\,\hat{\pi}_c}\right]$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}\right)$$

Q.8.6 Two Independent Samples: Ratio of Proportions: Superiority: Test Statistic Distribution: Normal: Variance: Pooled

$\delta = \ln(\pi_t/\pi_c)$, $TF = N_t/N$
$$\hat{\pi} = \frac{n_t\hat{\pi}_t + n_c\hat{\pi}_c}{N}$$
$$Z = \frac{\ln(\hat{\pi}_t) - \ln(\hat{\pi}_c)}{\sqrt{\dfrac{1-\hat{\pi}}{\hat{\pi}}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}}{\hat{\pi}}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2\,(1-\hat{\pi})}{\delta^2\,TF\,(1-TF)\,\hat{\pi}}$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2\,(1-\hat{\pi})}{\delta^2\,TF\,(1-TF)\,\hat{\pi}}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}}{\hat{\pi}}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}}{\hat{\pi}}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2\,(1-\hat{\pi})}{\delta^2\,TF\,(1-TF)\,\hat{\pi}}$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
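$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}}{\hat{\pi}}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|}{\sqrt{\dfrac{1-\hat{\pi}}{\hat{\pi}}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}}\right)$$

Q.8.7 Two Independent Samples: Ratio of Proportions: Noninferiority: Farrington and Manning: Test Statistic Distribution: Normal

$\delta = \pi_t - \rho_0\pi_c$, $TF = N_t/N$, $\rho_0$ = Noninferiority margin
$$Z = \frac{\hat{\pi}_t - \rho_0\hat{\pi}_c}{\sqrt{\dfrac{\hat{\pi}_t(1-\hat{\pi}_t)}{n_t} + \rho_0^2\,\dfrac{\hat{\pi}_c(1-\hat{\pi}_c)}{n_c}}}$$
$\lambda = n_t/n_c$
$\theta = 1/\lambda$
$a = 1 + \theta$
$b = -\left[\rho_0(1 + \theta\pi_c) + \theta + \pi_t\right]$
$c = \rho_0(\theta\pi_c + \pi_t)$
$$\bar{\pi}_t = \frac{-b - \sqrt{b^2 - 4ac}}{2a} \quad\text{and}\quad \bar{\pi}_c = \frac{\bar{\pi}_t}{\rho_0}$$
$$N_t \ge \frac{\left[Z_\alpha\sqrt{(\rho_0^2/\theta)\,\bar{\pi}_c(1-\bar{\pi}_c) + \bar{\pi}_t(1-\bar{\pi}_t)} + Z_\beta\sqrt{(\rho_0^2/\theta)\,\pi_c(1-\pi_c) + \pi_t(1-\pi_t)}\right]^2}{\delta^2}$$
$$\text{Power} = \Phi\left[\frac{|\delta|\sqrt{N_t} - Z_\alpha\sqrt{(\rho_0^2/\theta)\,\bar{\pi}_c(1-\bar{\pi}_c) + \bar{\pi}_t(1-\bar{\pi}_t)}}{\sqrt{(\rho_0^2/\theta)\,\pi_c(1-\pi_c) + \pi_t(1-\pi_t)}}\right]$$

Q.8.8 Two Independent Samples: Ratio of Proportions: Noninferiority: Wald's Test: Test Statistic Distribution: Normal

$\delta = \ln(\pi_t/\pi_c)$, $TF = N_t/N$, $\rho_0$ = Noninferiority margin
$$Z = \frac{\ln(\hat{\pi}_t) - \ln(\hat{\pi}_c) - \ln(\rho_0)}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta - \ln(\rho_0)|}{\sqrt{\dfrac{1-\hat{\pi}_t}{n_t\hat{\pi}_t} + \dfrac{1-\hat{\pi}_c}{n_c\hat{\pi}_c}}}\right)$$

Q.8.9 Two Independent Samples: Odds Ratio of Proportions: Superiority: Test Statistic Distribution: Normal

$$\delta = \ln\left(\frac{\hat{\pi}_t(1-\hat{\pi}_c)}{\hat{\pi}_c(1-\hat{\pi}_t)}\right)$$
$$Z = \frac{\ln\left(\dfrac{\hat{\pi}_t(1-\hat{\pi}_c)}{\hat{\pi}_c(1-\hat{\pi}_t)}\right)}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2}{\delta^2}\left(\frac{1}{TF\,\hat{\pi}_t(1-\hat{\pi}_t)} + \frac{1}{(1-TF)\,\hat{\pi}_c(1-\hat{\pi}_c)}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}\left(\frac{1}{TF\,\hat{\pi}_t(1-\hat{\pi}_t)} + \frac{1}{(1-TF)\,\hat{\pi}_c(1-\hat{\pi}_c)}\right)$$
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}\right)$$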
Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}\left(\frac{1}{TF\,\hat{\pi}_t(1-\hat{\pi}_t)} + \frac{1}{(1-TF)\,\hat{\pi}_c(1-\hat{\pi}_c)}\right)$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}\right)$$

Q.8.10 Two Independent Samples: Odds Ratio of Proportions: Noninferiority: Test Statistic Distribution: Normal

$\delta = \ln\left(\dfrac{\pi_t(1-\pi_c)}{\pi_c(1-\pi_t)}\right) = \ln(\psi)$; $\psi_0$ = noninferiority margin for odds ratio
$$Z = \frac{\ln\left(\dfrac{\hat{\pi}_t(1-\hat{\pi}_c)}{\hat{\pi}_c(1-\hat{\pi}_t)}\right) - \ln(\psi_0)}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}$$

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta - \ln(\psi_0)|}{\sqrt{\dfrac{1}{n_t\hat{\pi}_t(1-\hat{\pi}_t)} + \dfrac{1}{n_c\hat{\pi}_c(1-\hat{\pi}_c)}}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2}{(\delta - \ln(\psi_0))^2}\left(\frac{1}{TF\,\hat{\pi}_t(1-\hat{\pi}_t)} + \frac{1}{(1-TF)\,\hat{\pi}_c(1-\hat{\pi}_c)}\right)$$

Q.8.11 Two Independent Samples: Common Odds Ratio for Stratified 2 × 2 tables: Superiority: Test Statistic Distribution: Normal

G = Total number of strata
$$\delta = G^{-1}\sum_{g=1}^{G}\left\{\ln\left(\frac{\hat{\pi}_{tg}}{1-\hat{\pi}_{tg}}\right) - \ln\left(\frac{\hat{\pi}_{cg}}{1-\hat{\pi}_{cg}}\right)\right\}$$
$$Z = \frac{G^{-1}\sum_{g=1}^{G}\left\{\ln\left(\dfrac{\hat{\pi}_{tg}}{1-\hat{\pi}_{tg}}\right) - \ln\left(\dfrac{\hat{\pi}_{cg}}{1-\hat{\pi}_{cg}}\right)\right\}}{\sqrt{G^{-1}\sum_{g=1}^{G}\left\{\dfrac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \dfrac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}}}$$
where $\hat{\pi}_{tg}$ and $\hat{\pi}_{cg}$ are the sample proportions based on $n_{tg}$ and $n_{cg}$ observations seen in the treatment and control arms respectively of the g-th stratum.

One sided (for both δ > 0 and δ < 0)
$$\text{Power} = 1 - \Phi\left(Z_\alpha - \frac{|\delta|}{\sqrt{G^{-1}\sum_{g=1}^{G}\left\{\dfrac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \dfrac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}}}\right)$$
$$N = \frac{(Z_\alpha + Z_\beta)^2}{\delta^2}\left(G^{-1}\sum_{g=1}^{G}\left\{\frac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \frac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}\right)$$

Two sided symmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}\left(G^{-1}\sum_{g=1}^{G}\left\{\frac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \frac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}\right)$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha/2} - \frac{|\delta|}{\sqrt{G^{-1}\sum_{g=1}^{G}\left\{\dfrac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \dfrac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}}}\right) + \Phi\left(-Z_{\alpha/2} - \frac{|\delta|}{\sqrt{G^{-1}\sum_{g=1}^{G}\left\{\dfrac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \dfrac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}}}\right)$$

Two sided asymmetric (for both δ > 0 and δ < 0)
Start with the initial solution
$$N = \frac{(Z_{\alpha/2} + Z_\beta)^2}{\delta^2}\left(G^{-1}\sum_{g=1}^{G}\left\{\frac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \frac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}\right)$$
and solve the following equation iteratively so that the computed power matches the desired power to within 1.e-6 precision.
$$\text{Power} = 1 - \Phi\left(Z_{\alpha_u} - \frac{|\delta|}{\sqrt{G^{-1}\sum_{g=1}^{G}\left\{\dfrac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \dfrac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}}}\right) + \Phi\left(-Z_{\alpha_l} - \frac{|\delta|}{\sqrt{G^{-1}\sum_{g=1}^{G}\left\{\dfrac{1}{n_{tg}\hat{\pi}_{tg}(1-\hat{\pi}_{tg})} + \dfrac{1}{n_{cg}\hat{\pi}_{cg}(1-\hat{\pi}_{cg})}\right\}}}\right)$$

Q.9 Sample Size : Discrete : Many Samples

Q.9.1 Single Arm:Chisquare
Q.9.2 Two Group Chisquare
Q.9.3 Wilcoxon Rank Sum
Q.9.4 Multi-arm: Trend Test
Q.9.5 Multi-arm:Chisquare for Rx2
Q.9.6 Multi-arm:Chisquare:RxC

Q.9.1 Many Samples: Single Arm: Chi-square for specified proportions in C categories

C = number of categories
Proportions under H0: $\{p_{0i};\ i = 1, 2, 3, \ldots, c\}$
Proportions under H1: $\{p_{1i};\ i = 1, 2, 3, \ldots, c\}$

Effect size
$$\Delta^2 = \sum_{i=1}^{c}\frac{(p_{0i} - p_{1i})^2}{p_{0i}}$$
Test statistic
$$\chi^2_{c-1} = N\Delta^2$$

Compute power
Find $\chi^2_{c-1,\alpha}$ such that $P(\chi^2_{c-1} > \chi^2_{c-1,\alpha}) = \alpha$ from the central Chi-square distribution with c-1 degrees of freedom.
$$\text{Power} = P_\lambda(\chi^2 > \chi^2_{c-1,\alpha})$$
where $\chi^2$ is a non-central chi-square variable with c-1 degrees of freedom and non-centrality parameter λ,
$$\lambda = N\Delta^2$$

Compute sample size
N is determined using an iterative method so that the power is maintained.
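The power and sample size steps for the single-arm chi-square design can be sketched with the Python standard library alone. Everything below is an assumption made for illustration, not East's algorithm: the function names are hypothetical, the non-central chi-square tail is computed as a Poisson mixture of central chi-square distributions, the central CDF uses the power series of the regularized incomplete gamma function, and a plain linear search stands in for the unspecified iterative method.

```python
from math import exp, lgamma, log


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = exp(a * log(x) - x - lgamma(a + 1.0))  # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-17 * total and n < 100000:
        term *= x / (a + n)
        total += term
        n += 1
    return min(total, 1.0)


def chi2_cdf(x, df):
    """CDF of the central chi-square distribution with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_critical(df, alpha):
    """Upper-alpha critical value of the central chi-square, by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x, df, lam):
    """P(chi2 > x) for a non-central chi-square (Poisson mixture form)."""
    half = lam / 2.0
    weight = exp(-half)  # Poisson(lam / 2) probability of j = 0
    cdf = 0.0
    j = 0
    while True:
        cdf += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= half / j
        if (j > half and weight < 1e-17) or j > 100000:
            break
    return 1.0 - cdf


def effect_size_sq(p0, p1):
    """Delta^2 = sum over categories of (p0i - p1i)^2 / p0i."""
    return sum((a - b) ** 2 / a for a, b in zip(p0, p1))


def chisq_power(p0, p1, n, alpha):
    """Power = P_lambda(chi2 > critical value), with lambda = N * Delta^2."""
    df = len(p0) - 1
    return ncx2_sf(chi2_critical(df, alpha), df, n * effect_size_sq(p0, p1))


def chisq_sample_size(p0, p1, alpha, power):
    """Smallest integer N reaching the target power (linear search)."""
    df = len(p0) - 1
    delta_sq = effect_size_sq(p0, p1)
    if delta_sq == 0.0:
        raise ValueError("p0 and p1 are identical: power never exceeds alpha")
    crit = chi2_critical(df, alpha)
    n = 1
    while ncx2_sf(crit, df, n * delta_sq) < power:
        n += 1
    return n


p0, p1 = [0.5, 0.5], [0.6, 0.4]
print(chisq_power(p0, p1, 200, 0.05), chisq_sample_size(p0, p1, 0.05, 0.8))
```

The Poisson weights start from `exp(-lam / 2)`, so this sketch is only suitable for moderate non-centrality values; a statistics library's non-central chi-square routine would be the more robust choice in practice.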
If the user has given an allocation $\{r_i;\ i = 1, 2, 3, \ldots, c\}$, N is divided into $\{N_i;\ i = 1, 2, 3, \ldots, c\}$. These $N_i$'s are rounded up to the nearest integers and added up to get the actual N.

Q.9.2 Many Samples: Parallel Design: Two group Chi-square for proportions in C categories

$n_t$ = sample size on treatment arm
$n_c$ = sample size on control arm
Proportions for treatment: $\{\pi_{tj};\ j = 1, 2, 3, \ldots, c\}$
Proportions for control: $\{\pi_{cj};\ j = 1, 2, 3, \ldots, c\}$

Effect size:
$$\Delta^2 = Q_1(1 - Q_1)\sum_{j=1}^{c}\frac{(\pi_{tj} - \pi_{cj})^2}{\pi_{cj}(1 - Q_1) + \pi_{tj}Q_1}$$
where
$$Q_1 = \frac{n_t}{N} = \frac{n_t}{n_t + n_c}$$
Noncentrality parameter λ
$$\lambda = N\Delta^2$$

Compute Power
Find $\chi^2_{c-1,\alpha}$ such that $P(\chi^2_{c-1} > \chi^2_{c-1,\alpha}) = \alpha$ from the central Chi-square distribution with c-1 degrees of freedom.
$$\text{Power} = P_\lambda(\chi^2 > \chi^2_{c-1,\alpha})$$
where $\chi^2$ is a non-central chi-square variable with c-1 degrees of freedom and non-centrality parameter λ.

Compute Sample Size
For given power, N is determined using an iterative method.

Q.9.3 Many Samples: Parallel Design: Wilcoxon Rank Sum for ordered categorical data

$\{\pi_{tj};\ j = 1, 2, \ldots, c\}$: proportions for category j for treatment
$\{\pi_{cj};\ j = 1, 2, \ldots, c\}$: proportions for category j for control
$$\gamma_{ci} = \sum_{j=1}^{i}\pi_{cj}, \qquad \gamma_{ti} = \sum_{j=1}^{i}\pi_{tj}$$
Effect Size
$$\psi = \ln(\gamma_{ci}) - \ln(1 - \gamma_{ci}) - \left(\ln(\gamma_{ti}) - \ln(1 - \gamma_{ti})\right)$$
H0: ψ = 0 vs H1: ψ ≠ 0 or H1: ψ > 0
$m_i$ = multinomial samples, i = c, t
$x_{ij}$ = number of these $m_i$ observations that fall into the j-th ordered category.
$x_{cj} + x_{tj} = n_j$; $m_c + m_t = N$
$X_t = (x_{t1}, x_{t2}, \ldots, x_{tC})$, $X_c = (x_{c1}, x_{c2}, \ldots, x_{cC})$, $n = (n_1, n_2, \ldots, n_C)$

Test Statistic: Wilcoxon Rank Sum
$$T = \sum_{j=1}^{C} w_j x_j$$
The asymptotic approximation for the exact conditional power is given by:
$$\beta(n) = 1 - \Phi\left(\frac{t_\alpha(n) - E(T \mid n, H_1)}{\sqrt{\mathrm{var}(T \mid n, H_1)}}\right)$$
where
$$t_\alpha(n) = E(T \mid n, H_0) - Z_\alpha\sqrt{\mathrm{Var}(T \mid n, H_0)}$$
For more details, the user is referred to Rabee et al (2003).

Q.9.4 Many Samples: Multi-arm: Trend in R ordered proportions

Case 1: User based Probabilities
$r_i$ = i-th Population Fraction
$w_i$ = i-th Population Score
$\pi_i$ = i-th Proportion response
$\bar{w} = \sum_i r_i w_i$
$\delta = \sum_i \pi_i r_i (w_i - \bar{w})$
$N_i = N r_i$ = Population size for the i-th group
$\pi = \sum_i r_i \pi_i$
$N = \sum_i N_i$
$\mathrm{Var(Pooled)} = N\pi(1-\pi)\sum_i r_i (w_i - \bar{w})^2$
$\mathrm{Var(Unpooled)} = N\sum_i \pi_i(1-\pi_i)\,r_i (w_i - \bar{w})^2$

One sided
$$\text{Power} = 1 - \Phi\left[\left(Z_\alpha - \frac{N\delta}{\sqrt{\mathrm{var(Pooled)}}}\right)\sqrt{\frac{\mathrm{var(Pooled)}}{\mathrm{var(Unpooled)}}}\right]$$

Two sided
$$\text{Power} = 1 - \Phi\left[\left(Z_{\alpha/2} - \frac{N\delta}{\sqrt{\mathrm{var(Pooled)}}}\right)\sqrt{\frac{\mathrm{var(Pooled)}}{\mathrm{var(Unpooled)}}}\right] + \Phi\left[\left(-Z_{\alpha/2} - \frac{N\delta}{\sqrt{\mathrm{var(Pooled)}}}\right)\sqrt{\frac{\mathrm{var(Pooled)}}{\mathrm{var(Unpooled)}}}\right]$$

Case 2: Model based probabilities
In this case, the first aim is to compute the vector of proportion responses, i.e. the $\pi_i$, and then apply the methods described above. We have
$$\text{log of common odds ratio } (K) = \frac{\ln\left[\pi_i(1-\pi_{i-1}) / \left(\pi_{i-1}(1-\pi_i)\right)\right]}{W_i - W_{i-1}}$$
$$\frac{\pi_i}{1-\pi_i} = \frac{\pi_{i-1}}{1-\pi_{i-1}}\,e^{K(W_i - W_{i-1})}$$
$$\pi_i = \frac{\dfrac{\pi_{i-1}}{1-\pi_{i-1}}\,e^{K(W_i - W_{i-1})}}{1 + \dfrac{\pi_{i-1}}{1-\pi_{i-1}}\,e^{K(W_i - W_{i-1})}}$$
Determine all the $\pi_i$'s and then apply the steps mentioned in Case 1 to compute Power. Sample size computation is by iterating on the power function.

Q.9.5 Many Samples: Multi-arm: Chi-square for Rx2 proportions

R = number of groups
$n_i$ = sample size for the i-th arm
$r_i = n_i/n_1$
$$\pi_0 = \frac{\sum_i r_i\pi_i}{\sum_i r_i}, \qquad V = \frac{\sum_i r_i(\pi_i - \pi_0)^2}{\sum_i r_i}$$
Effect size:
$$\Delta^2 = \frac{V}{\pi_0(1-\pi_0)}$$
where $Q_1 = \dfrac{n_t}{N} = \dfrac{n_t}{n_t + n_c}$
Noncentrality parameter λ
$$\lambda = N\Delta^2$$

Compute Power
Find $\chi^2_{R-1,\alpha}$ such that $P(\chi^2_{R-1} > \chi^2_{R-1,\alpha}) = \alpha$ from the central Chi-square distribution with R-1 degrees of freedom.
$$\text{Power} = P_\lambda(\chi^2 > \chi^2_{R-1,\alpha})$$
where $\chi^2$ is a non-central chi-square variable with R-1 d.f. and non-centrality parameter λ.

Compute Sample Size
For given power, N is determined using an iterative method.

Q.9.6 Many Samples: Multi-arm: Chi-square for proportions in RxC tables

R = number of groups (arms)
C = number of categories
$n_i$ = sample size for the i-th arm
$r_i = n_i/n_1$
$\pi_{ij}$ = proportion of subjects belonging to the i-th group and j-th category, i = 1, 2, ..., R, j = 1, 2, ..., C
$\pi_j$ = proportion in the j-th category.

Effect size:
$$\Delta^2 = \frac{\sum_{i=1}^{R} r_i \sum_{j=1}^{C}\dfrac{(\pi_{ij} - \pi_j)^2}{\pi_j}}{\sum_i r_i}$$
Noncentrality parameter:
$$\lambda = N\Delta^2$$

Compute Power
Find $\chi^2_{(R-1)(C-1),\alpha}$ such that $P(\chi^2_{(R-1)(C-1)} > \chi^2_{(R-1)(C-1),\alpha}) = \alpha$ from the central Chi-square distribution with (R - 1)(C - 1) degrees of freedom.
$$\text{Power} = P_\lambda(\chi^2 > \chi^2_{(R-1)(C-1),\alpha})$$
where $\chi^2$ is a non-central chi-square variable with (R - 1)(C - 1) degrees of freedom and non-centrality parameter λ.

Compute Sample Size
For given power, N is determined using an iterative method.

Q.10 Sample Size : Discrete : Regression

Q.10.1 Logistic Regression: Odds Ratio

Q.10.1 Logistic Regression: Odds Ratio

One Covariate
$P_0$ = Proportion of successes (events) at the mean value of the covariate, µ
$P_1$ = Proportion of successes (events) at the covariate value µ + σ
$\theta$ = Odds ratio $= \dfrac{P_1(1-P_0)}{P_0(1-P_1)}$

One sided
– Compute Power
$$\text{Power} = \Phi\left[e^{\eta^2/4}\left(\sqrt{\frac{N P_0 \eta^2}{1 + 2P_0\delta}} - Z_\alpha\right)\right]$$
where
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(\theta)$ and $Z_\alpha = \Phi^{-1}(1-\alpha)$
– Compute Sample Size
$$N = \frac{\left[Z_\alpha + Z_\beta\,e^{-\eta^2/4}\right]^2\left[1 + 2P_0\delta\right]}{P_0\eta^2}$$
where
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(\theta)$

Two Sided
– Compute Power
$$\text{Power} = \Phi\left[e^{\eta^2/4}\left(\sqrt{\frac{N P_0 \eta^2}{1 + 2P_0\delta}} - Z_{\alpha/2}\right)\right]$$
where
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(\theta)$ and $Z_\alpha = \Phi^{-1}(1-\alpha)$.
– Compute Sample Size
$$N = \frac{\left[Z_{\alpha/2} + Z_\beta\,e^{-\eta^2/4}\right]^2\left[1 + 2P_0\delta\right]}{P_0\eta^2}$$
where
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(OR) = \ln(\theta)$

More Than One Covariate
$P_0$ = Proportion of successes (events) at the mean value of the covariate, µ
$P_1$ = Proportion of successes (events) at the covariate value µ + σ
$\theta$ = odds ratio $= \dfrac{P_1(1-P_0)}{P_0(1-P_1)}$, $\rho^2$ = the square of the multiple correlation coefficient (ρ) between $X_1$ and the other remaining covariates.

One sided
– Compute Power
$$\text{Power} = \Phi\left[e^{\eta^2/4}\left(\sqrt{\frac{N P_0 \eta^2 (1-\rho^2)}{1 + 2P_0\delta}} - Z_\alpha\right)\right]$$
where
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(\theta)$ and $Z_\alpha = \Phi^{-1}(1-\alpha)$
– Compute Sample Size
$$N = \frac{N_1}{1-\rho^2}$$
where
$$N_1 = \frac{\left[Z_\alpha + Z_\beta\,e^{-\eta^2/4}\right]^2\left[1 + 2P_0\delta\right]}{P_0\eta^2}$$
and
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(\theta)$

Two Sided
– Compute Power
$$\text{Power} = \Phi\left[e^{\eta^2/4}\left(\sqrt{\frac{N P_0 \eta^2 (1-\rho^2)}{1 + 2P_0\delta}} - Z_{\alpha/2}\right)\right]$$
where
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$
$\eta = \ln(\theta)$
– Compute Sample Size
$$N = \frac{N_1}{1-\rho^2}$$
where
$$N_1 = \frac{\left[Z_{\alpha/2} + Z_\beta\,e^{-\eta^2/4}\right]^2\left[1 + 2P_0\delta\right]}{P_0\eta^2}$$
and
$$\delta = \frac{1 + (1+\eta^2)\,e^{5\eta^2/4}}{1 + e^{-\eta^2/4}}$$

Q.11 Sample Size : Agreement

Q.11.1 Cohen's Kappa: Two Binary Ratings
Q.11.2 Cohen's Kappa:Two Categorical Ratings

Q.11.1 Cohen's Kappa: Two Binary Ratings

$\pi_{ij}$ = Proportion of population given rating i by Rater 1 and j by Rater 2.
$K_0$ = Kappa under Null, $K_1$ = Kappa under Alternative

One Sided
– Compute Power
$$\text{Power} = \Phi\left(\frac{\sqrt{N}\,|K_1 - K_0| - Z_\alpha\sqrt{Q_0}}{\sqrt{Q_1}}\right)$$
where
$$Q_1 = \sum_i \pi_{ii}\left[(1-\pi_c) - (\pi_{i.} + \pi_{.i})(1 + \pi_0)\right]^2$$
– Compute Sample Size
$$N = \left[\frac{Z_\alpha\sqrt{Q_0} + Z_\beta\sqrt{Q_1}}{K_1 - K_0}\right]^2$$

Two Sided
– Compute Power
$$\text{Power} = \Phi\left(\frac{\sqrt{N}\,|K_1 - K_0| - Z_{\alpha/2}\sqrt{Q_0}}{\sqrt{Q_1}}\right)$$
where
$$Q_1 = \sum_i \pi_{ii}\left[(1-\pi_c) - (\pi_{i.} + \pi_{.i})(1 + \pi_0)\right]^2$$
– Compute Sample Size
$$N = \left[\frac{Z_{\alpha/2}\sqrt{Q_0} + Z_\beta\sqrt{Q_1}}{K_1 - K_0}\right]^2$$

Q.11.2 Agreement: Cohen's Kappa: Two Categorical Ratings

C = Number of ratings
$\pi_0$ = Proportion of agreement
$\pi_e$ = Expected proportion of agreement
$\pi_{ij}$ = Proportion of population given rating i by Rater 1 and j by Rater 2.
$K_0$ = Kappa under Null
$K_1$ = Kappa under Alternative

Compute Power
$$\text{Power} = \Phi\left(\frac{\sqrt{N}\,(K_1 - K_0) - Z_{1-\alpha}\,\max\tau(\hat{k} \mid k = K_0)}{\max\tau(\hat{k} \mid k = K_1)}\right)$$
where
$$\tau(\hat{k}) = \frac{\sqrt{Q_1 + Q_2 - 2Q_3 - Q_4}}{(1-\pi_e)^2}$$
$$Q_1 = \pi_0(1-\pi_e)^2$$
$$Q_2 = (1-\pi_0)^2\sum_i\sum_j \pi_{ij}(\pi_{i.} + \pi_{.j})^2$$
$$Q_3 = 2(1-\pi_0)(1-\pi_e)\sum_i \pi_{ij}(\pi_{i.} + \pi_{.j})$$
$$Q_4 = (\pi_0\pi_e - 2\pi_e + \pi_0)^2$$

Compute Sample Size
$$N \ge \left(\frac{Z_{1-\alpha}\,\max\tau(\hat{k} \mid k = K_0) + Z_{1-\beta}\,\max\tau(\hat{k} \mid k = K_1)}{K_1 - K_0}\right)^2$$
Ref: Flack, V.F., et al. (1988).

Q.12 Sample Size : Count Data

Q.12.1 One Sample: Single Poisson rate
Q.12.2 Two Samples:Ratio of Poisson Rates
Q.12.3 Ratio of Negative Binomial Rates

Q.12.1 One Sample: Single Poisson rate

X: No. of events (outcomes) observed during an interval of specified length.
D = Exposure Duration (this could be time, length, volume, area etc.)
X ∼ Poisson(λD)
λ = Poisson rate (mean number of occurrences of X during a unit length interval)
λ0 = Hypothesized value of λ
λ1 = Value of λ at which Power is to be computed.
n = sample size = Number of times observations on X are taken over the Exposure duration D
G(., k) denotes the CDF of the chi-square distribution with k d.f.

One sided test (right tailed)
H0: λ = λ0 vs H1: λ > λ0
– Compute Power
1.
Find ’k’ such that G(2nD λ0 ; 2k) ≤ α (Q.1) 2. Compute P ower = 1 − F (k − 1, nD λ1 ) = G(2nD λ1 ; 2k) (Q.2) where k is obtained from equation Q.1. – Compute sample size Solve equation (Q.1) and equation (Q.2) simultaneously for n and k. One sided test (left tailed) H0 : λ = λ0 VS H1 : λ < λ0 – Compute Power 1. Find ’k’ such that G(2nD λ0 ; 2(k + 1)) ≥ 1 − α (Q.3) P ower = 1 − G(2nD λ1 ; 2(k + 1)) (Q.4) 2. Compute where k is obtained from equation Q.12 One Sample : Single Poisson rate – Q.12.1 One Sample: Single Poisson rate2535 <<< Contents * Index >>> Q Technical Reference and Formulas: Single Look Designs – Compute sample size Solve equation (Q.3) and equation (Q.4) simultaneously for n and k. Two sided test H0 : λ = λ0 V s H1 : λ 6= λ0 For carrying out a two sided design,(compute power and sample size and duration) compute α α0 = 2 Execute the algorithm for one sided (right or left depending upon the sign of the difference λ1 − λ0 ) with α0 as the value of level of significance, α. Q.12.2 Two Samples: Ratio of Poisson Rates λc : Poisson rate for control arm λt : Poisson rate for treatment arm Dt : Duration of study for the treatment arm Dc : Duration of study for the control arm Xt : No of events (outcomes) observed on Treatment arm in time Dt Xt ∼ Poission (λt Dt ) Xc : No of events (outcomes) observed on Control arm in time Dc Xc ∼ Poission (λc Dc ) nt : Number of observations on Treatment arm nc : Number of observations on Control arm r = nnct allocation ratio c nc d= D Dt nt ρ0 = Hypothecated value of the ratio, λt λc ρ1 = value of the ratio at which the power is to be computed One sided test (right tailed) H0 : λλct > ρ0 ≥ 1 Vs H1 : λλct = ρ0 H1 : λλct = ρ1 where ρ1 > ρ0 Test Statistic ρ Xt ln( X ) − ln( d0 ) c q W3 = 1 + X1c Xt In case Xt or Xc = 0, the value is set to 0.5 for that variable. 2536 Q.12 One Sample : Single Poisson rate – Q.12.2 Two Samples:Ratio of Poisson Rates <<< Contents * Index >>> R East 6.4 c Cytel Inc. 
Compute Power
\[ \text{Power} = 1 - \Phi\left(Z_{1-\alpha} - \frac{\mu}{\sigma}\right) \tag{Q.5} \]
where $\mu = \ln\left(\dfrac{\rho_1}{\rho_0}\right)$ and $\sigma^2 = \dfrac{d+\rho_1}{D_c n_c \lambda_c \rho_1}$.

Compute Sample Size
Solve equation (Q.5) for $n_c$ by using the following algorithm.
1. Compute
\[ \sigma^2 = \left[ \frac{\ln\left(\frac{\rho_1}{\rho_0}\right)}{Z_{1-\alpha} - \Phi^{-1}(1-\text{power})} \right]^2 \]
2. Compute
\[ n_c = \frac{d+\rho_1}{D_c\,\sigma^2 \lambda_c \rho_1} \]
3. Compute $n_t = r \cdot n_c$ and $n = n_t + n_c$.

One sided test (left tailed)
$H_0: \lambda_t/\lambda_c = \rho_0 \ge 1$ vs $H_1: \lambda_t/\lambda_c < \rho_0$
$H_1: \lambda_t/\lambda_c = \rho_1$ where $\rho_1 < \rho_0$

Compute Power
\[ \text{Power} = \Phi\left(Z_\alpha - \frac{\mu}{\sigma}\right) \tag{Q.6} \]
where $\mu = \ln\left(\dfrac{\rho_1}{\rho_0}\right)$, $\sigma^2 = \dfrac{d+\rho_1}{D_c n_c \lambda_c \rho_1}$, and $Z_\alpha = \Phi^{-1}(\alpha)$ is the lower $\alpha$ quantile. In case $X_t$ or $X_c = 0$, the value is set to 0.5 for that variable.

Compute Sample Size
Solve equation (Q.6) for $n_c$ by using the following algorithm.
1. Compute
\[ \sigma^2 = \left[ \frac{\ln\left(\frac{\rho_1}{\rho_0}\right)}{Z_\alpha - \Phi^{-1}(\text{power})} \right]^2 \]
2. Compute
\[ n_c = \frac{d+\rho_1}{D_c\,\sigma^2 \lambda_c \rho_1} \]
3. Compute $n_t = r \cdot n_c$ and $n = n_t + n_c$.

Two Sided Test
$H_0: \lambda_t/\lambda_c = \rho_0 \ge 1$ vs $H_1: \lambda_t/\lambda_c \ne \rho_0$
Depending upon whether the ratio of rates is $> 1$ or $< 1$, use the power computation formula for $\rho_1 > \rho_0$ or $\rho_1 < \rho_0$, as the case may be, with $\alpha$ replaced by $\alpha/2$.

Q.12.3 Two Samples: Ratio of Negative Binomial Rates

$X_c \sim NB(\lambda_c, \gamma_c)$
$X_t \sim NB(\lambda_t = \theta\lambda_c, \gamma_t)$
$\theta = \lambda_t/\lambda_c$
$u$ = fixed follow-up
$k$ = allocation ratio $= n_t/n_c$

One sided test (left tailed)
$H_0: \theta = 1$ vs $H_1: \theta < 1$
– Compute Power
\[ \text{Power} = \Phi(E_\theta - z_\alpha) \]
where
\[ E_\theta = \text{Test statistic} = \frac{-\sqrt{n_c}\,\ln(\hat\theta)}{\sqrt{\dfrac{1+\gamma_c\lambda_c u}{\lambda_c u} + \dfrac{1+\gamma_t\lambda_c\theta u}{k\lambda_c\theta u}}} \]
\[ A = \frac{1}{[\ln(\hat\theta)]^2}\left[ \frac{1+\gamma_c\lambda_c u}{\lambda_c u} + \frac{1+\gamma_t\lambda_c\theta u}{k\lambda_c\theta u} \right] \]
– Compute Sample Size
\[ n = A\,(z_\alpha + z_\beta)^2 (1+k) \]

One sided test (right tailed)
$H_0: \theta = 1$ vs $H_1: \theta > 1$
– Compute Power
\[ \text{Power} = \Phi\left(E_\theta - (-1 \cdot z_\alpha)\right) \]
– Compute Sample Size
\[ n = A\left((-1\cdot z_\alpha) + z_\beta\right)^2 (1+k) \]

Two sided test
$H_0: \theta = 1$ vs $H_1: \theta \ne 1$
– Compute Power
\[ \text{Power} = 1 - \Phi\left(E_\theta - (-1\cdot z_{\alpha/2})\right) + \Phi\left(E_\theta - z_{\alpha/2}\right) \]
– Compute Sample Size
\[ n = A\,(z_{\alpha/2} + z_\beta)^2 (1+k) \]

Q.13 Sample Size: Time to Event Data

Q.13.1 Two Samples: Superiority: Logrank

Effect Size: $\delta = \ln\left(\dfrac{\lambda_t}{\lambda_c}\right)$, where $\lambda_t$ and $\lambda_c$ are the hazard rates for the treatment and control arms respectively.
In time-to-event studies, the maximum number of events is determined for given power.

$H_0: \delta = 0$ vs $H_1: \delta = \delta_1$

Test Statistic (Log Rank)
Suppose that at the end of the study a total of $q$ failures are observed, with failure times $\tau_1, \tau_2, \ldots, \tau_i, \ldots, \tau_q$. Accordingly, there will be $q$ 2x2 tables of the following type. The $i$th table is shown below:

Status      | Treatment T                 | Treatment C                 | Total
Failed      | $d_t(\tau_i)$               | $d_c(\tau_i)$               | $d(\tau_i)$
Not Failed  | $n_t(\tau_i) - d_t(\tau_i)$ | $n_c(\tau_i) - d_c(\tau_i)$ | $n(\tau_i) - d(\tau_i)$
Total       | $n_t(\tau_i)$               | $n_c(\tau_i)$               | $n(\tau_i)$

where the subscripts $t$ and $c$ indicate values observed under treatment and control.
\[ S = \sum_{i=1}^{q}\left\{ d_t(\tau_i) - \frac{n_t(\tau_i)\,d(\tau_i)}{n(\tau_i)} \right\} \]
\[ S \sim AN\left(\text{Mean} = \delta\, D_{max}\, r(1-r),\ \text{Variance} = r(1-r)\,D_{max}\right) \]
where $r$ = proportion randomized to treatment T.

One sided test (Variance under Null)
\[ D_{max} = \frac{(Z_\alpha+Z_\beta)^2}{\delta_1^2\, r(1-r)} \]
One sided test (Variance under Alternative)
\[ D_{max} = \frac{(Z_\alpha+Z_\beta)^2}{\delta_1^2\, p(1-p)} \]
where $p$ = proportion of $D_{max}$ estimated to be on the experimental arm under the alternative hypothesis. East uses an iterative procedure to estimate $p$.

Two sided test (Variance under Null)
\[ D_{max} = \frac{(Z_{\alpha/2}+Z_\beta)^2}{\delta_1^2\, r(1-r)} \]
Two sided test (Variance under Alternative)
\[ D_{max} = \frac{(Z_{\alpha/2}+Z_\beta)^2}{\delta_1^2\, p(1-p)} \]
with $p$ defined and estimated as in the one-sided case.

Q.13.2 Two Samples: Noninferiority: Logrank

Effect Size: $\delta = \ln\left(\dfrac{\lambda_t}{\lambda_c}\right)$, where $\lambda_t$ and $\lambda_c$ are the hazard rates for the treatment and control arms respectively.
In time-to-event studies, the maximum number of events is determined for given power.

$H_0: \delta \ge \delta_0$ vs $H_1: \delta < \delta_0$

Test Statistic (Log Rank)
Suppose that at the end of the study a total of $q$ failures are observed, with failure times $\tau_1, \tau_2, \ldots, \tau_i, \ldots, \tau_q$. Accordingly, there will be $q$ 2x2 tables of the following type. The $i$th table is shown below:

Status      | Treatment T                 | Treatment C                 | Total
Failed      | $d_t(\tau_i)$               | $d_c(\tau_i)$               | $d(\tau_i)$
Not Failed  | $n_t(\tau_i) - d_t(\tau_i)$ | $n_c(\tau_i) - d_c(\tau_i)$ | $n(\tau_i) - d(\tau_i)$
Total       | $n_t(\tau_i)$               | $n_c(\tau_i)$               | $n(\tau_i)$

where the subscripts $t$ and $c$ indicate values observed under treatment and control.
\[ S = \sum_{i=1}^{q}\left\{ d_t(\tau_i) - \frac{n_t(\tau_i)\,d(\tau_i)}{n(\tau_i)} \right\} - \delta_0 \]
\[ S \sim AN\left(\text{Mean} = \delta\, D_{max}\, r(1-r) - \delta_0,\ \text{Variance} = r(1-r)\,D_{max}\right) \]
where $r$ = proportion randomized to treatment T and $\delta_0$ = noninferiority margin.

One sided test (Variance under both Null and Alternative)
\[ D_{max} = \frac{(Z_\alpha+Z_\beta)^2}{(\delta_1-\delta_0)^2\, r(1-r)} \]

R Technical Reference and Formulas: Analysis

In this Appendix, we provide the theory used in East 6.4 for analyzing data under the Analysis menu.
Note: The test statistic formulas provided in this Appendix can be used in the interim analysis of data while monitoring a group sequential study, or for analyzing data arising from a single-sample study. For common notation and references, see Technical Reference and Formulas: Single Look Designs or the respective chapters of the tests.

R.1 Basic Statistics - Descriptive Statistics

R.1.1 Central Tendency

Mean: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then the mean $\bar{X}$ is defined as
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{R.1} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then the mean $\bar{X}$ is defined as
\[ \bar{X} = \frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} X_i f_i \tag{R.2} \]
Median: The median is the value of the middle observation when the observations are arranged in ascending or descending order. If the number of observations is even, the median is defined as the mean of the two middle observations.

Mode: The mode is the value of $X_i$ with the maximum frequency $f_i$. If more than one $X_i$ has the maximum frequency, the smallest such $X_i$ is used as the value of the mode.

Geometric Mean: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then the geometric mean $GM$ is defined as
\[ GM = \left[\prod_{i=1}^{n} X_i\right]^{1/n} \tag{R.3} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then the geometric mean $GM$ is defined as
\[ GM = \left[\prod_{i=1}^{n} X_i^{f_i}\right]^{1/\sum_i f_i} \tag{R.4} \]
Harmonic Mean: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then the harmonic mean $HM$ is defined as
\[ HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \tag{R.5} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then the harmonic mean $HM$ is defined as
\[ HM = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{X_i}} \tag{R.6} \]

R.1.2 Dispersion

Standard Deviation: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then the standard deviation is defined as
\[ s = \left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\right]^{0.5} \tag{R.7} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then the standard deviation is defined as
\[ s = \left[\frac{1}{\sum_{i=1}^{n} f_i - 1}\sum_{i=1}^{n}(X_i-\bar{X})^2 f_i\right]^{0.5} \tag{R.8} \]
Standard Error of Mean: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations and $s$ is the standard deviation, then the standard error of the mean is defined as
\[ SE = \frac{s}{\sqrt{n}} \tag{R.9} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then the standard error of the mean is defined as
\[ SE = \frac{s}{\sqrt{\sum_{i=1}^{n} f_i}} \tag{R.10} \]
Variance: The variance is defined as the square of the standard deviation and is denoted by $s^2$.

Coefficient of Variation: If $\bar{X}$ and $s$ are the mean and standard deviation respectively, then the coefficient of variation is defined as
\[ CV = \frac{s}{\bar{X}} \tag{R.11} \]
Minimum is the minimum value of $X_i$, $i = 1, 2, \ldots, n$.
Maximum is the maximum value of $X_i$, $i = 1, 2, \ldots, n$.
Range is calculated as the difference Maximum - Minimum.

R.1.3 Distribution

Skewness: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then a measure of skewness is defined as
\[ \text{skewness} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^3}{\left[\frac{(n-1)}{n}s^2\right]^{3/2}} \tag{R.12} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then a measure of skewness is defined as
\[ \text{skewness} = \frac{\frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n}(X_i-\bar{X})^3 f_i}{\left[\frac{(n-1)}{n}s^2\right]^{3/2}} \tag{R.13} \]
For the normal distribution skewness is zero, and for any symmetric data the value of skewness should be zero or close to zero. A negative value of skewness indicates that the data are skewed to the left, that is, the left tail is heavier than the right tail. A positive value indicates the reverse.

Kurtosis: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then a measure of kurtosis is defined as
\[ \text{Kurtosis} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^4}{\left[\frac{(n-1)}{n}s^2\right]^{2}} - 3 \tag{R.14} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then a measure of kurtosis is defined as
\[ \text{Kurtosis} = \frac{\frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n}(X_i-\bar{X})^4 f_i}{\left[\frac{(n-1)}{n}s^2\right]^{2}} - 3 \tag{R.15} \]
The normal distribution has a fourth standardized moment of 3, so under the definitions above, which subtract 3, its kurtosis is 0. A kurtosis value $> 0$ indicates a relatively peaked distribution and a value $< 0$ indicates a relatively flat distribution of the data.

R.1.4 Summary

Sum: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then the sum is defined as
\[ \text{Sum} = \sum_{i=1}^{n} X_i \tag{R.16} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then the sum is defined as
\[ \text{Sum} = \sum_{i=1}^{n} X_i f_i \tag{R.17} \]
Count: If $X_i$, $i = 1, 2, \ldots, n$ are $n$ observations, then Count is defined as
\[ \text{Count} = n \tag{R.18} \]
Further, if $f_i$ is the frequency or weight for $X_i$, $i = 1, 2, \ldots, n$, then Count is defined as
\[ \text{Count} = \sum_{i=1}^{n} f_i \tag{R.19} \]

R.2 Basic Statistics - Analytics

R.2.1 Independent t-test

Equal variance
If $x_1, x_2, \ldots, x_{n_x}$ is a random sample from a normal population with mean $\mu_x$ and standard deviation $\sigma_x$, and $y_1, y_2, \ldots, y_{n_y}$ is a random sample from a normal population with mean $\mu_y$ and standard deviation $\sigma_y$, we want to test the null hypothesis $H_0: \mu_x = \mu_y$ under the assumption $\sigma_x = \sigma_y$. The test statistic is
\[ t = \frac{\bar{x}-\bar{y}}{s\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}} \tag{R.20} \]
where
\[ \bar{x} = \frac{1}{n_x}\sum_{i=1}^{n_x} x_i, \qquad \bar{y} = \frac{1}{n_y}\sum_{i=1}^{n_y} y_i \tag{R.21} \]
and $s$ is the pooled standard deviation,
\[ s = \sqrt{\frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}} \tag{R.22} \]
The above statistic is distributed as $t$ with $(n_x+n_y-2)$ degrees of freedom.

Unequal variance
With the same two samples, we want to test the null hypothesis $H_0: \mu_x = \mu_y$ under the assumption $\sigma_x \ne \sigma_y$. The testing procedure uses the approximation described by Scheffe (1970) as follows:
\[ t = \frac{\hat\delta}{\sqrt{\frac{S_x^2}{n_x}+\frac{S_y^2}{n_y}}} \sim t_\nu \tag{R.23} \]
where $\hat\delta = \bar{x}-\bar{y}$ and $\nu$ is the degrees of freedom given by
\[ \nu = \frac{\left(\frac{S_x^2}{n_x}+\frac{S_y^2}{n_y}\right)^2}{\frac{\left(S_x^2/n_x\right)^2}{n_x-1}+\frac{\left(S_y^2/n_y\right)^2}{n_y-1}} \tag{R.24} \]

R.2.2 Paired t-test

If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are $n$ paired observations, we would like to test the hypothesis that the differences $d_1 = x_1-y_1, d_2 = x_2-y_2, \ldots, d_n = x_n-y_n$ come from a normal distribution with mean 0. If $\mu$ is the population mean of the differences, then we want to test the null hypothesis $H_0: \mu = 0$. The test statistic is
\[ t = \frac{\bar{d}}{s/\sqrt{n}} \tag{R.25} \]
where
\[ \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \tag{R.26} \]
\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(d_i-\bar{d})^2} \tag{R.27} \]
This statistic is distributed as $t$ with $(n-1)$ degrees of freedom.

R.2.3 Analysis of Variance

One-way Analysis of Variance: Suppose $n$ subjects have been allocated randomly to $r$ treatments and measurements have been made on a variate $x$ for all the subjects, with the resulting data denoted as follows:
Treatment 1: $x_{11}, x_{12}, \ldots, x_{1n_1}$
Treatment 2: $x_{21}, x_{22}, \ldots, x_{2n_2}$
...
Treatment r: $x_{r1}, x_{r2}, \ldots, x_{rn_r}$
We assume that the data of the $r$ treatment groups come from $r$ normally distributed populations with the same variance $\sigma^2$ and with means $\mu_1, \mu_2, \ldots, \mu_r$. We want to test the hypothesis that these means are equal:
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_r \tag{R.28} \]
The sum of squares is
\[ S = \sum_{i=1}^{r}\sum_{k=1}^{n_i}(x_{ik}-\bar{x})^2 \tag{R.29} \]
where
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{r}\sum_{k=1}^{n_i} x_{ik} = \frac{1}{n}\sum_{i=1}^{r} n_i\bar{x}_i \tag{R.30} \]
\[ \bar{x}_i = \frac{1}{n_i}(x_{i1}+x_{i2}+\cdots+x_{in_i}) \tag{R.31} \]
and
\[ n = \sum_{i=1}^{r} n_i \tag{R.32} \]
We decompose the "sum of squares" $S$ into two parts $S_1$ and $S_2$,
\[ S = S_1 + S_2 \tag{R.33} \]
where
\[ S_1 = \sum_{i=1}^{r} n_i(\bar{x}_i-\bar{x})^2 \tag{R.34} \]
\[ S_2 = \sum_{i=1}^{r}\sum_{k=1}^{n_i}(x_{ik}-\bar{x}_i)^2 \tag{R.35} \]
$S_1$ refers to the variation between the treatments and $S_2$ to the variation within treatments.
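The one-way decomposition of the total sum of squares into its between-treatment and within-treatment parts can be checked numerically. The following is a minimal sketch, not East's implementation; the function name `one_way_sums_of_squares` and the sample data are illustrative assumptions.

```python
def one_way_sums_of_squares(groups):
    """Return (S, S1, S2) for a list of treatment groups (lists of floats).

    S  = total sum of squares about the grand mean
    S1 = between-treatment sum of squares
    S2 = within-treatment sum of squares
    """
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    S = sum((x - grand_mean) ** 2 for g in groups for x in g)
    S1 = sum(len(g) * (m - grand_mean) ** 2
             for g, m in zip(groups, group_means))
    S2 = sum((x - m) ** 2
             for g, m in zip(groups, group_means) for x in g)
    return S, S1, S2


# Hypothetical data: three treatments with unequal group sizes.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
S, S1, S2 = one_way_sums_of_squares(groups)
print(S, S1, S2)  # S equals S1 + S2 up to rounding
```

For this data the group means are 5, 8 and 2, so the within-treatment part is 6 and the between-treatment part is 50.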
The ratio
\[ F = \frac{S_1/(r-1)}{S_2/(n-r)} \tag{R.36} \]
follows an F distribution with $(r-1, n-r)$ degrees of freedom. All these computations can be displayed in the usual ANOVA table as shown below:

ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square           | F           | P-Value
Between groups      | $S_1$          | $r-1$              | $M_1 = S_1/(r-1)$     | $M_1/M_2$   |
Residuals           | $S_2$          | $n-r$              | $M_2 = S_2/(n-r)$     |             |
Total               | $S$            | $n-1$              |                       |             |

Two-way Analysis of Variance: In a two-way experimental design, there are two factors, A and B, with A having levels $A_1, A_2, \ldots, A_a$ and B having levels $B_1, B_2, \ldots, B_b$. Suppose there are $c$ observations for each combination of the factor levels $A_i$ and $B_j$. The data from such a study can be represented as $(x_{ijk},\ i = 1, \ldots, a,\ j = 1, \ldots, b,\ k = 1, \ldots, c)$, where the subscript $i$ refers to level $A_i$ of factor A, $j$ refers to level $B_j$ of factor B, and $k$ refers to the $k$th observation for the combination of $A_i$ and $B_j$. We assume that the number of replications for each combination of $i$ and $j$ is equal to $c$, and that the $n = abc$ observations $x_{ijk}$ correspond to $n$ random variables which are independent and normally distributed with the same variance $\sigma^2$. We want to test the hypotheses that:
the means of A at all the $a$ levels are the same
the means of B at all the $b$ levels are the same
To carry out these tests, we decompose the total "sum of squares"
\[ S = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(x_{ijk}-\bar{x})^2 \tag{R.37} \]
into three parts $S_1$, $S_2$ and $S_3$,
\[ S = S_1 + S_2 + S_3 \tag{R.38} \]
where
\[ S_1 = bc\sum_{i=1}^{a}(\bar{x}_{i..}-\bar{x})^2 \tag{R.39} \]
refers to the sum of squares due to the variation between the levels of A,
\[ S_2 = ac\sum_{j=1}^{b}(\bar{x}_{.j.}-\bar{x})^2 \tag{R.40} \]
refers to the sum of squares due to the variation between the levels of B, and
\[ S_3 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c}(x_{ijk}-\bar{x}_{i..}-\bar{x}_{.j.}+\bar{x})^2 \tag{R.41} \]
refers to the sum of squares due to the residual variation.
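The two-way decomposition into a factor A part, a factor B part and a residual part can be verified in the same way for a balanced layout. This is a sketch under stated assumptions: the nested-list layout `x[i][j]` (holding the `c` replicates for levels A_i and B_j), the function name and the data are all hypothetical, and the residual term uses the sign pattern (observation minus both marginal means plus the grand mean) that makes the three parts add up to the total.

```python
def two_way_sums_of_squares(x):
    """Return (S, S1, S2, S3) for balanced data x[i][j] = list of c replicates."""
    a, b, c = len(x), len(x[0]), len(x[0][0])
    n = a * b * c
    grand = sum(v for row in x for cell in row for v in cell) / n
    # Marginal means over levels of A (x-bar_i..) and of B (x-bar_.j.)
    mean_a = [sum(v for cell in x[i] for v in cell) / (b * c) for i in range(a)]
    mean_b = [sum(v for i in range(a) for v in x[i][j]) / (a * c)
              for j in range(b)]
    S = sum((v - grand) ** 2 for row in x for cell in row for v in cell)
    S1 = b * c * sum((m - grand) ** 2 for m in mean_a)
    S2 = a * c * sum((m - grand) ** 2 for m in mean_b)
    S3 = sum((v - mean_a[i] - mean_b[j] + grand) ** 2
             for i in range(a) for j in range(b) for v in x[i][j])
    return S, S1, S2, S3


# Hypothetical 2 x 2 layout with c = 2 replicates per cell.
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 9.0]]]
S, S1, S2, S3 = two_way_sums_of_squares(x)
print(S, S1, S2, S3)  # S equals S1 + S2 + S3 up to rounding
```

The identity relies on every cell having the same number of replicates, as assumed in the text.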
Under the null hypothesis, the quantities $S_1/\sigma^2$, $S_2/\sigma^2$ and $S_3/\sigma^2$ have $\chi^2$ distributions with $(a-1)$, $(b-1)$ and $(abc-a-b+1)$ degrees of freedom respectively. From this it follows that the quantity
\[ f_1 = \frac{S_1/(a-1)}{S_3/(abc-a-b+1)} \tag{R.42} \]
follows an F-distribution with $(a-1, n-a-b+1)$ degrees of freedom and the quantity
\[ f_2 = \frac{S_2/(b-1)}{S_3/(abc-a-b+1)} \tag{R.43} \]
follows an F-distribution with $(b-1, n-a-b+1)$ degrees of freedom. All these computations are displayed in the usual ANOVA table as shown below:

ANOVA Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square                 | F           | P-Value
Factor A            | $S_1$          | $a-1$              | $M_1 = S_1/(a-1)$           | $M_1/M_3$   |
Factor B            | $S_2$          | $b-1$              | $M_2 = S_2/(b-1)$           | $M_2/M_3$   |
Residuals           | $S_3$          | $abc-a-b+1$        | $M_3 = S_3/(abc-a-b+1)$     |             |
Total               | $S$            | $n-1$              |                             |             |

R.2.4 Correlations

Pearson's Product-Moment Correlation Coefficient
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be the $n$ paired observations of two continuous random variables $x$ and $y$. The Pearson product-moment correlation coefficient is a measure of association for these two variables. Its formula is
\[ r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \tag{R.44} \]
where
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{R.45} \]
If $f_i$ is the frequency or weight for the $i$th paired observation $(x_i, y_i)$, then the formula for the Pearson product-moment correlation coefficient can be written as
\[ r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})f_i}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 f_i}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2 f_i}} \tag{R.46} \]
where
\[ \bar{x} = \frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} x_i f_i, \qquad \bar{y} = \frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} y_i f_i \tag{R.47} \]

Spearman's Rank-Order Correlation Coefficient
If we are reluctant to make the assumption of bivariate normality, we may use Spearman's rank-order correlation coefficient instead of Pearson's product-moment correlation coefficient. The only difference between the two measures of association is that Pearson's measure uses the raw data whereas Spearman's uses ranks derived from the raw data. Spearman's rank-order correlation coefficient can be computed by substituting the ranks of $x_i$ and the ranks of $y_i$ into the formulas for the Pearson product-moment correlation coefficient. If ties are present in the raw data, the average ranks are used.

Kendall's Tau
Kendall's Tau is an alternative to Pearson's product-moment correlation coefficient and Spearman's rank-order correlation coefficient for ordinal data. The main distinction between this measure and Pearson's or Spearman's measures is that Kendall's Tau can be computed without specifying numerical values. The actual values are needed only to order the observations, so different values that preserve the order give the same value of Kendall's Tau. All that is needed is an implicit ordering of the data. Kendall's Tau is a nonparametric measure of association based on the number of concordances and discordances in paired observations. When paired observations vary together they are concordant, and when they vary in opposite directions they are discordant. The formula for Kendall's Tau is based on the sum
\[ \sum_{i<j} \operatorname{sgn}(x_i-x_j)\,\operatorname{sgn}(y_i-y_j) \;\ldots \]

R.2.5 Multiple Linear Regression

The regression procedures are performed using a variance-covariance updating procedure described in Maindonald, J. H. (1984). The least squares solution is facilitated by using Cholesky decomposition.

Model
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon \]
where $Y$ is the dependent variable (response), $X_1, \ldots, X_k$ are the independent variables (predictors) and $\varepsilon$ is a random error with a normal distribution having mean 0 and variance $\sigma^2$. The multiple linear regression algorithm computes the estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ of the regression coefficients $\beta_0, \beta_1, \ldots, \beta_k$ so as to minimize the sum of squares of residuals.

R.2.6 Collinearity Diagnostics

You can obtain Collinearity Diagnostics along the lines of Belsley, Kuh, and Welsch (1980) as a part of the regression output. Under Collinearity Diagnostics the columns represent the variance components (related to principal components in multivariate analysis) and the rows represent the variance proportion decomposition explained by each variable in the model. The eigenvalues are those associated with the singular value decomposition of the covariance matrix of the coefficients (in fact the eigenvalues are the squares of the singular values) and the condition numbers are the ratios of the square root of the largest eigenvalue to all the rest.
Since two or more variables are required to establish a dependency, it follows that two or more regression coefficient variances will be adversely affected by high variance decomposition proportions associated with a particular eigenvalue. It can be shown that a single high variance proportion in a given column cannot be indicative of a multicollinearity problem, since the variance decomposition matrix of an orthogonal matrix (the ideal case, indicating total independence) consists of only 0's and 1's. Thus, the broad rule for assessing collinearity is that there is an eigenvalue associated with a high condition index ($> 30$, say) and with a very high variance decomposition proportion ($> 0.5$, say) for two or more regression coefficient variances. Interpretations are less obvious when there are competing dependencies (two or more near dependencies with the same condition index values) or two or more near dependencies with one condition index greatly dominating the others.
The general principle suggested by Belsley, Kuh and Welsch is that near dependencies, or collinearity problems, exist if the condition index exceeds some threshold, variously quoted as 10, 15 or 30. It is suggested that a condition index greater than 30 indicates moderate to severe collinearity.

Parameters for Collinearity Diagnostics
The parameters you need to provide for these diagnostics are described below:
Number of collinearity components: Enter the number of collinearity components. This number can be between 2 and the number of degrees of freedom for the model. When the model is fitted without an intercept, the model degrees of freedom is equal to the number of predictors in the model. When the model is fitted with an intercept, the model degrees of freedom is equal to the number of predictors in the model plus one.
Multicollinearity Criterion: The default value is 0.05. It controls how small the determinant of the matrix that is inverted to compute the coefficient estimates is allowed to be. If a finer tolerance is required, decrease this value; a coarser tolerance can be achieved by increasing it. This value must be between 0 and 1.

Residuals
You can obtain various types of residuals, which are described in this section.
Unstandardized Residuals: These are computed by the formula Unstandardized residual = Actual response - Predicted response.
Standardized Residuals: These consist of residuals divided by their standard deviation. They have the drawback that they do not have a common standard deviation.
Studentized Residuals: These are computed by dividing the unstandardized residuals by quantities related to the diagonal elements of the hat matrix, using a common scale estimate computed without the $i$th case in the model. (Cook and Weisberg refer to this as external studentization).
These residuals have t-distributions with $(n-k-1)$ degrees of freedom, where $n$ is the number of cases, so any residual with absolute value exceeding 3 usually requires attention.
Deleted (predicted) Residuals: The deleted residual for the $i$th observation is obtained by fitting the model with the $i$th observation omitted, using that model to predict the $i$th observation, and then computing the difference from the actual $i$th observation. The sum of squares of these deleted residuals is referred to as the Predicted Residual Error Sum of Squares (PRESS) statistic and is often used to select among competing regression models. The expression for PRESS is based on the studentized residuals (see Cook and Weisberg (1982)).

Influence Statistics
Cook's Distance: Cook's Distance is an overall measure of the impact of the $i$th data point on the estimated regression coefficients. In linear regression, Cook's distance has, approximately, an F distribution with $k$ and $(n-k)$ degrees of freedom. A guide to the influence of the $i$th observation is given as follows (see Bowerman, O'Connell, and Dickey (1986)):
If $D_i$ is less than $F(.8, k, n-k)$ (the 20th percentile of the F-distribution having $k$ and $n-k$ degrees of freedom, that is, the point with upper-tail area 0.8), then the $i$th observation should not be considered influential.
If $D_i$ is greater than $F(.5, k, n-k)$ (the 50th percentile of the F-distribution having $k$ and $n-k$ degrees of freedom), then the $i$th observation should be considered influential.
If $F(.8, k, n-k) \le D_i \le F(.5, k, n-k)$, then the nearer $D_i$ is to $F(.5, k, n-k)$, the greater the extent of the influence of the $i$th observation.
DFFITs (change in the regression fit): These reflect coefficient changes as well as forecasting effects when an observation is deleted, and are similar to Cook's distance.
Covariance Ratios: This measure reflects the change in the covariance matrix of the estimated coefficients when the $i$th observation is omitted. The suggestion is that $|\text{covariance ratio} - 1| \ge 3p/n$ warrants further investigation.
Diagonal of the hat matrix: This measure is also known as the leverage of the $i$th observation. The diagonal elements sum to the number of parameters being fitted. Any value greater than $2p/n$ suggests further investigation.

R.2.7 Multivariate Analysis of Variance

One-way MANOVA
Suppose $n$ individuals have been subjected randomly to $r$ treatments and measurements have been made on $p$ variates, with the resulting data represented as follows:
Treatment 1: $X_{11}, X_{12}, \ldots, X_{1n_1}$
Treatment 2: $X_{21}, X_{22}, \ldots, X_{2n_2}$
...
Treatment r: $X_{r1}, X_{r2}, \ldots, X_{rn_r}$
where $n_1 + n_2 + \cdots + n_r = n$ (we assume that there are at least two observations in each group). Each $X_{ij}$ is a $p$-dimensional column vector. Assume that each vector observation in treatment group $i$ is distributed as $N(\mu_i, \Sigma)$. We want to test the hypothesis that the mean vectors are equal, i.e.
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_r \]
We draw an analogy with univariate one-way ANOVA. There we calculated various sums of squares, namely the 'between groups sum of squares', the 'residuals sum of squares' and the 'total sum of squares'. Here too, we compute similar entities. In the multivariate situation, instead of one value, we have a $p \times p$ matrix of values. The diagonal elements are sums of squares and the off-diagonal elements are sums of cross products (SSP). The formulas are given in the table below.

MANOVA Table for Comparing Mean Vectors of Populations
Source of Variation | Matrix of Sum of Squares and Cross Products                                                     | Degrees of Freedom
Treatment           | $B = \sum_{i=1}^{r} n_i(\bar{X}_i-\bar{X})(\bar{X}_i-\bar{X})'$                                 | $r-1$
Residual            | $W = \sum_{i=1}^{r}\sum_{j=1}^{n_i}(X_{ij}-\bar{X}_i)(X_{ij}-\bar{X}_i)'$                       | $\sum n_i - r$
Total               | $B+W = \sum_{i=1}^{r}\sum_{j=1}^{n_i}(X_{ij}-\bar{X})(X_{ij}-\bar{X})'$                         | $\sum n_i - 1$

where
\[ \bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij} \qquad\text{and}\qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{r}\sum_{j=1}^{n_i} X_{ij} \]
The test is based on Wilks' $\Lambda$, which is given by
\[ \text{Wilks' } \Lambda = \frac{|W|}{|B+W|} \]
Small values of Wilks' $\Lambda$ suggest rejection of the null hypothesis. A general result regarding the approximate distribution of $\Lambda$ is that $-[n-1-(p+r)/2]\ln\Lambda$ follows a chi-square distribution with $p(r-1)$ d.f. For a detailed table regarding the distribution of Wilks' $\Lambda$, see Johnson and Wichern (1998).

Test for Parallel Profiles
Here the hypothesis to be tested is weaker than the earlier hypothesis, which asserted equality of mean vectors. Instead, we now ask whether the differences in successive coordinate-wise means are the same in all populations. In other words, our hypothesis is:
$H_0$: The difference $\mu_{ij} - \mu_{i,j-1}$ is the same for all groups $i = 1, 2, \ldots, r$, and for all components $j = 2, 3, \ldots, p$.
This hypothesis can be expressed in matrix form as
\[ H_0: C\mu_1 = C\mu_2 = \cdots = C\mu_r \]
where
\[ C_{(p-1)\times p} = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix} \]
Clearly a test for this hypothesis is the same test as above after transforming the variables from $X_{p\times 1}$ to $Y_{(p-1)\times 1}$, where $Y = CX$.

R.3 Continuous

R.3.1 Single Arm: Single Mean

Normal Superiority Trials: One-Sample Test - Single Mean
Hypothesis: $H_0: \mu = \mu_0$
Test statistic:
\[ Z = \frac{\hat\mu - \mu_0}{\sqrt{\frac{\hat\sigma^2}{n}}} \]
where $\hat\mu$ is the sample mean and $\hat\sigma^2$ is the sample variance based on the $n$ observations.
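As a worked illustration of the one-sample statistic for a single mean, the sketch below computes Z from raw observations. It is only an example, not East's code: the function name `one_sample_z` and the data are hypothetical, and it assumes that the sample variance uses the $n-1$ denominator, as in the standard deviation defined in R.1.2.

```python
import math
import statistics


def one_sample_z(observations, mu0):
    """Z = (sample mean - mu0) / sqrt(sample variance / n)."""
    n = len(observations)
    mu_hat = statistics.mean(observations)
    # Assumption: sample variance with the (n - 1) denominator.
    var_hat = statistics.variance(observations)
    return (mu_hat - mu0) / math.sqrt(var_hat / n)


# Hypothetical data: mean 3.0, sample variance 2.5, n = 5.
z = one_sample_z([1.0, 2.0, 3.0, 4.0, 5.0], mu0=2.0)
print(round(z, 4))  # 1.4142
```

Here the standard error is sqrt(2.5 / 5), so Z equals 1 / sqrt(0.5).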
References:
1. Jennison, C and Turnbull, BW (2000).
2. Sheskin, DJ (2004).

R.3.2 Paired Design: Mean of Paired Differences

Normal Superiority Trials: One-Sample Test - Mean of Paired Differences

Hypothesis: $H_0 : \delta = \mu_t - \mu_c = 0$

Test statistic:

\[ Z = \frac{\hat{\mu}_t - \hat{\mu}_c}{\sqrt{\hat{\sigma}_d^2 / n}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means based on the n pairs of observations in the treatment and control arm, respectively, and $\hat{\sigma}_d^2$ is the sample variance of the paired differences. Denote the observed differences by $d_l$, for l = 1, ..., n pairs of observations; then the sample variance is given by:

\[ \hat{\sigma}_d^2 = \frac{\sum_{l=1}^{n} d_l^2 - \left( \sum_{l=1}^{n} d_l \right)^2 / n}{n - 1} . \]

References:
1. Jennison, C and Turnbull, BW (2000).
2. Sheskin, DJ (2004).

Normal Superiority Trials: One-Sample Test - T-test for Single Mean

Hypothesis: $H_0 : \mu = \mu_0$

Test statistic:

\[ T = \frac{\hat{\mu} - \mu_0}{\sqrt{\hat{\sigma}^2 / n}} , \]

where $\hat{\mu}$ is the sample mean and $\hat{\sigma}^2$ is the sample variance based on n observations.

References:
1. Sheskin, DJ (2004).

Normal Superiority Trials: One-Sample Test - T-test for Mean of Paired Differences

Hypothesis: $H_0 : \delta = \mu_t - \mu_c = 0$

Test statistic:

\[ T = \frac{\hat{\mu}_t - \hat{\mu}_c}{\sqrt{\hat{\sigma}_d^2 / n}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means based on n pairs of observations in the treatment and control arm, respectively, and $\hat{\sigma}_d^2$ is the sample variance of the paired differences. Denote the observed differences by $d_l$, for l = 1, ..., n pairs of observations; then the sample variance is given by:

\[ \hat{\sigma}_d^2 = \frac{\sum_{l=1}^{n} d_l^2 - \left( \sum_{l=1}^{n} d_l \right)^2 / n}{n - 1} . \]

References:
1. Sheskin, DJ (2004).

R.3.3 Parallel Design: Difference of Means

Normal Superiority Trials: Two-Sample Test - Difference in Means

Hypothesis: $H_0 : \delta = \mu_t - \mu_c = 0$

Variance: Equal

Test statistic:

\[ Z = \frac{\hat{\mu}_t - \hat{\mu}_c}{\sqrt{\hat{\sigma}^2 \left( \frac{1}{n_c} + \frac{1}{n_t} \right)}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, and $\hat{\sigma}^2$ is the pooled estimate of variance.
References:
1. Cytel East 3 User Manual (2004).
2. Jennison, C and Turnbull, BW (2000).
3. Sheskin, DJ (2004).

Normal Superiority Trials: Two-Sample Test - T-test for Difference of Independent Means

Hypothesis: $H_0 : \delta = \mu_t - \mu_c = 0$

Variance: Equal

Test statistic:

\[ T = \frac{\hat{\mu}_t - \hat{\mu}_c}{\sqrt{\hat{\sigma}^2 \left( \frac{1}{n_c} + \frac{1}{n_t} \right)}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, and $\hat{\sigma}^2$ is the pooled estimate of variance.

References:
1. Sheskin, DJ (2004).

Normal Non-Inferiority Trials: Two-Sample Test - Difference in Means

Hypothesis: $H_0 : \delta = \mu_t - \mu_c \ge \delta_0$

Test statistic:

\[ Z = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_0}{\sqrt{\hat{\sigma}^2 \left( \frac{1}{n_c} + \frac{1}{n_t} \right)}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, $\hat{\sigma}^2$ is the pooled estimate of variance, and $\delta_0$ is the non-inferiority margin.

References:
1. Cytel East 3 User Manual (2004).
2. Jennison, C and Turnbull, BW (2000).
3. Sheskin, DJ (2004).

Normal Non-Inferiority Trials: Two-Sample Test - T-test for Difference of Independent Means

Hypothesis: $H_0 : \delta = \mu_t - \mu_c \ge \delta_0$

Test statistic:

\[ T = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_0}{\sqrt{\hat{\sigma}^2 \left( \frac{1}{n_c} + \frac{1}{n_t} \right)}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, $\hat{\sigma}^2$ is the pooled estimate of variance, and $\delta_0$ is the non-inferiority margin.

References:
1. Sheskin, DJ (2004).
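The two-sample statistics of R.3.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not East's implementation: the function names are hypothetical, the pooled variance uses the usual $(n_t + n_c - 2)$ denominator, and the non-inferiority numerator follows the order printed in this section ($\hat{\mu}_c - \hat{\mu}_t - \delta_0$).

```python
import math
from statistics import mean, variance


def pooled_variance(treat, ctrl):
    """Pooled estimate of the common variance (assumed denominator n_t + n_c - 2)."""
    n_t, n_c = len(treat), len(ctrl)
    return ((n_t - 1) * variance(treat) + (n_c - 1) * variance(ctrl)) / (n_t + n_c - 2)


def standard_error(treat, ctrl):
    """sqrt( sigma^2 * (1/n_c + 1/n_t) ), the denominator shared by all four statistics."""
    return math.sqrt(pooled_variance(treat, ctrl) * (1 / len(ctrl) + 1 / len(treat)))


def superiority_statistic(treat, ctrl):
    """(mu_t - mu_c) / SE; referred to N(0,1) for the Z test or to a t distribution for the T test."""
    return (mean(treat) - mean(ctrl)) / standard_error(treat, ctrl)


def noninferiority_statistic(treat, ctrl, delta0):
    """(mu_c - mu_t - delta0) / SE, with the numerator ordered as printed in the manual."""
    return (mean(ctrl) - mean(treat) - delta0) / standard_error(treat, ctrl)


# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
treat = [2.0, 4.0, 6.0]   # mean 4, sample variance 4
ctrl = [1.0, 2.0, 3.0]    # mean 2, sample variance 1
z_sup = superiority_statistic(treat, ctrl)
z_ni = noninferiority_statistic(treat, ctrl, delta0=0.5)
```

The Z and T versions share the same formula; they differ only in the reference distribution used to convert the statistic into a p-value.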
Normal Equivalence Trials: Two-Sample Test - Difference of Means

Hypothesis: $H_0 : \delta = \mu_t - \mu_c \le \delta_L$ or $\delta = \mu_t - \mu_c \ge \delta_U$

Test statistics (this test is performed as two separate α-level one-sided hypothesis t-tests):

\[ T_L = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_L}{\sqrt{\dfrac{\hat{\sigma}^2}{n r (1 - r)}}} \quad \text{and} \quad T_U = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_U}{\sqrt{\dfrac{\hat{\sigma}^2}{n r (1 - r)}}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means in the treatment and control arm, respectively, and $\hat{\sigma}^2$ is the pooled estimate of the common variance, all based on n observations. The assigned fraction r is the probability of being randomized to the treatment arm, and $\delta_L$ and $\delta_U$ are the lower and upper equivalence limits, respectively. Denote the sample variances in the treatment and control arm by $\hat{\sigma}_t^2$ and $\hat{\sigma}_c^2$, respectively. The pooled estimate of the common variance is given by:

\[ \hat{\sigma}^2 = \frac{(n_t - 1)\hat{\sigma}_t^2 + (n_c - 1)\hat{\sigma}_c^2}{n - 2} . \]

References:
1. Schuirmann, DJ (1987).
2. Diletti, E, Hauschke, D and Steinijans, VW (1991).
3. Owen, DB (1965).

Normal Equivalence Trials: Two-Sample Test - Log Ratio of Means

Hypothesis: $H_0 : \delta = \ln(\mu_t / \mu_c) \le \delta_L$ or $\delta = \ln(\mu_t / \mu_c) \ge \delta_U$

Test statistics (this test is performed as two separate α-level one-sided hypothesis t-tests):

\[ T_L = \frac{\ln(\hat{\mu}_c) - \ln(\hat{\mu}_t) - \delta_L}{\sqrt{\dfrac{\ln(1 + \widehat{CV}^2)}{n r (1 - r)}}} \quad \text{and} \quad T_U = \frac{\ln(\hat{\mu}_c) - \ln(\hat{\mu}_t) - \delta_U}{\sqrt{\dfrac{\ln(1 + \widehat{CV}^2)}{n r (1 - r)}}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means in the treatment and control arm, respectively, and $\widehat{CV}$ is the pooled estimate of the coefficient of variation, all based on n observations. The assigned fraction r is the probability of being randomized to the treatment arm, and $\delta_L$ and $\delta_U$ are the lower and upper equivalence limits, respectively.

References:
1. Schuirmann, DJ (1987).
2. Hauschke, D, Kieser, M, Diletti, E and Burke, M (1998).
3. Diletti, E, Hauschke, D and Steinijans, VW (1991).
4. Owen, DB (1965).
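The pair of one-sided statistics for the difference-of-means equivalence test can be computed from summary statistics as in the sketch below. The function name and the example numbers are hypothetical, and r is taken to be the observed allocation fraction $n_t / n$, which is an assumption made here for illustration. With that choice, $n r (1 - r) = n_t n_c / n$, so the standard error coincides with the familiar $\sqrt{\hat{\sigma}^2 (1/n_t + 1/n_c)}$.

```python
import math


def tost_statistics(mean_t, mean_c, var_t, var_c, n_t, n_c, delta_l, delta_u):
    """Return (T_L, T_U, SE) for the difference-of-means equivalence test.

    Assumes r = n_t / n (observed allocation fraction) and keeps the
    numerator order printed in the manual (control minus treatment).
    """
    n = n_t + n_c
    r = n_t / n
    pooled = ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n - 2)
    se = math.sqrt(pooled / (n * r * (1 - r)))
    diff = mean_c - mean_t
    return (diff - delta_l) / se, (diff - delta_u) / se, se


# Hypothetical summary data: 20 treated and 30 control subjects.
t_l, t_u, se = tost_statistics(
    mean_t=10.0, mean_c=9.5, var_t=4.0, var_c=4.0,
    n_t=20, n_c=30, delta_l=-1.0, delta_u=1.0,
)
```

Each statistic is then compared with a t critical value at level α; that step is omitted because it depends on the degrees of freedom convention in use.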
Normal Equivalence Trials: Two-Sample Test - Difference of Means in Crossover Designs

Hypothesis: $H_0 : \delta = \mu_t - \mu_c \le \delta_L$ or $\delta = \mu_t - \mu_c \ge \delta_U$

The aim is to determine, by a difference metric, whether the unknown mean $\mu_t$ under treatment is equal to the unknown mean $\mu_c$ under control for n subjects enrolled in a 2 × 2 crossover trial.

Test statistics (this test is performed as two separate α-level one-sided hypothesis t-tests):

\[ T_L = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_L}{\sqrt{\dfrac{MSE}{n r (1 - r)}}} \quad \text{and} \quad T_U = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_U}{\sqrt{\dfrac{MSE}{n r (1 - r)}}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means in the treatment and control arm, respectively, and MSE is the mean squared error obtained by fitting a linear model to the crossover data, all based on n observations. The assigned fraction r is the probability of being randomized to the treatment arm, and $\delta_L$ and $\delta_U$ are the lower and upper equivalence limits, respectively.

References:
1. Schuirmann, DJ (1987).
2. Diletti, E, Hauschke, D and Steinijans, VW (1991).
3. Owen, DB (1965).

Normal Equivalence Trials: Two-Sample Test - Log Ratio of Means in Crossover Designs

Hypothesis: $H_0 : \delta = \ln(\mu_t / \mu_c) \le \delta_L$ or $\delta = \ln(\mu_t / \mu_c) \ge \delta_U$

The aim is to determine, by a log ratio metric, whether the unknown mean $\mu_t$ under treatment is equal to the unknown mean $\mu_c$ under control for n subjects enrolled in a 2 × 2 crossover trial.

Test statistics (this test is performed as two separate α-level one-sided hypothesis t-tests):

\[ T_L = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_L}{\sqrt{\dfrac{MSE}{n r (1 - r)}}} \quad \text{and} \quad T_U = \frac{\hat{\mu}_c - \hat{\mu}_t - \delta_U}{\sqrt{\dfrac{MSE}{n r (1 - r)}}} , \]

where $\hat{\mu}_t$ and $\hat{\mu}_c$ are the sample means in the treatment and control arm, respectively, and MSE is the mean squared error obtained by fitting a linear model to the crossover log data, all based on n observations. The assigned fraction r is the probability of being randomized to the treatment arm, and $\delta_L$ and $\delta_U$ are the lower and upper equivalence limits, respectively.

References:
1. Schuirmann, DJ (1987).
2. Hauschke, D, Kieser, M, Diletti, E and Burke, M (1998).
3. Diletti, E, Hauschke, D and Steinijans, VW (1991).
4. Owen, DB (1965).

R.3.4 Wilcoxon Signed Rank Test

Notation

$R_i$: rank of $|D_i|$ when the absolute values are arranged in ascending order.
$I$: the indicator function.

Hypothesis: $H_0 : \lambda = 0$

Test statistic:

\[ W^+ = \sum_{i=1}^{n} R_i \, I(D_i > 0) \sim AN(\mu_W, \sigma_W^2) \tag{R.49} \]

where

\[ \mu_W = \frac{n(n + 1)}{4} \tag{R.50} \]

and

\[ \sigma_W^2 = \frac{n(n + 1)(2n + 1)}{24} . \tag{R.51} \]

R.3.5 Linear Regression

Normal Superiority Trials: Linear Regression - Comparing Slope to Predefined Value

Hypothesis: $H_0 : \theta = \theta_0$

Model: Given a response $Y_l$ and a covariate $X_l \sim N(\mu_x, \sigma_x^2)$ for subject l = 1, ..., n, consider the linear model:

\[ Y_l = \gamma + \theta X_l + \varepsilon_l , \]

where all $\varepsilon_l$ are independent and identically distributed (i.i.d.) as $N(0, \sigma_\varepsilon^2)$.

Test statistic:

\[ Z = \frac{\hat{\theta} - \theta_0}{\sqrt{\dfrac{\hat{\sigma}_\varepsilon^2}{n \hat{\sigma}_x^2}}} , \]

where $\hat{\theta}$ is the estimated regression slope parameter, $\hat{\sigma}_x^2$ is the sample variance of the covariate X, and $\hat{\sigma}_\varepsilon^2$ is the sample error variance, all based on the n observations.

References:
1. Dupont, WD and Plummer, WD, Jr (1998).
2. Jennison, C and Turnbull, BW (2000).

Normal Superiority Trials: Linear Regression - Comparing Two Slopes

Hypothesis: $H_0 : \theta_t = \theta_c$

Model: Given a response $Y_{il}$ and a covariate $X_{il} \sim N(\mu_{xi}, \sigma_{xi}^2)$ for subject l = 1, ..., $n_i$ submitted to treatment i = c, t, consider the linear model:

\[ Y_{il} = \gamma + \theta_i X_{il} + \varepsilon_{il} , \]

where all $\varepsilon_{il}$ are i.i.d.
$N(0, \sigma_\varepsilon^2)$.

Test statistic:

\[ Z = \frac{\hat{\theta}_t - \hat{\theta}_c}{\sqrt{\hat{\sigma}_\varepsilon^2 \left( \dfrac{1}{n_t \hat{\sigma}_{xt}^2} + \dfrac{1}{n_c \hat{\sigma}_{xc}^2} \right)}} , \]

where $\hat{\theta}_t$ and $\hat{\theta}_c$ are the estimated regression slope parameters, and $\hat{\sigma}_{xt}^2$ and $\hat{\sigma}_{xc}^2$ are the sample variances of the covariate X in the treatment and control arm, respectively, based on the $n_t$ and $n_c$ observations, while $\hat{\sigma}_\varepsilon^2$ is the sample error variance.

References:
1. Dupont, WD and Plummer, WD, Jr (1998).
2. Jennison, C and Turnbull, BW (2000).

Normal Superiority Trials: Repeated Measures Regression - Comparing Two Slopes

Hypothesis: $H_0 : \theta_t = \theta_c$, where $\theta_t$ and $\theta_c$ are regression fixed slope parameters for two distinct population regressions using independent random samples of subject-specific repeated measures.

Model: Given a final response $Y_{iml}$ and a prior series of repeated measurements on the response variable at times $v_m$, m = 1, ..., M, for subject l = 1, ..., n submitted to treatment i = c, t, consider the linear mixed effects model:

\[ Y_{iml} = \gamma_i + \theta_i v_m + a_l + b_l v_m + \varepsilon_{ml} , \]

where the random effect $(a_l, b_l)'$ is multivariate normal with mean $(0, 0)'$ and variance-covariance matrix

\[ G = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix} , \]

and all $\varepsilon_{ml}$ are i.i.d. $N(0, \sigma_w^2)$.

Test statistic:

\[ Z = \frac{\hat{\theta}_t - \hat{\theta}_c}{\sqrt{\left( \hat{\sigma}_b^2 + \dfrac{12 (M - 1) \hat{\sigma}_w^2}{M (M + 1) S^2} \right) \left( \dfrac{1}{n_t} + \dfrac{1}{n_c} \right)}} , \]

where $\hat{\theta}_t$ and $\hat{\theta}_c$ are the estimated regression fixed slope parameters based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, $\hat{\sigma}_b^2$ and $\hat{\sigma}_w^2$ are the between and within sample variances, respectively, M is the total number of measurements on each subject, and S is the follow-up time for each subject.

References:
1. Fitzmaurice, GM, Laird, NM and Ware, JH (2004).
2. Jennison, C and Turnbull, BW (2000).
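The factor $12(M - 1) / (M(M + 1)S^2)$ in the repeated-measures statistic is the reciprocal of $\sum_m (v_m - \bar{v})^2$ when the M measurement times are equally spaced over a follow-up period of length S. The sketch below checks this numerically and then evaluates Z. The equal-spacing interpretation and all names and input values are assumptions made for illustration; the manual does not state how the times $v_m$ are spaced.

```python
import math


def within_subject_factor(M, S):
    """12 (M - 1) / (M (M + 1) S^2), the multiplier of sigma_w^2 in the Z statistic."""
    return 12 * (M - 1) / (M * (M + 1) * S ** 2)


def direct_factor(M, S):
    """1 / sum((v_m - v_bar)^2) for M equally spaced times on [0, S] (assumed design)."""
    times = [m * S / (M - 1) for m in range(M)]
    v_bar = sum(times) / M
    return 1.0 / sum((v - v_bar) ** 2 for v in times)


def repeated_measures_z(theta_t, theta_c, var_b, var_w, M, S, n_t, n_c):
    """Z statistic for comparing two fixed slopes, as given in the section above."""
    per_subject = var_b + var_w * within_subject_factor(M, S)
    return (theta_t - theta_c) / math.sqrt(per_subject * (1 / n_t + 1 / n_c))


# Hypothetical inputs: 5 visits over 2 years, 50 subjects per arm.
factor = within_subject_factor(M=5, S=2.0)        # 48 / 120 = 0.4
z = repeated_measures_z(theta_t=1.2, theta_c=0.8, var_b=0.5, var_w=1.0,
                        M=5, S=2.0, n_t=50, n_c=50)
```

With these inputs the per-subject slope variance is 0.5 + 0.4 = 0.9, so Z = 0.4 / sqrt(0.9 × 0.04).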
R.4 Discrete

R.4.1 Test for Proportion in One Sample Binomial
R.4.2 McNemar's Test for Paired Binomial

R.4.1 Test for Proportion in One Sample Binomial

Hypothesis: $H_0 : \pi = \pi_0$, to be tested against a two-sided alternative hypothesis $H_1 : \pi \ne \pi_0$ or a one-sided alternative hypothesis $H_1' : \pi < \pi_0$ or $H_1' : \pi > \pi_0$. In this analysis, the hypothesis is tested asymptotically as well as using exact inference.

Asymptotic Inference

Test statistic, using the variance estimated under the null hypothesis:

\[ Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}} , \]

where $\hat{\pi}$ is the sample proportion based on the n observations. East computes 1-sided and 2-sided asymptotic p-values using the standard normal distribution of the test statistic Z. A confidence interval for the population proportion is also derived for the specified confidence level.

Exact Inference

Suppose the data consist of t successes and n − t failures in n independent Bernoulli trials. Let π be the true underlying success rate. Then the outcome T = t has the binomial probability

\[ \Pr(T = t \mid \pi) = \binom{n}{t} \pi^t (1 - \pi)^{n - t} . \tag{R.52} \]

East computes the maximum likelihood estimate of π as $\hat{\pi} = t/n$. Next, East computes a 100 × (1 − γ)% exact confidence interval for π using the method of Clopper and Pearson (1934). This method computes the interval in the form $(\pi_*(t), \pi^*(t))$, where $\pi_*(t)$ is such that:

\[ \pi_*(t) = 0 , \quad \text{if } t = 0 \tag{R.53} \]

\[ \Pr(T \ge t \mid \pi_*(t)) = \frac{\gamma}{2} , \quad \text{if } 0 < t \le n \tag{R.54} \]

and $\pi^*(t)$ is such that:

\[ \Pr(T \le t \mid \pi^*(t)) = \frac{\gamma}{2} , \quad \text{if } 0 \le t < n \tag{R.55} \]

\[ \pi^*(t) = 1 , \quad \text{if } t = n . \tag{R.56} \]

A unique and very useful option available in East is Casella's procedure for computing confidence intervals (Casella, 1986). This procedure guarantees uniformly shorter exact confidence intervals than the commonly used Clopper-Pearson confidence intervals described above.
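The Clopper-Pearson limits in (R.53) to (R.56) have no closed form in general, but each is the root of a monotone function of π and can be found by bisection. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and bisection is simply one convenient root finder, not necessarily what East uses.

```python
import math


def binom_pmf(k, n, p):
    """Pr(T = k | p) as in (R.52)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(t, n, p):
    """Pr(T >= t | p); increasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(t, n + 1))


def lower_tail(t, n, p):
    """Pr(T <= t | p); decreasing in p."""
    return sum(binom_pmf(k, n, p) for k in range(0, t + 1))


def solve_increasing(g, iters=200):
    """Bisection on [0, 1] for the root of a function g that increases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(t, n, gamma=0.05):
    """Exact 100(1 - gamma)% interval (pi_lower, pi_upper) per (R.53)-(R.56)."""
    lower = 0.0 if t == 0 else solve_increasing(lambda p: upper_tail(t, n, p) - gamma / 2)
    upper = 1.0 if t == n else solve_increasing(lambda p: gamma / 2 - lower_tail(t, n, p))
    return lower, upper


lo_0, hi_0 = clopper_pearson(0, 10)    # no successes in 10 trials
lo_n, hi_n = clopper_pearson(10, 10)   # all successes in 10 trials
```

For t = 0 the upper limit solves $(1 - \pi)^n = \gamma/2$, and for t = n the lower limit solves $\pi^n = \gamma/2$, which gives a simple way to check the root finder.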
In other words, for any value of n and any observed value of t, we will obtain shorter confidence intervals for π. The Casella procedure generalizes the technique of Blyth and Still (1983); in East we refer to these intervals as Blyth-Still-Casella intervals.

To test the null hypothesis

\[ H_0 : \pi = \pi_0 , \tag{R.57} \]

East computes the following 1-sided and 2-sided p-values:

\[ p_1 = \min\{ \Pr(T \le t \mid \pi_0), \Pr(T \ge t \mid \pi_0) \} , \tag{R.58} \]

\[ p_2 = 2 p_1 . \tag{R.59} \]

East also computes the power against the alternative hypothesis

\[ H_1 : \pi = \pi_1 \quad (\pi_1 > \pi_0) . \tag{R.60} \]

Let α be the probability of a Type I error and $t_0$ be the smallest integer such that

\[ \Pr(T \ge t_0 \mid \pi_0) \le \alpha . \tag{R.61} \]

Then the exact (one-sided) power is given by

\[ 1 - \beta = \Pr(T \ge t_0 \mid \pi_1) . \tag{R.62} \]

If $\pi_1 < \pi_0$,

\[ 1 - \beta = \Pr(T \le t_0 \mid \pi_1) , \tag{R.63} \]

where $t_0$ is the largest integer for which

\[ \Pr(T \le t_0 \mid \pi_0) \le \alpha . \tag{R.64} \]

R.4.2 McNemar's Test for Paired Binomial

Suppose that two binomial responses are observed on each of N pairs. Let $y_{11}$ be the number of individuals whose first and second responses are both positive. Let $y_{22}$ be the number of individuals whose first and second responses are both negative. Let $y_{12}$ be the number of individuals whose first response is positive and whose second response is negative. Finally, let $y_{21}$ be the number of individuals whose first response is negative and whose second response is positive. Then McNemar's test is defined on a single 2 × 2 table of the form

\[ y = \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix} . \]

Let $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$ denote the four cell probabilities for this table. The null hypothesis of interest is

\[ H_0 : \pi_{12} = \pi_{21} \quad \text{versus} \quad H_1 : \pi_{12} \ne \pi_{21} . \]

McNemar's statistic depends only on the values of the off-diagonal elements of the 2 × 2 table. The test statistic is:

\[ MC(y) = y_{12} - y_{21} . \]
(R.65)

Let y represent any generic 2 × 2 contingency table and suppose that x is the 2 × 2 table actually observed. The exact permutation distribution of the test statistic (R.65) is obtained by conditioning on the observed sum of off-diagonal terms, or "discordant pairs",

\[ N_d = y_{12} + y_{21} . \tag{R.66} \]

We define the reference set by

\[ \Gamma = \{ y : y \text{ is } 2 \times 2;\ y_{12} + y_{21} = N_d \} . \tag{R.67} \]

Given

\[ \mu = \frac{\pi_{12}}{\pi_{12} + \pi_{21}} , \tag{R.68} \]

we see that evaluation of $H_0$ versus $H_1$ is equivalent to testing

\[ H_0' : \mu = 0.5 \tag{R.69} \]

versus

\[ H_1' : \mu \ne 0.5 . \tag{R.70} \]

The conditional probability P(y) of observing any y ∈ Γ is binomial with parameters $(\mu, N_d)$. Thus

\[ P(y) = \binom{N_d}{y_{12}} \mu^{y_{12}} (1 - \mu)^{N_d - y_{12}} , \tag{R.71} \]

which reduces under (R.69) to

\[ P(y) = (0.5)^{N_d} \frac{N_d!}{y_{12}! \, y_{21}!} . \tag{R.72} \]

Hence, under the null hypothesis, the probability that McNemar's statistic equals or exceeds its observed value MC(x) is readily evaluated as

\[ \Pr(MC(Y) \ge MC(x)) = \sum_{MC(y) \ge MC(x)} P(y) , \tag{R.73} \]

the sum being taken over all y ∈ Γ. The probability that McNemar's statistic is less than or equal to MC(x) is similarly obtained. The exact one-sided p-value is then defined as

\[ p_1 = \min\{ \Pr(MC(Y) \le MC(x)), \Pr(MC(Y) \ge MC(x)) \} . \tag{R.74} \]

We can show that the exact distribution of the test statistic MC(Y) is symmetric about 0. Therefore the exact two-sided p-value is defined as double the exact one-sided p-value:

\[ p_2 = 2 p_1 . \tag{R.75} \]

In large samples, the standardized test statistic (which we report in the output for both exact and asymptotic options)

\[ MC^*(y) = \frac{y_{12} - y_{21}}{\sqrt{N_d}} \tag{R.76} \]

is asymptotically normally distributed with zero mean and unit variance.
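Because $MC(y) = 2 y_{12} - N_d$ once $N_d$ is fixed, the tail probabilities of the statistic reduce to binomial tail probabilities for $y_{12}$ with success probability 0.5. The sketch below computes the exact p-values and the standardized statistic on that basis. The function name is hypothetical, and capping the two-sided p-value at 1 is a safeguard added here that the manual does not mention.

```python
import math


def mcnemar_exact(y12, y21):
    """Exact one- and two-sided p-values and the standardized statistic MC*.

    Conditions on N_d = y12 + y21, so that y12 ~ Binomial(N_d, 0.5) under H0
    and MC(y) = y12 - y21 = 2 * y12 - N_d is increasing in y12.
    """
    n_d = y12 + y21
    if n_d == 0:
        raise ValueError("no discordant pairs: the test is not defined")
    pmf = [math.comb(n_d, k) * 0.5 ** n_d for k in range(n_d + 1)]
    p_ge = sum(pmf[y12:])          # Pr(MC(Y) >= MC(x))
    p_le = sum(pmf[: y12 + 1])     # Pr(MC(Y) <= MC(x))
    p1 = min(p_le, p_ge)
    p2 = min(1.0, 2 * p1)          # cap at 1: an added safeguard, not from the manual
    mc_star = (y12 - y21) / math.sqrt(n_d)
    return p1, p2, mc_star


# Hypothetical discordant counts: 8 pairs changed one way, 2 the other.
p1, p2, mc_star = mcnemar_exact(8, 2)
```

With 8 and 2 discordant pairs, the upper tail is (45 + 10 + 1) / 1024, so the one-sided p-value is 56/1024.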
The 1-sided asymptotic p-value is defined as

\[ \tilde{p}_1 = \min\{ \Phi(MC^*(x)), 1 - \Phi(MC^*(x)) \} , \tag{R.77} \]

where Φ(z) is the left tail of the standard normal distribution at z, and x is the observed 2 × 2 contingency table. The 2-sided asymptotic p-value is double the 1-sided asymptotic p-value. The confidence interval is obtained for the difference of proportions based on the asymptotic distribution.

R.5 Two Independent Binomials

R.5.1 Exact Superiority Test: Diff
R.5.2 Exact Noninferiority Test: Diff
R.5.3 Exact Equivalence Test: Diff
R.5.4 Exact CI for Diff of Prop
R.5.5 Exact CI for Ratio of Prop
R.5.6 Exact Noninferiority Test: Ratio
R.5.7 CI for Binomial Ratio
R.5.8 Restricted Nuisance Parameter Range
R.5.9 Noninferiority: Odds Ratio of Proportions
R.5.10 Common Odds Ratio for Stratified 2x2 Tables
R.5.11 Fisher's Exact Test

R.5.1 Exact Unconditional Test of Superiority: Difference of Proportions

This section presents the statistical theory underlying exact unconditional inference for data sampled from two independent binomial populations. Although the problems we will discuss are commonly encountered, the underlying theory is not easily accessible elsewhere.

Consider a randomized clinical trial comparing an experimental treatment, T, to a control treatment, C, on the basis of a binomially distributed outcome variable, X, with probability of success $\pi_t$ and $\pi_c$, respectively. Consider the data from the control and treatment arms presented in the 2 × 2 contingency table, x, displayed in Table R.1:

Table R.1: The Observed 2x2 Contingency Table, x.

Response | Population C | Population T | Row Total
Success | $x_{1c}$ | $x_{1t}$ | $m_1$
Failure | $x_{2c}$ | $x_{2t}$ | $m_2$
Col Total | $n_c$ | $n_t$ | $N$

The two columns of Table R.1 arise from two independent binomial populations. In the first column, for the control arm, there are $x_{1c}$ successes and $x_{2c}$ failures in $n_c$ independent Bernoulli trials, each with probability $\pi_c$ of success. The second column corresponds to data on the treatment arm.
The sum of successes from the two arms is $m_1 = x_{1c} + x_{1t}$. The sample sizes $n_c$ and $n_t$ are the numbers of observations on the control and treatment arm. Define the difference in proportions between the treatment group and the control group to be $\delta = \pi_t - \pi_c$. The null hypothesis of interest is $H_0 : \delta = 0$, which is tested against a 2-sided alternative hypothesis $H_1 : \delta \ne 0$ or a 1-sided alternative hypothesis $H_1' : \delta > 0$ or $H_1' : \delta < 0$, as the case may be. Let $\hat{\pi}_t$ and $\hat{\pi}_c$ be the sample proportions based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively. Then the estimate of δ is $\hat{\delta} = \hat{\pi}_t - \hat{\pi}_c$.

Asymptotic Inference

The test statistic is defined as:

\[ Z = \frac{\hat{\pi}_t - \hat{\pi}_c}{\sqrt{\left( \dfrac{x_{1c} + x_{1t}}{N} \right) \left( \dfrac{x_{2c} + x_{2t}}{N} \right) \left( \dfrac{1}{n_t} + \dfrac{1}{n_c} \right)}} . \tag{R.78} \]

Under the null hypothesis, Z follows the N(0, 1) distribution.

Exact Unconditional Inference

Suppose that $H_0$ is true and let the common probability of success for the two binomial populations be $\pi_c = \pi_t = \pi$. Then the probability of observing the data in Table R.1 is a product of two binomial probabilities, denoted by

\[ f_0(x) = \binom{n_c}{x_{1c}} \binom{n_t}{x_{1t}} \pi^{x_{1c} + x_{1t}} (1 - \pi)^{x_{2c} + x_{2t}} . \tag{R.79} \]

The p-value is defined to be the probability, under $H_0$, of obtaining a 2 × 2 table at least as extreme as the observed table, x. Before we can compute this p-value, however, we need to answer two questions:

1. What criterion should we use to establish that a 2 × 2 contingency table is at least as extreme as x?
2. What is the exact null probability of each of these extreme 2 × 2 contingency tables?

To answer these questions we must introduce some more notation. Let Y denote any generic 2 × 2 table that can arise if we take two independent samples, one of size $n_c$ from binomial population C and the other of size $n_t$ from binomial population T.
Such a generic 2 × 2 table is displayed below in Table R.2.

Table R.2: Any Generic 2x2 Contingency Table, Y

Response | Control | Treatment | Row Total
Success | $y_{1c}$ | $y_{1t}$ | $y_{1c} + y_{1t}$
Failure | $y_{2c}$ | $y_{2t}$ | $y_{2c} + y_{2t}$
Col Total | $n_c$ | $n_t$ | $N$

The probability of observing this table is $f_0(Y)$ which, as shown by equation (R.79), contains an unknown (nuisance) parameter, π. So long as the probability of observing any generic 2 × 2 table depends on π, exact inference is not possible, since the p-value is based on summing up the probabilities of many such tables, each depending on an unknown parameter. The key to exact inference is getting rid of π, the nuisance parameter. The unconditional approach is to eliminate π by taking a supremum over its entire range so as to provide for the worst case. (Barnard, 1945, was the first to propose this idea.)

The unconditional probability of observing x under $H_0$ is $f_0(x)$, specified by equation (R.79). In order to compute an exact p-value we need to specify a reference set of 2 × 2 contingency tables and sum the probabilities of the tables in it that are at least as extreme as x. Unconditional inference uses a reference set of 2 × 2 contingency tables in which only the column sums, or the binomial sample sizes, are fixed. The row sums are treated as random variables. Denote this reference set by

\[ \Omega = \{ Y : y_{1j} + y_{2j} = n_j ,\ j = c, t \} , \tag{R.80} \]

and order each table Y ∈ Ω according to the test statistic

\[ D(Y) = \frac{\hat{\pi}_t - \hat{\pi}_c}{\sqrt{\left( \dfrac{y_{1c} + y_{1t}}{N} \right) \left( \dfrac{y_{2c} + y_{2t}}{N} \right) \left( \dfrac{1}{n_c} + \dfrac{1}{n_t} \right)}} , \tag{R.81} \]

where $\hat{\pi}_j = y_{1j} / n_j$, j = c, t. If $y_{1c} = y_{1t} = 0$ or $y_{2c} = y_{2t} = 0$, set D(Y) = 0. The denominator of (R.81) is the standard error of the observed difference of binomial proportions under the null hypothesis. Therefore the statistic D(Y) has a mean of 0 and variance of 1 under $H_0$.
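For small samples, the reference set Ω, the ordering statistic (R.81), and the null probability (R.79) can be enumerated directly. The sketch below computes a two-sided unconditional p-value by summing $f_0(Y)$ over tables with $|D(Y)| \ge |D(x)|$ for a fixed π and then maximizing over a grid of π values. The grid search is an approximation to the supremum chosen here for simplicity; the function names, the grid size, and the small tolerance used when comparing statistics are all assumptions of this illustration rather than East's algorithm.

```python
import math


def d_stat(y1c, n_c, y1t, n_t):
    """Standardized difference (R.81); set to 0 when all responses are identical."""
    n = n_c + n_t
    successes = y1c + y1t
    if successes == 0 or successes == n:
        return 0.0
    pooled = successes / n
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    return (y1t / n_t - y1c / n_c) / se


def f0(y1c, n_c, y1t, n_t, pi):
    """Null probability (R.79) of a table with a common success rate pi."""
    successes = y1c + y1t
    failures = n_c + n_t - successes
    return math.comb(n_c, y1c) * math.comb(n_t, y1t) * pi ** successes * (1 - pi) ** failures


def unconditional_two_sided_p(x1c, n_c, x1t, n_t, grid=1000):
    """Grid approximation to the supremum over pi of the two-sided tail probability."""
    d_obs = abs(d_stat(x1c, n_c, x1t, n_t))
    extreme = [
        (a, b)
        for a in range(n_c + 1)
        for b in range(n_t + 1)
        if abs(d_stat(a, n_c, b, n_t)) >= d_obs - 1e-12   # tolerance for float ties
    ]
    best = 0.0
    for k in range(grid + 1):
        pi = k / grid
        best = max(best, sum(f0(a, n_c, b, n_t, pi) for a, b in extreme))
    return best


# Hypothetical data: 0/5 successes on control, 5/5 on treatment.
p_extreme = unconditional_two_sided_p(0, 5, 5, 5)
```

In this example only the two tables with all successes in one arm and none in the other are as extreme as the observed one, so the tail probability is $2 \pi^5 (1 - \pi)^5$, which peaks at π = 0.5.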
A large positive value for the observed statistic D(x) furnishes evidence against $H_1$, while a large negative value furnishes evidence against $H_1'$.

The exact p-value is the sum of probabilities of all tables Y ∈ Ω that are at least as extreme as the observed table x with respect to the test statistic (R.81). The trouble is that each such extreme table has a probability $f_0(Y)$ which, by equation (R.79), depends on the unknown nuisance parameter, π. We compute the p-value in two stages. At the first stage we express the p-value as a function of π. Then, at the second stage, we obtain the supremum of this function over all values of π ∈ (0, 1). We use this supremum as the p-value. Since the p-value based on the actual value of π can never exceed the supremum over all possible values of π, this procedure guarantees that the type-1 error will always be preserved. In effect we compute a conservative p-value that will preserve the desired type-1 error rate no matter what the true value of π might be, since it is designed to cater for the worst case.

Specifically, the exact one-sided p-value given π is computed as

\[ p_1(\pi) = \min\left\{ \sum_{D(Y) \le D(x)} f_0(Y) ,\ \sum_{D(Y) \ge D(x)} f_0(Y) \right\} . \tag{R.82} \]

The exact two-sided p-value given π is computed as

\[ p_2(\pi) = \sum_{|D(Y)| \ge |D(x)|} f_0(Y) . \tag{R.83} \]

Finally, we obtain one- and two-sided p-values that are independent of π by taking a supremum over all possible values of π and arguing that, even in the worst possible case, the true p-value could never exceed the supremum. Thus

\[ p_1 = \sup\{ p_1(\pi) : 0 \le \pi \le 1 \} \tag{R.84} \]

and

\[ p_2 = \sup\{ p_2(\pi) : 0 \le \pi \le 1 \} . \tag{R.85} \]

R.5.2 Exact Test of Noninferiority: Difference of Proportions

An important biomedical application arises in so-called "active control" clinical trials.
In these studies the goal is to demonstrate the noninferiority rather than the superiority of the new treatment relative to the active control. Define the difference in proportions

\[ \delta = \pi_t - \pi_c . \tag{R.86} \]

In a noninferiority clinical trial the objective is not to demonstrate that the experimental treatment is superior to the control but rather to demonstrate that the experimental treatment is not significantly inferior. Accordingly, a noninferiority margin, $\delta_0 > 0$, is specified a priori and we test the null hypothesis of inferiority

\[ H_0 : \delta \ge \delta_0 \tag{R.87} \]

versus the one-sided alternative hypothesis of noninferiority

\[ H_1 : \delta < \delta_0 . \tag{R.88} \]

The test is carried out under the assumption that δ is at its threshold null value $\delta = \delta_0$. When $\delta_0 < 0$, East tests the null hypothesis $H_0 : \delta \le \delta_0$ against the alternative hypothesis $H_1 : \delta > \delta_0$. When $\delta_0 > 0$, the null hypothesis $H_0 : \delta \ge \delta_0$ is tested against the alternative hypothesis $H_1 : \delta < \delta_0$.

Let $\hat{\pi}_t$ and $\hat{\pi}_c$ be the sample proportions based on $n_t$ and $n_c$ observations in the treatment and control arm. Then the estimate of δ is $\hat{\delta} = \hat{\pi}_t - \hat{\pi}_c$. The test statistics for the Wald test and the score test are defined as follows:

Noninferiority (Wald)

\[ Z = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\dfrac{\hat{\pi}_c (1 - \hat{\pi}_c)}{n_c} + \dfrac{\hat{\pi}_t (1 - \hat{\pi}_t)}{n_t}}} \sim N(0, 1) \tag{R.89} \]

Noninferiority (Score)

\[ Z = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\dfrac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c} + \dfrac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t}}} \sim N(0, 1) \tag{R.90} \]

where $\tilde{\pi}_t$ and $\tilde{\pi}_c$ are the restricted maximum likelihood estimates of $\pi_t$ and $\pi_c$ as suggested by Miettinen and Nurminen (1985), whereas the test statistic has been recommended by Farrington and Manning (1990). Under the null hypothesis, Z follows the N(0, 1) distribution.
Exact Inference

Let Y ∈ Ω denote any generic 2 × 2 table of the form of Table R.3 that might be observed if we generated $n_c$ independent Bernoulli trials each with probability $\pi_c$ and $n_t$ independent Bernoulli trials each with probability $\pi_t$. The probability of observing any Y ∈ Ω under $H_0$ is

\[ f_{\pi_c, \delta_0}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1 - \pi_c)^{y_{2c}} (\pi_c + \delta_0)^{y_{1t}} (1 - \pi_c - \delta_0)^{y_{2t}} . \tag{R.91} \]

The test statistic (see Chan, 1998) is defined as

\[ D(Y) = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\dfrac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c} + \dfrac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t}}} , \tag{R.92} \]

where

\[ \hat{\pi}_j = \frac{y_{1j}}{n_j} \tag{R.93} \]

for j = c, t, and $\tilde{\pi}_c$ and $\tilde{\pi}_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, restricted under the null hypothesis so as to satisfy the requirement $\tilde{\pi}_t - \tilde{\pi}_c = \delta_0$. Miettinen and Nurminen (1985) have shown that one may obtain these restricted maximum likelihood estimates by solving the third-degree likelihood equation

\[ \sum_{k=0}^{3} L_k \tilde{\pi}_c^k = 0 \tag{R.94} \]

for $\tilde{\pi}_c$ and setting $\tilde{\pi}_t = \tilde{\pi}_c + \delta_0$, where

\[ L_3 = N , \]
\[ L_2 = (n_t + 2 n_c)\delta_0 - N - y_{1c} - y_{1t} , \]
\[ L_1 = (n_c \delta_0 - N - 2 y_{1c})\delta_0 + y_{1c} + y_{1t} , \]
\[ L_0 = y_{1c} \delta_0 (1 - \delta_0) . \]

The test statistic (R.92) is known as the score statistic. Under $H_0$ this test statistic has mean 0 and variance 1.

Let the data in Table R.1, denoted by x, be the 2 × 2 table actually observed. Then the observed value of the test statistic is D(x), and the left tail of the distribution of D(Y) at its observed value under $H_0$ is

\[ P_{\pi_c, \delta_0}(D(x)) = \sum_{D(Y) \le D(x)} f_{\pi_c, \delta_0}(Y) . \tag{R.95} \]

If we knew the value of $\pi_c$, then $P_{\pi_c, \delta_0}(D(x))$ would be the exact p-value for testing $H_0$ versus $H_1$. Since $\pi_c$ is unknown, however, we take the supremum of (R.95) over all values of $\pi_c$ in its range, just as we did for Barnard's test in Section R.5.1. This produces a conservative p-value that is guaranteed to ensure that the true type-1 error of the test will never exceed its nominal significance level.
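The restricted maximum likelihood estimates and the score statistic (R.92) can be sketched as follows. The cubic (R.94) has the same sign as the derivative of the restricted log-likelihood on the interior of the admissible range for $\pi_c$, so it is positive to the left of the maximizer and negative to the right, and a simple bisection locates the root. The function names, the use of bisection instead of a closed-form cubic solver, and the small offset from the endpoints of the range are choices made for this illustration only.

```python
import math


def restricted_mle(y1c, n_c, y1t, n_t, delta0):
    """Return (pi_c, pi_t) maximizing the likelihood subject to pi_t - pi_c = delta0."""
    n = n_c + n_t
    l3 = n
    l2 = (n_t + 2 * n_c) * delta0 - n - y1c - y1t
    l1 = (n_c * delta0 - n - 2 * y1c) * delta0 + y1c + y1t
    l0 = y1c * delta0 * (1 - delta0)

    def cubic(p):
        return ((l3 * p + l2) * p + l1) * p + l0

    lo, hi = max(0.0, -delta0), min(1.0, 1.0 - delta0)   # admissible range for pi_c
    a, b = lo + 1e-10, hi - 1e-10                        # stay just inside the range
    if cubic(a) <= 0:
        return lo, lo + delta0        # maximizer sits on the lower boundary
    if cubic(b) >= 0:
        return hi, hi + delta0        # maximizer sits on the upper boundary
    for _ in range(200):              # cubic > 0 left of the root, < 0 right of it
        mid = (a + b) / 2
        if cubic(mid) > 0:
            a = mid
        else:
            b = mid
    pi_c = (a + b) / 2
    return pi_c, pi_c + delta0


def score_statistic(y1c, n_c, y1t, n_t, delta0):
    """Score statistic (R.92) using the restricted estimates in the standard error."""
    pi_c, pi_t = restricted_mle(y1c, n_c, y1t, n_t, delta0)
    var = pi_c * (1 - pi_c) / n_c + pi_t * (1 - pi_t) / n_t
    if var <= 0:
        raise ValueError("score statistic is undefined for this table")
    return (y1t / n_t - y1c / n_c - delta0) / math.sqrt(var)


# Hypothetical table: 10/20 successes on control, 15/20 on treatment.
pc0, pt0 = restricted_mle(10, 20, 15, 20, 0.0)    # delta0 = 0 gives the pooled proportion
pc1, pt1 = restricted_mle(10, 20, 15, 20, 0.1)
d1 = score_statistic(10, 20, 15, 20, 0.1)
```

When $\delta_0 = 0$ the cubic factors as $\tilde{\pi}_c (N \tilde{\pi}_c - m_1)(\tilde{\pi}_c - 1)$, so the interior root is the pooled proportion $m_1 / N$, which provides a quick sanity check.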
Since $\delta_0 > 0$, the range of possible values for $\pi_c$ is

\[ I(\delta_0) = \{ \pi_c : 0 < \pi_c < 1 - \delta_0 \} . \tag{R.96} \]

Thereupon the unconditional exact one-sided p-value is

\[ p_1 \equiv P_{\delta_0}(D(x)) = \sup\{ P_{\pi_c, \delta_0}(D(x)) : \pi_c \in I(\delta_0) \} . \tag{R.97} \]

Note that in practice the supremum in equation (R.97) is taken over a restricted range for $\pi_c$ rather than over the entire range $I(\delta_0)$. This restriction, proposed by Berger and Boos (1994), adds stability and reduces the conservatism of the procedure. The p-values are suitably adjusted so that the restricted search for the supremum does not compromise the type-1 error. Finally, it is worth noting that when $\delta_0 = 0$ the above p-value specializes to the left tail p-value obtained by Barnard's test.

Additional Remarks

1. The score statistic D(Y) specified by equation (R.92) is always defined except for the special case where $y_{1c} = y_{1t} = 0$ and $\delta_0 = 0$, or where $y_{2c} = y_{2t} = 0$ and $\delta_0 = 0$. For these special cases the one- and two-sided p-values are both set to 1. These special cases never arise when performing a noninferiority test with $\delta_0 > 0$.

2. The one-sided asymptotic p-value corresponding to $p_1$ is obtained by assuming that the test statistic D(Y) converges in distribution to the standard normal. Thus

\[ \tilde{p}_1 = 1 - \Phi(D(x)) . \tag{R.98} \]

3. An alternative equivalent way to perform a level-α test of noninferiority is to compute an exact 100 × (1 − α)% lower confidence bound for δ, say $\delta_L$, using the method described in Section R.5.4. If $\delta_L < \delta_0$ we reject the null hypothesis of inferiority.

R.5.3 Exact Test of Equivalence: Difference of Proportions

Suppose $\pi_c$ is the response rate of Control and $\pi_t$ is the response rate of Treatment. Define the absolute difference in proportions

\[ \delta = |\pi_t - \pi_c| . \]
(R.99)

Suppose that for a pre-specified equivalence margin $\delta_0 > 0$ we wish to test the null hypothesis of inequivalence

\[ H_0 : \delta \ge \delta_0 \tag{R.100} \]

against the alternative hypothesis of equivalence

\[ H_1 : \delta < \delta_0 . \tag{R.101} \]

We test the above null hypothesis by performing two separate one-sided noninferiority hypothesis tests of the form

\[ H_{01} : \pi_c - \pi_t \ge \delta_0 \quad \text{versus} \quad H_{11} : \pi_c - \pi_t < \delta_0 \tag{R.102} \]

and

\[ H_{02} : \pi_t - \pi_c \ge \delta_0 \quad \text{versus} \quad H_{12} : \pi_t - \pi_c < \delta_0 . \tag{R.103} \]

Each hypothesis test is carried out separately using the method described in Section R.5.2. Hypothesis $H_{01}$ is tested under the assumption that $\pi_c - \pi_t$ is at its threshold null value $\pi_c - \pi_t = \delta_0$. Similarly, hypothesis $H_{02}$ is tested under the assumption that $\pi_t - \pi_c$ is at its threshold null value $\pi_t - \pi_c = \delta_0$. We reject the null hypothesis of inequivalence and accept the alternative hypothesis of equivalence only if both $H_{01}$ and $H_{02}$ are rejected.

The probability of observing any Y ∈ Ω under $H_{01}$ is

\[ f^{01}_{\pi_c, \delta_0}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1 - \pi_c)^{y_{2c}} (\pi_c - \delta_0)^{y_{1t}} (1 - \pi_c + \delta_0)^{y_{2t}} , \tag{R.104} \]

and the statistic used to test $H_{01}$ is

\[ D_{01}(Y) = \frac{\hat{\pi}_c - \hat{\pi}_t - \delta_0}{\sqrt{\dfrac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c} + \dfrac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t}}} , \tag{R.105} \]

where $\tilde{\pi}_c$ and $\tilde{\pi}_t$ are the restricted maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, under the restriction that $\tilde{\pi}_c - \tilde{\pi}_t = \delta_0$. We compute

\[ P^{01}_{\pi_c, \delta_0}(D(x)) = \sum_{D_{01}(Y) \le D_{01}(x)} f^{01}_{\pi_c, \delta_0}(Y) , \tag{R.106} \]

and then take the supremum over all $\pi_c \in I^{01}(\delta_0)$, where

\[ I^{01}(\delta_0) = \{ \pi_c : \delta_0 < \pi_c < 1 \} . \tag{R.107} \]

The exact unconditional one-sided p-value for testing $H_{01}$ is thus

\[ p_{01} \equiv P^{01}_{\delta_0}(D(x)) = \sup\{ P^{01}_{\pi_c, \delta_0}(D(x)) : \pi_c \in I^{01}(\delta_0) \} . \]
(R.108)

The probability of observing any Y ∈ Ω under $H_{02}$ is

\[ f^{02}_{\pi_c, \delta_0}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1 - \pi_c)^{y_{2c}} (\pi_c + \delta_0)^{y_{1t}} (1 - \pi_c - \delta_0)^{y_{2t}} , \tag{R.109} \]

and the statistic used to test $H_{02}$ is

\[ D_{02}(Y) = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\dfrac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c} + \dfrac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t}}} , \tag{R.110} \]

where $\tilde{\pi}_c$ and $\tilde{\pi}_t$ are the restricted maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, under the restriction that $\tilde{\pi}_t - \tilde{\pi}_c = \delta_0$. We compute

\[ P^{02}_{\pi_c, \delta_0}(D(x)) = \sum_{D_{02}(Y) \le D_{02}(x)} f^{02}_{\pi_c, \delta_0}(Y) , \tag{R.111} \]

and then take the supremum over all $\pi_c \in I^{02}(\delta_0)$, where

\[ I^{02}(\delta_0) = \{ \pi_c : 0 < \pi_c < 1 - \delta_0 \} . \tag{R.112} \]

The exact unconditional one-sided p-value for testing $H_{02}$ is thus

\[ p_{02} \equiv P^{02}_{\delta_0}(D(x)) = \sup\{ P^{02}_{\pi_c, \delta_0}(D(x)) : \pi_c \in I^{02}(\delta_0) \} . \tag{R.113} \]

Additional Remarks

1. The test statistics $D_{01}(Y)$ and $D_{02}(Y)$, specified by equations (R.105) and (R.110), respectively, are always defined except for the special cases where $y_{1c} = y_{1t} = 0$ and $\delta_0 = 0$, or where $y_{2c} = y_{2t} = 0$ and $\delta_0 = 0$. For these special cases the one- and two-sided p-values are both set to 1. These special cases never arise when performing an equivalence test with $\delta_0 > 0$.

2. The one-sided asymptotic p-value corresponding to $p_{01}$ is obtained by assuming that the test statistic $D_{01}(Y)$ converges in distribution to the standard normal. Thus

\[ \tilde{p}_{01} = 1 - \Phi(D_{01}(x)) . \tag{R.114} \]

3. The one-sided asymptotic p-value corresponding to $p_{02}$ is obtained by assuming that the test statistic $D_{02}(Y)$ converges in distribution to the standard normal. Thus

\[ \tilde{p}_{02} = 1 - \Phi(D_{02}(x)) . \tag{R.115} \]

4. An alternative equivalent way to perform a level-α test of equivalence is to compute an exact 100 × (1 − 2α)% confidence interval for δ, say $(\delta_L, \delta_U)$, using the method described in Section R.5.4. If $\delta_0$ is excluded from this interval, we reject the null hypothesis of inequivalence.
R.5.4 Unconditional Exact Confidence Intervals for the Difference of Proportions

Suppose $\pi_c$ is the binomial response rate of Control and $\pi_t$ is the binomial response rate of Treatment. We wish to compute an exact $100(1-\alpha)\%$ confidence interval for $\delta = \pi_t - \pi_c$. We use a test based procedure. That is, we invert hypothesis tests of the form $\delta = \delta_0$, where, in general, $\delta_0 \ne 0$. For superiority, $\delta_0$ is zero; for noninferiority, $\delta_0$ is nonzero. Accordingly, this section is applicable to superiority, noninferiority and equivalence. There is one further complication, however, since the p-values which we compute under these alternative hypotheses depend on a nuisance parameter. We handle this problem the same way we handled it for Barnard's unconditional exact hypothesis test; i.e., by taking a supremum over all possible values of the nuisance parameter.

Interval Estimation

Suppose we take $n_c$ independent Bernoulli samples from control and $n_t$ independent Bernoulli samples from treatment. Let $Y \in \Omega$ (see Table R.3) denote any generic $2 \times 2$ table that might be observed, and let $x$ (see Table R.1) be the $2 \times 2$ table that was actually observed. Define

$$\hat\pi_j = \frac{y_{1j}}{n_j}$$

for $j = c, t$. In East, we provide a test based exact confidence interval using the standardized statistic

$$D(Y) = \frac{\hat\pi_t - \hat\pi_c - \delta_0}{\sqrt{\dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t}}} \tag{R.116}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$ computed under the restriction that $\tilde\pi_t - \tilde\pi_c = \delta_0$. This statistic is known as the score statistic. The use of (R.116) as the test statistic has been proposed by Farrington and Manning (1990) for asymptotic confidence intervals and by Chan and Zhang (1999) for exact confidence intervals.
We note that the score statistic specified by equation (R.116) is always defined except for the special cases where $y_{1c} = y_{1t} = 0$ and $\delta_0 = 0$, or $y_{2c} = y_{2t} = 0$ and $\delta_0 = 0$. These special cases never arise when computing a confidence interval.

Test Based Exact Confidence Intervals: Inverting Two One-Sided Tests

Let $(\delta_*, \delta^*)$ be the desired $100(1-\alpha)\%$ exact confidence interval, evaluated at $D(x)$, the observed value of the test statistic. This exact confidence interval may be constructed by inverting two one-sided hypothesis tests, each at the $\alpha/2$ significance level, under appropriate alternative hypotheses about $\delta$. The probability of observing any $Y \in \Omega$, for any given value of $\delta$, is

$$f_{\pi_c,\delta}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1-\pi_c)^{y_{2c}} (\pi_c+\delta)^{y_{1t}} (1-\pi_c-\delta)^{y_{2t}} . \tag{R.117}$$

Define

$$P_{\pi_c,\delta}(D(x)) = \sum_{D(Y) \le D(x)} f_{\pi_c,\delta}(Y) \tag{R.118}$$

and

$$Q_{\pi_c,\delta}(D(x)) = \sum_{D(Y) \ge D(x)} f_{\pi_c,\delta}(Y) . \tag{R.119}$$

We must eliminate the nuisance parameter $\pi_c$ from equations (R.118) and (R.119) by taking the supremum over its range. It is easy to see that the permissible range for $\pi_c$ given $\delta$ is the interval

$$I(\delta) = \{\pi_c : \max(0, -\delta) \le \pi_c \le \min(1, 1-\delta)\} . \tag{R.120}$$

Thus we define

$$P_\delta(D(x)) = \sup\{P_{\pi_c,\delta}(D(x)) : \pi_c \in I(\delta)\} \tag{R.121}$$

and

$$Q_\delta(D(x)) = \sup\{Q_{\pi_c,\delta}(D(x)) : \pi_c \in I(\delta)\} . \tag{R.122}$$

Starting with $\delta = -1$, the desired lower confidence bound is obtained by increasing the value of $\delta$ until we find a value, denoted by $\delta_*$, such that the equality

$$Q_{\delta_*}(D(x)) = \alpha/2 \tag{R.123}$$

is satisfied but for any $\delta < \delta_*$, $Q_\delta(D(x)) < \alpha/2$. The upper confidence bound, $\delta^*$, is obtained in an analogous fashion. Starting with $\delta = 1$, the desired upper confidence bound is obtained by decreasing the value of $\delta$ until we find a value, denoted by $\delta^*$, such that the equality

$$P_{\delta^*}(D(x)) = \alpha/2 \tag{R.124}$$

is satisfied but for any $\delta > \delta^*$, $P_\delta(D(x)) < \alpha/2$.
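The search just described can be illustrated with a small numerical sketch. It is not East's implementation: the function names are hypothetical, the restricted maximum likelihood estimates are found by a ternary search on the concave log-likelihood rather than in closed form, the suprema over $\pi_c$ and the search over $\delta$ are approximated on coarse grids, and the handling of a zero variance is a convention assumed only for this example. It is practical only for small $n_c$ and $n_t$.

```python
import math

def restricted_mle_diff(y1c, nc, y1t, nt, delta, iters=60):
    """Maximise the two-binomial log-likelihood subject to pi_t - pi_c = delta.
    The log-likelihood is concave in pi_c, so a ternary search is used here
    (a numerical shortcut for this sketch, not a closed-form solution)."""
    lo, hi = max(0.0, -delta), min(1.0, 1.0 - delta)

    def loglik(pc):
        total = 0.0
        for y, n, p in ((y1c, nc, pc), (y1t, nt, pc + delta)):
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            total += y * math.log(p) + (n - y) * math.log(1.0 - p)
        return total

    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    pc = 0.5 * (lo + hi)
    return pc, pc + delta

def score_stat(y1c, nc, y1t, nt, delta):
    """Score statistic D(Y) of (R.116) with delta0 = delta."""
    pc, pt = restricted_mle_diff(y1c, nc, y1t, nt, delta)
    num = y1t / nt - y1c / nc - delta
    var = pc * (1.0 - pc) / nc + pt * (1.0 - pt) / nt
    if var <= 1e-15:  # degenerate variance: convention assumed for this sketch
        return 0.0 if abs(num) < 1e-12 else math.copysign(math.inf, num)
    return num / math.sqrt(var)

def tail_sups(x1c, nc, x1t, nt, delta, grid=40):
    """Grid approximations to P_delta(D(x)) and Q_delta(D(x)), (R.121)-(R.122)."""
    d_obs = score_stat(x1c, nc, x1t, nt, delta)
    tables = [(yc, yt, math.comb(nc, yc) * math.comb(nt, yt),
               score_stat(yc, nc, yt, nt, delta))
              for yc in range(nc + 1) for yt in range(nt + 1)]
    lo, hi = max(0.0, -delta), min(1.0, 1.0 - delta)  # I(delta), (R.120)
    p_sup = q_sup = 0.0
    for k in range(grid + 1):
        pc = min(max(lo + (hi - lo) * k / grid, 0.0), 1.0)
        pt = min(max(pc + delta, 0.0), 1.0)
        p = q = 0.0
        for yc, yt, w, d in tables:
            f = w * pc**yc * (1 - pc)**(nc - yc) * pt**yt * (1 - pt)**(nt - yt)
            if d <= d_obs + 1e-9:   # left tail, (R.118)
                p += f
            if d >= d_obs - 1e-9:   # right tail, (R.119)
                q += f
        p_sup, q_sup = max(p_sup, p), max(q_sup, q)
    return p_sup, q_sup

def exact_ci_diff(x1c, nc, x1t, nt, alpha=0.05, step=0.01):
    """Scan delta upward from -1 and downward from +1, as in (R.123)-(R.124)."""
    deltas = [min(1.0, -1.0 + k * step) for k in range(int(round(2.0 / step)) + 1)]
    lower = next((d for d in deltas
                  if tail_sups(x1c, nc, x1t, nt, d)[1] >= alpha / 2), 1.0)
    upper = next((d for d in reversed(deltas)
                  if tail_sups(x1c, nc, x1t, nt, d)[0] >= alpha / 2), -1.0)
    return lower, upper
```

For example, `exact_ci_diff(1, 5, 4, 5, step=0.05)` scans a grid of 41 values of $\delta$ for a trial with 1/5 responders on control and 4/5 on treatment. Because both the grid over $\delta$ and the grid over $\pi_c$ are coarse, the returned bounds are only approximations to $(\delta_*, \delta^*)$.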
East reports $(\delta_*, \delta^*)$ as the $100 \times (1-\alpha)\%$ confidence interval for the parameter $\delta$. Suppose $\delta_0$ is the true (unknown) value of $\delta$. The long run relative frequency with which, in repeated trials, this interval excludes $\delta_0$ is $\Pr(\delta_0 \le \delta_*) + \Pr(\delta_0 \ge \delta^*)$. We shall show at the end of this Section that neither term in the above sum can exceed $\alpha/2$. Therefore the probability of the confidence interval excluding $\delta_0$ cannot exceed $\alpha$. However, due to the discreteness of the distribution of $D(Y)$, and the conservatism induced by taking a supremum over all $\pi_c \in I(\delta)$, the above exclusion probability is usually less than $\alpha$ instead of equaling $\alpha$. Thus, $(\delta_*, \delta^*)$ may be regarded as a conservative confidence interval.

In addition to the exact confidence interval, East also reports an exact one-sided p-value, defined as the smaller of the two tail areas,

$$p_1 = \min(P_0(D(x)), Q_0(D(x))) , \tag{R.125}$$

and the two-sided exact p-value is twice the one-sided:

$$p_2 = 2 p_1 . \tag{R.126}$$

The two-sided p-value is weakly consistent with the corresponding exact confidence interval for $\delta$. That is, if $0 \notin [\delta_*, \delta^*]$ then $p_2 < \alpha$. The stronger consistency requirement, that $p_2 < \alpha$ if and only if $0 \notin [\delta_*, \delta^*]$, cannot be established unless $P_\delta(D(x))$ and $Q_\delta(D(x))$ are monotone functions of $\delta$ for any given $D(x)$. This need not be the case, however.

The above procedure can be slightly modified in practice. The suprema in equations (R.121) and (R.122) can be taken over a restricted range for $\pi_c$ rather than over the entire range $I(\delta)$. This restriction, proposed by Berger and Boos (1994), adds stability and reduces the conservatism of the procedure. The right hand sides of equations (R.123) and (R.124) are suitably adjusted so that the restricted search for the supremum does not compromise the coverage properties of the resulting confidence interval.
Proof of Coverage: We shall now prove that the probability that the above confidence interval excludes the true parameter $\delta_0$ cannot exceed $\alpha$. For simplicity, denote the random variable $D(Y)$ by $D$, and its observed value $D(x)$ by $d$. In order to make explicit the dependence of the confidence interval on $d$, denote the lower confidence bound by $\delta_*(d)$ and the upper confidence bound by $\delta^*(d)$. Thus the lower confidence bound satisfies the relationship

$$Q_{\delta_*(d)}(d) = \alpha/2 , \tag{R.127}$$

and furthermore, by the way we conduct the search for $\delta_*(d)$,

$$Q_\delta(d) < \alpha/2 \quad \text{if } \delta < \delta_*(d) . \tag{R.128}$$

Define $H(\delta_0)$ to be the smallest value of $D$ satisfying the inequality

$$Q_{\delta_0}(H(\delta_0)) \le \alpha/2 . \tag{R.129}$$

Observe, from the definition of $H(\delta_0)$ in (R.129), that if $d < H(\delta_0)$ we must have $Q_{\delta_0}(d) > \alpha/2$. But we know from (R.127) and (R.128) that there is no value of $\delta \le \delta_*(d)$ for which $Q_\delta(d) > \alpha/2$. Therefore if $d < H(\delta_0)$, it must be the case that $\delta_0 > \delta_*(d)$. It follows that

$$\Pr\{\delta_0 > \delta_*(d)\} \ge \Pr\{D < H(\delta_0) \mid \delta_0, \pi_c\} . \tag{R.130}$$

We use an inequality instead of an equality in (R.130) because it is also possible in some situations to have $\delta_0 > \delta_*(d)$ when $d \ge H(\delta_0)$. Taking the complementary probability on both sides of (R.130) we have

$$\Pr\{\delta_0 \le \delta_*(d)\} \le \Pr\{D \ge H(\delta_0) \mid \delta_0, \pi_c\} . \tag{R.131}$$

Taking the supremum over all $\pi_c \in I(\delta_0)$ on the right hand side of (R.131) we have

$$\Pr\{\delta_* \ge \delta_0\} \le Q_{\delta_0}(H(\delta_0)) \le \alpha/2 . \tag{R.132}$$

By an analogous argument we can establish that

$$\Pr\{\delta^* \le \delta_0\} \le \alpha/2 . \tag{R.133}$$

Therefore the probability that the interval $(\delta_*, \delta^*)$ excludes the parameter $\delta_0$ cannot exceed $\alpha$.

Asymptotic Confidence Interval

East computes asymptotic p-values and test based asymptotic confidence intervals for $\delta$, under the assumption that the test statistic is asymptotically normally distributed.
The asymptotic $100 \times (1-\alpha)\%$ confidence interval $(\tilde\delta_*, \tilde\delta^*)$ is obtained by inverting the corresponding one-sided hypothesis tests. Thus $\tilde\delta_*$ satisfies the equality

$$1 - \Phi\left( \frac{x_{1t}/n_t - x_{1c}/n_c - \delta_*}{\sqrt{\dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t}}} \right) = \alpha/2 , \tag{R.134}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, under the null hypothesis that $\pi_t - \pi_c = \delta_*$. Similarly $\tilde\delta^*$ satisfies the equality

$$\Phi\left( \frac{x_{1t}/n_t - x_{1c}/n_c - \delta^*}{\sqrt{\dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t}}} \right) = \alpha/2 , \tag{R.135}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, under the null hypothesis that $\pi_t - \pi_c = \delta^*$.

R.5.5 Unconditional Exact Confidence Intervals for the Ratio of Proportions

In the Ratio of Proportions test, let $\pi_t$ and $\pi_c$ denote the proportions of successes from the experimental treatment (T) and the control treatment (C), respectively. We test the null hypothesis $H_0 : \pi_t/\pi_c = 1$ against the 2-sided alternative hypothesis $H_1 : \pi_t/\pi_c \ne 1$, or against a 1-sided alternative hypothesis $H_1' : \pi_t/\pi_c < 1$ or $H_1' : \pi_t/\pi_c > 1$.

Test Statistic

Using the pooled estimate of variance:

$$Z = \frac{\ln(\hat\pi_t) - \ln(\hat\pi_c)}{\sqrt{\dfrac{(1-\hat\pi)}{\hat\pi}\left(\dfrac{1}{n_t} + \dfrac{1}{n_c}\right)}} , \qquad \text{where} \quad \hat\pi = \frac{n_t \hat\pi_t + n_c \hat\pi_c}{n_t + n_c} ,$$

and where $\hat\pi_t$ and $\hat\pi_c$ are the sample proportions based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively. Asymptotically, $Z$ follows the $N(0, 1)$ distribution under the null hypothesis.

We wish to compute an exact $100(1-\alpha)\%$ confidence interval for

$$\rho = \frac{\pi_t}{\pi_c} . \tag{R.136}$$

The procedure parallels that described in Section R.5.4 for the difference of proportions, $\delta$. We invert hypothesis tests of the form $\rho = \rho_0$, where, in general, $\rho_0 \ne 1$. There is one further complication, however, since the p-values which we compute under these alternative hypotheses depend on a nuisance parameter.
We handle this problem the same way we handled it for Barnard's unconditional exact hypothesis test: by taking a supremum over all possible values of the nuisance parameter.

Suppose we take $n_c$ independent Bernoulli samples from control and $n_t$ independent Bernoulli samples from treatment. Let $Y \in \Omega$ (see Table R.3, page 2596) denote any generic $2 \times 2$ table that might be observed, and let $x$ (see Table R.1, page 2568) be the $2 \times 2$ table that was actually observed. Define

$$\hat\pi_j = \frac{y_{1j}}{n_j} \tag{R.137}$$

for $j = c, t$. In East we provide a test based exact confidence interval using the standardized statistic

$$D(Y) = \frac{\hat\pi_t - \rho_0 \hat\pi_c}{\sqrt{\dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t} + \rho_0^2 \dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c}}} \tag{R.138}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the restricted maximum likelihood estimates of $\pi_c$ and $\pi_t$ computed under the restriction that $\tilde\pi_t/\tilde\pi_c = \rho_0$. The restricted MLEs are suggested by Miettinen and Nurminen (1985). The use of (R.138) as the test statistic has been proposed by Farrington and Manning (1990) for asymptotic confidence intervals and by Chan and Zhang (1999) for exact confidence intervals.

Test Based Exact Confidence Intervals: Inverting Two One-Sided Tests

Let $(\rho_*, \rho^*)$ be the desired $100(1-\alpha)\%$ exact confidence interval, evaluated at $D(x)$, the observed value of the test statistic. This exact confidence interval may be constructed by inverting two one-sided hypothesis tests, each at the $\alpha/2$ significance level, under appropriate alternative hypotheses about $\rho$. The computations are very similar to the p-value computations performed in the next Section. The probability of observing any $Y \in \Omega$ for any given value of $\rho$ is

$$f_{\pi_c,\rho}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1-\pi_c)^{y_{2c}} (\rho\pi_c)^{y_{1t}} (1-\rho\pi_c)^{y_{2t}} . \tag{R.139}$$

Define

$$P_{\pi_c,\rho}(D(x)) = \sum_{D(Y) \le D(x)} f_{\pi_c,\rho}(Y) \tag{R.140}$$

and

$$Q_{\pi_c,\rho}(D(x)) = \sum_{D(Y) \ge D(x)} f_{\pi_c,\rho}(Y) . \tag{R.141}$$

We must eliminate the nuisance parameter $\pi_c$ from equations (R.140) and (R.141) by taking the supremum over its range. It is easy to see that the permissible range for $\pi_c$ given $\rho$ is the interval

$$I(\rho) = \{\pi_c : 0 \le \pi_c \le \min(1/\rho, 1)\} . \tag{R.142}$$

Thus we define

$$P_\rho(D(x)) = \sup\{P_{\pi_c,\rho}(D(x)) : \pi_c \in I(\rho)\} \tag{R.143}$$

and

$$Q_\rho(D(x)) = \sup\{Q_{\pi_c,\rho}(D(x)) : \pi_c \in I(\rho)\} . \tag{R.144}$$

Starting with $\rho = 0$, the desired lower confidence bound is obtained by increasing the value of $\rho$ until we find a value, denoted by $\rho_*$, such that the equality

$$Q_{\rho_*}(D(x)) = \alpha/2 \tag{R.145}$$

is satisfied but for any $\rho < \rho_*$, $Q_\rho(D(x)) < \alpha/2$. The upper confidence bound, $\rho^*$, is obtained in an analogous fashion. Starting with $\rho = \infty$ (i.e., a very large positive number), the desired upper confidence bound is obtained by decreasing the value of $\rho$ until we find a value, denoted by $\rho^*$, such that the equality

$$P_{\rho^*}(D(x)) = \alpha/2 \tag{R.146}$$

is satisfied but for any $\rho > \rho^*$, $P_\rho(D(x)) < \alpha/2$.

East reports $(\rho_*, \rho^*)$ as the $100 \times (1-\alpha)\%$ confidence interval for the parameter $\rho$. Suppose $\rho_0$ is the true (unknown) value of $\rho$. The long run relative frequency with which, in repeated trials, this interval excludes $\rho_0$ is $\Pr(\rho_0 \le \rho_*) + \Pr(\rho_0 \ge \rho^*)$. Using arguments similar to those given in the proof of coverage in Section R.5.4 for the binomial difference, $\delta_0$, we can show that neither term in the above sum can exceed $\alpha/2$. Therefore the probability of the confidence interval excluding $\rho_0$ cannot exceed $\alpha$. However, due to the discreteness of the distribution of $D(Y)$, and the conservatism induced by taking a supremum over all $\pi_c \in I(\rho)$, the above exclusion probability is usually less than $\alpha$ instead of equaling $\alpha$. Thus, $(\rho_*, \rho^*)$ may be regarded as a conservative confidence interval.
In addition to the exact confidence interval, East also reports an exact one-sided p-value, defined as the smaller of the two tail areas evaluated at the null value $\rho = 1$,

$$p_1 = \min(P_1(D(x)), Q_1(D(x))) , \tag{R.147}$$

and the two-sided exact p-value is twice the one-sided:

$$p_2 = 2 p_1 . \tag{R.148}$$

The two-sided p-value is weakly consistent with the corresponding exact confidence interval for $\rho$. That is, if $1 \notin [\rho_*, \rho^*]$ then $p_2 < \alpha$. The stronger consistency requirement, that $p_2 < \alpha$ if and only if $1 \notin [\rho_*, \rho^*]$, cannot be established unless $P_\rho(D(x))$ and $Q_\rho(D(x))$ are monotone functions of $\rho$ for any given $D(x)$. This need not be the case, however.

R.5.6 Exact Test of Noninferiority: Ratio of Proportions

Suppose $\pi_t$ is the response rate of an experimental treatment and $\pi_c$ is the response rate of an active control treatment. Define the ratio of binomial proportions

$$\rho = \frac{\pi_t}{\pi_c} . \tag{R.149}$$

In a non-inferiority clinical trial the objective is not to demonstrate that the experimental treatment is superior to the control but rather to demonstrate that the experimental treatment is not significantly inferior. Accordingly a non-inferiority margin, $\rho_0$, is specified a priori and we test the null hypothesis of inferiority

$$H_0 : \rho \le \rho_0 \quad \text{against} \quad H_1 : \rho > \rho_0 \quad \text{if } \rho_0 < 1 ,$$

or

$$H_0 : \rho \ge \rho_0 \quad \text{against} \quad H_1 : \rho < \rho_0 \quad \text{if } \rho_0 > 1 .$$

Test statistic (Wald):

$$Z = \frac{\ln(\hat\pi_t) - \ln(\hat\pi_c) - \ln(\rho_0)}{\sqrt{\dfrac{(1-\hat\pi_t)}{n_t \hat\pi_t} + \dfrac{(1-\hat\pi_c)}{n_c \hat\pi_c}}} ,$$

where $\hat\pi_t$ and $\hat\pi_c$ are the sample proportions based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, and $\delta_0 = \ln(\rho_0)$ is the noninferiority margin on the log scale. Under $H_0$, $Z$ is asymptotically normal with mean 0 and variance 1. Only asymptotic inference is available with the Wald test.
Test statistic (Farrington-Manning):

$$Z = \frac{\hat\pi_t - \rho_0 \hat\pi_c}{\sqrt{\dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t} + \rho_0^2 \dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c}}} ,$$

where $\hat\pi_t$ and $\hat\pi_c$ are the sample proportions based on $n_t$ and $n_c$ observations in the treatment and control arm, respectively, and $\tilde\pi_c$ and $\tilde\pi_t$ are the restricted maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively. The test is carried out under the assumption that $\rho$ is at its threshold null value $\rho = \rho_0$.

Exact Inference

Let $Y \in \Omega$ denote any generic $2 \times 2$ table of the form of Table R.3 that might be observed if we generated $n_c$ independent Bernoulli trials each with probability $\pi_c$ and $n_t$ independent Bernoulli trials each with probability $\pi_t$. The probability of observing any $Y \in \Omega$ under $H_0$ is

$$f_{\pi_c,\rho_0}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1-\pi_c)^{y_{2c}} (\rho_0\pi_c)^{y_{1t}} (1-\rho_0\pi_c)^{y_{2t}} . \tag{R.150}$$

The test statistic (see Farrington and Manning (1990)) is defined as

$$D(Y) = \frac{\hat\pi_t - \rho_0 \hat\pi_c}{\sqrt{\dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t} + \rho_0^2 \dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c}}} , \tag{R.151}$$

where

$$\hat\pi_j = \frac{y_{1j}}{n_j} \tag{R.152}$$

for $j = c, t$, and $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, restricted under the null hypothesis to satisfy the requirement that $\tilde\pi_t/\tilde\pi_c = \rho_0$. Miettinen and Nurminen (1985) have shown that one may obtain these restricted maximum likelihood estimates by solving a quadratic likelihood equation. Thus

$$\tilde\pi_c = \frac{-B - \sqrt{B^2 - 4AC}}{2A} , \tag{R.153}$$

and

$$\tilde\pi_t = \rho_0 \tilde\pi_c , \tag{R.154}$$

where

$$A = \rho_0 N , \tag{R.155}$$

$$B = -(\rho_0 n_t + y_{1t} + n_c + \rho_0 y_{1c}) , \tag{R.156}$$

$$C = y_{1c} + y_{1t} . \tag{R.157}$$

The test statistic (R.151) is known as the score statistic. Under $H_0$ this test statistic has mean 0 and variance 1.

Let the data in Table R.1, denoted by $x$, be the $2 \times 2$ table actually observed. Then the observed value of the test statistic is $D(x)$, and the left tail of the distribution of $D(Y)$ at its observed value under $H_0$ is

$$P_{\pi_c,\rho_0}(D(x)) = \sum_{D(Y) \le D(x)} f_{\pi_c,\rho_0}(Y) . \tag{R.158}$$

If we knew the value of $\pi_c$, then $P_{\pi_c,\rho_0}(D(x))$ would be the exact p-value for testing $H_0$ versus $H_1$. Since $\pi_c$ is unknown, however, we take the supremum of (R.158) over all values of $\pi_c$ in its range, just as we did for Barnard's test in Section R.5.1. This produces a conservative p-value which guarantees that the true type-1 error of the test never exceeds its nominal significance level. Since $\rho_0 > 1$ the range of possible values for $\pi_c$ is

$$I(\rho_0) = \{\pi_c : 0 < \pi_c < \min(1, 1/\rho_0)\} . \tag{R.159}$$

Thereupon the unconditional exact one-sided p-value is

$$p_1 \equiv P_{\rho_0}(D(x)) = \sup\{P_{\pi_c,\rho_0}(D(x)) : \pi_c \in I(\rho_0)\} . \tag{R.160}$$

Note that in practice the supremum in equation (R.160) is taken over a restricted range for $\pi_c$ rather than over the entire range $I(\rho_0)$. This restriction, proposed by Berger and Boos (1994), adds stability and reduces the conservatism of the procedure. The p-values are suitably adjusted so that the restricted search for the supremum does not compromise the type-1 error. Finally, it is worth noting that when $\rho_0 = 1$ the above p-value specializes to the left tail p-value obtained by Barnard's test.

Additional Remarks

1. The score statistic $D(Y)$ specified by equation (R.151) is undefined when $y_{1c} = y_{1t} = 0$ and $\rho_0 = 1$, or when $y_{2c} = y_{2t} = 0$ and $\rho_0 = 1$. For these special cases the one- and two-sided p-values are both set to 1. These special cases never arise when performing a non-inferiority test with $\rho_0 \ne 1$.

2. The one-sided asymptotic p-value corresponding to $p_1$ is obtained by assuming that the test statistic $D(Y)$ converges in distribution to the standard normal. Thus,

$$\tilde p_1 = 1 - \Phi(D(x)) . \tag{R.161}$$

3. An alternative equivalent way to perform a level-$\alpha$ test of non-inferiority is to compute an exact $100 \times (1-\alpha)\%$ lower confidence bound for $\rho$, say $(\rho_L, \infty)$, using the method described in Section R.5.7.
If $\rho_0$ is excluded from this interval, that is if $\rho_L > \rho_0$, we reject the null hypothesis of inferiority $H_0 : \rho \le \rho_0$.

R.5.7 Unconditional Exact Confidence Interval for the Ratio of Proportions

Suppose $\pi_c$ is the binomial response rate of Control and $\pi_t$ is the binomial response rate of Treatment. We wish to compute an exact $100(1-\alpha)\%$ confidence interval for

$$\rho = \frac{\pi_t}{\pi_c} . \tag{R.162}$$

The procedure parallels that described in Section R.5.4 for the difference of proportions, $\delta$. We use a test based procedure. That is, we invert hypothesis tests of the form $\rho = \rho_0$, where, in general, $\rho_0 \ne 1$. There is one further complication, however, since the p-values which we compute under these alternative hypotheses depend on a nuisance parameter. We handle this problem the same way we handled it for Barnard's unconditional exact hypothesis test and for the various exact tests of non-inferiority: by taking a supremum over all possible values of the nuisance parameter.

Choice of Test Statistic for Test Based Interval Estimation

Suppose we take $n_c$ independent Bernoulli samples from Control and $n_t$ independent Bernoulli samples from Treatment. Let $Y \in \Omega$ (see Table R.3) denote any generic $2 \times 2$ table that might be observed, and let $x$ (see Table R.1, page 2568) be the $2 \times 2$ table that was actually observed. Define

$$\hat\pi_j = \frac{y_{1j}}{n_j} \tag{R.163}$$

for $j = c, t$. In East we provide a test based exact confidence interval using the standardized statistic

$$D(Y) = \frac{\hat\pi_t - \rho_0 \hat\pi_c}{\sqrt{\dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t} + \rho_0^2 \dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c}}} \tag{R.164}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$ computed under the restriction that $\tilde\pi_t/\tilde\pi_c = \rho_0$. The use of (R.164) as the test statistic has been proposed by Farrington and Manning (1990) for asymptotic confidence intervals and by Chan and Zhang (1999) for exact confidence intervals.
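As an illustration of how the statistic (R.164) can be evaluated, the sketch below computes the restricted maximum likelihood estimates from the quadratic likelihood equation of Miettinen and Nurminen (1985) quoted in Section R.5.6, with $A = \rho_0 N$, $B = -(\rho_0 n_t + y_{1t} + n_c + \rho_0 y_{1c})$, $C = y_{1c} + y_{1t}$ and $N = n_c + n_t$, and then forms the score statistic. The function names are hypothetical, and returning 0 when the variance vanishes is a convention assumed only for this example; it is not a description of East's code.

```python
import math

def restricted_mle_ratio(y1c, nc, y1t, nt, rho0):
    """Restricted MLEs (pi_c~, pi_t~) under pi_t / pi_c = rho0.

    Uses the smaller root of A*p^2 + B*p + C = 0 with
    A = rho0*N, B = -(rho0*nt + y1t + nc + rho0*y1c), C = y1c + y1t.
    """
    big_n = nc + nt
    a = rho0 * big_n
    b = -(rho0 * nt + y1t + nc + rho0 * y1c)
    c = y1c + y1t
    disc = max(b * b - 4.0 * a * c, 0.0)  # guard against tiny negative round-off
    pc = (-b - math.sqrt(disc)) / (2.0 * a)
    return pc, rho0 * pc

def score_stat_ratio(y1c, nc, y1t, nt, rho0):
    """Score statistic D(Y) for the ratio of proportions at rho = rho0."""
    pc, pt = restricted_mle_ratio(y1c, nc, y1t, nt, rho0)
    num = y1t / nt - rho0 * y1c / nc
    var = pt * (1.0 - pt) / nt + rho0**2 * pc * (1.0 - pc) / nc
    if var <= 0.0:
        # Degenerate table: convention assumed for this sketch only.
        return 0.0
    return num / math.sqrt(var)
```

When the observed ratio already equals $\rho_0$, for example 10/50 responders on control and 15/50 on treatment with $\rho_0 = 1.5$, the restricted estimates coincide with the sample proportions (0.2 and 0.3) and the statistic is zero.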
Test Based Exact Confidence Intervals: Inverting Two One-Sided Tests

Let $(\rho_*, \rho^*)$ be the desired $100(1-\alpha)\%$ exact confidence interval, evaluated at $D(x)$, the observed value of the test statistic. This exact confidence interval may be constructed by inverting two one-sided hypothesis tests, each at the $\alpha/2$ significance level, under appropriate alternative hypotheses about $\rho$. The probability of observing any $Y \in \Omega$ for any given value of $\rho$ is

$$f_{\pi_c,\rho}(Y) = \binom{n_c}{y_{1c}} \binom{n_t}{y_{1t}} \pi_c^{y_{1c}} (1-\pi_c)^{y_{2c}} (\rho\pi_c)^{y_{1t}} (1-\rho\pi_c)^{y_{2t}} . \tag{R.165}$$

Define

$$P_{\pi_c,\rho}(D(x)) = \sum_{D(Y) \le D(x)} f_{\pi_c,\rho}(Y) \tag{R.166}$$

and

$$Q_{\pi_c,\rho}(D(x)) = \sum_{D(Y) \ge D(x)} f_{\pi_c,\rho}(Y) . \tag{R.167}$$

We must eliminate the nuisance parameter $\pi_c$ from equations (R.166) and (R.167) by taking the supremum over its range. It is easy to see that the permissible range for $\pi_c$ given $\rho$ is the interval

$$I(\rho) = \{\pi_c : 0 \le \pi_c \le \min(1/\rho, 1)\} . \tag{R.168}$$

Thus we define

$$P_\rho(D(x)) = \sup\{P_{\pi_c,\rho}(D(x)) : \pi_c \in I(\rho)\} \tag{R.169}$$

and

$$Q_\rho(D(x)) = \sup\{Q_{\pi_c,\rho}(D(x)) : \pi_c \in I(\rho)\} . \tag{R.170}$$

Starting with $\rho = 0$, the desired lower confidence bound is obtained by increasing the value of $\rho$ until we find a value, denoted by $\rho_*$, such that the equality

$$Q_{\rho_*}(D(x)) = \alpha/2 \tag{R.171}$$

is satisfied but for any $\rho < \rho_*$, $Q_\rho(D(x)) < \alpha/2$. The upper confidence bound, $\rho^*$, is obtained in an analogous fashion. Starting with $\rho = \infty$ (i.e., a very large positive number), the desired upper confidence bound is obtained by decreasing the value of $\rho$ until we find a value, denoted by $\rho^*$, such that the equality

$$P_{\rho^*}(D(x)) = \alpha/2 \tag{R.172}$$

is satisfied but for any $\rho > \rho^*$, $P_\rho(D(x)) < \alpha/2$.

East reports $(\rho_*, \rho^*)$ as the $100 \times (1-\alpha)\%$ confidence interval for the parameter $\rho$. Suppose $\rho_0$ is the true (unknown) value of $\rho$. The long run relative frequency with which, in repeated trials, this interval excludes $\rho_0$ is $\Pr(\rho_0 \le \rho_*) + \Pr(\rho_0 \ge \rho^*)$.
Using arguments similar to those given in the proof of coverage in Section R.5.4 for the binomial difference, $\delta_0$, we can show that neither term in the above sum can exceed $\alpha/2$. Therefore the probability of the confidence interval excluding $\rho_0$ cannot exceed $\alpha$. However, due to the discreteness of the distribution of $D(Y)$, and the conservatism induced by taking a supremum over all $\pi_c \in I(\rho)$, the above exclusion probability is usually less than $\alpha$ instead of equaling $\alpha$. Thus, $(\rho_*, \rho^*)$ may be regarded as a conservative confidence interval.

In addition to the exact confidence interval, East also reports an exact one-sided p-value, defined as the smaller of the two tail areas evaluated at the null value $\rho = 1$,

$$p_1 = \min(P_1(D(x)), Q_1(D(x))) , \tag{R.173}$$

and the two-sided exact p-value is twice the one-sided:

$$p_2 = 2 p_1 . \tag{R.174}$$

The two-sided p-value is weakly consistent with the corresponding exact confidence interval for $\rho$. That is, if $1 \notin [\rho_*, \rho^*]$ then $p_2 < \alpha$. The stronger consistency requirement, that $p_2 < \alpha$ if and only if $1 \notin [\rho_*, \rho^*]$, cannot be established unless $P_\rho(D(x))$ and $Q_\rho(D(x))$ are monotone functions of $\rho$ for any given $D(x)$. This need not be the case, however.

We shall see in Section R.5.8 that the above procedure can be slightly modified in practice. The suprema in equations (R.169) and (R.170) can be taken over a restricted range for $\pi_c$ rather than over the entire range $I(\rho)$. This restriction, proposed by Berger and Boos (1994), adds stability and reduces the conservatism of the procedure. The right hand sides of equations (R.171) and (R.172) are suitably adjusted so that the restricted search for the supremum does not compromise the coverage properties of the resulting confidence interval.

Asymptotic Results

East provides asymptotic confidence intervals and p-values for $\rho$. They are due to Farrington and Manning (1990).
The standardized test statistic (R.164) is adopted and assumed to have a standard normal distribution. Under the null hypothesis that $\rho = 1$ the standardized test statistic is identical to the statistic used for Barnard's test. Therefore the asymptotic one-sided p-value is the same as the asymptotic one-sided p-value for Barnard's test. The asymptotic two-sided p-value is double the asymptotic one-sided p-value.

The asymptotic $100 \times (1-\alpha)\%$ confidence interval $(\tilde\rho_*, \tilde\rho^*)$ is obtained by inverting the corresponding one-sided hypothesis tests. Thus $\tilde\rho_*$ satisfies the equality

$$1 - \Phi\left( \frac{(x_{1t}/n_t) - \rho_* (x_{1c}/n_c)}{\sqrt{\rho_*^2 \dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t}}} \right) = \alpha/2 , \tag{R.175}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, under the restriction that $\tilde\pi_t/\tilde\pi_c = \rho_*$. Similarly $\tilde\rho^*$ satisfies the equality

$$\Phi\left( \frac{(x_{1t}/n_t) - \rho^* (x_{1c}/n_c)}{\sqrt{(\rho^*)^2 \dfrac{\tilde\pi_c(1-\tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t(1-\tilde\pi_t)}{n_t}}} \right) = \alpha/2 , \tag{R.176}$$

where $\tilde\pi_c$ and $\tilde\pi_t$ are the maximum likelihood estimates of $\pi_c$ and $\pi_t$, respectively, under the restriction that $\tilde\pi_t/\tilde\pi_c = \rho^*$.

R.5.8 Searching for Nuisance Parameters in a Restricted Range: Berger-Boos Correction

A source of conservatism, present in all the unconditional procedures covered in this chapter, is the fact that we must cater for the worst possible value of the nuisance parameter, $\pi_c$, by taking a supremum over its range. If this source of conservatism could be reduced in some way, it would result in shorter confidence intervals. A modification based on a proposal by Berger and Boos (1994) achieves this end. It should be noted that Berger and Boos (1994) actually proposed their method only for hypothesis tests. To our knowledge the extension to confidence intervals is new.
To avoid unnecessary repetition, we will discuss the Berger-Boos modification only as it applies to Section R.5.4, for computing an unconditional exact confidence interval for the difference of two binomial parameters based on inverting two one-sided hypothesis tests. It will be clear from this discussion that the same type of Berger-Boos correction also applies to all the other settings in this chapter, such as exact unconditional tests of superiority, non-inferiority or equivalence, and exact confidence intervals for ratios of binomials.

Let $x$ be the observed $2 \times 2$ table. The main idea is that the information available in $x$ about $\pi_c$ and $\pi_t$ can be used to reduce the conservatism of the exact confidence interval for $\delta$. As a first step, we compute an exact $100(1-\gamma/2)\%$ confidence interval, $A_1(x) = [l_1(x), u_1(x)]$, for $\pi_c$, and, independently, an exact $100(1-\gamma/2)\%$ confidence interval, $A_2(x) = [l_2(x), u_2(x)]$, for $\pi_t$. Let $E$ denote the event $(\pi_c, \pi_t) \in A_1(x) \times A_2(x)$. If $E$ is true, this restricts the range of $\delta$ and the associated range of $\pi_c$. It is easy to show that if $E$ is true, the range of possible values for $\delta$ must be restricted to the interval $[\delta_{\min}, \delta_{\max}]$, where

$$\delta_{\min} = l_2(x) - u_1(x) , \tag{R.177}$$

and

$$\delta_{\max} = u_2(x) - l_1(x) . \tag{R.178}$$

For any $\delta$ in this interval, $\pi_c$ must lie in the restricted interval

$$I_x(\delta) = \{\pi_c : \max(l_1(x), l_2(x) - \delta) \le \pi_c \le \min(u_1(x), u_2(x) - \delta)\} . \tag{R.179}$$

Clearly $I_x(\delta) \subseteq I(\delta)$. Define the right tail probability

$$Q_{\pi_c,\delta}(D(x)) = \sum_{D(Y) \ge D(x)} f_{\pi_c,\delta}(Y) \tag{R.180}$$

and its supremum

$$Q_{\delta|E}(D(x)) = \sup\{Q_{\pi_c,\delta}(D(x)) : \pi_c \in I_x(\delta)\} . \tag{R.181}$$

Notice the difference between $Q_\delta(D(x))$ given by equation (R.122) and $Q_{\delta|E}(D(x))$ given by equation (R.181).
The first expression eliminates $\pi_c$ by searching over the unrestricted range $I(\delta)$ while the second expression eliminates $\pi_c$ by searching over the restricted range $I_x(\delta)$. The restricted search reduces conservatism since we must have

$$Q_{\delta|E}(D(x)) \le Q_\delta(D(x)) . \tag{R.182}$$

In a similar manner we define the left tail probability

$$P_{\pi_c,\delta}(D(x)) = \sum_{D(Y) \le D(x)} f_{\pi_c,\delta}(Y) \tag{R.183}$$

and its supremum

$$P_{\delta|E}(D(x)) = \sup\{P_{\pi_c,\delta}(D(x)) : \pi_c \in I_x(\delta)\} . \tag{R.184}$$

We next compute upper and lower confidence bounds for $\delta$ as described in Section R.5.4. Equations (R.123) and (R.124) must be modified, however, to compensate for the fact that we are now searching over a subset of the original parameter space. This adjustment is made by decreasing the right hand side of each equation by $\gamma/2$. Thus the lower confidence bound is the value of $\delta_*$ that satisfies the condition

$$Q_{\delta_*|E}(D(x)) = \alpha/2 - \gamma/2 , \tag{R.185}$$

such that for any $\delta$ satisfying $\delta_{\min} \le \delta < \delta_*$, $Q_{\delta|E}(D(x)) < \alpha/2 - \gamma/2$. If no value of $\delta_*$ can be found in the interval $[\delta_{\min}, \delta_{\max}]$ such that equation (R.185) is satisfied, we set $\delta_* = \delta_{\min}$. The upper confidence bound is the value of $\delta^*$ that satisfies the condition

$$P_{\delta^*|E}(D(x)) = \alpha/2 - \gamma/2 , \tag{R.186}$$

such that for any $\delta$ satisfying $\delta_{\max} \ge \delta > \delta^*$, $P_{\delta|E}(D(x)) < \alpha/2 - \gamma/2$. If no value of $\delta^*$ can be found in the interval $[\delta_{\min}, \delta_{\max}]$ such that equation (R.186) is satisfied, we set $\delta^* = \delta_{\max}$. Thus, no matter what the data, we will always have

$$(\delta_*, \delta^*) \subseteq (\delta_{\min}, \delta_{\max}) . \tag{R.187}$$

Suppose that $\delta_0$ is the true (unknown) value of $\delta$. With the above adjustment to the right hand sides of equations (R.185) and (R.186) one can show that

$$\Pr\{\delta_0 \notin (\delta_*, \delta^*)\} \le \alpha , \tag{R.188}$$

the desired exclusion probability.
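The ingredients of the restricted search can be sketched as follows. The source says only that $A_1(x)$ and $A_2(x)$ are exact intervals; this example assumes Clopper-Pearson-type intervals obtained by bisection on binomial tail probabilities, and all function names are hypothetical. It computes $\delta_{\min}$, $\delta_{\max}$ of (R.177) and (R.178), the restricted range $I_x(\delta)$ of (R.179), and the adjusted level $\alpha/2 - \gamma/2$ used in (R.185) and (R.186).

```python
import math

def binom_sf(y, n, p):
    """Pr(Y >= y) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(y, n + 1))

def _solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exact_binomial_ci(y, n, miss):
    """Clopper-Pearson-type interval excluding the true p with probability <= miss
    (an assumed choice of 'exact' interval for this sketch)."""
    tail = miss / 2.0
    lower = 0.0 if y == 0 else _solve_increasing(lambda p: binom_sf(y, n, p), tail)
    upper = 1.0 if y == n else _solve_increasing(
        lambda p: binom_sf(y + 1, n, p), 1.0 - tail)
    return lower, upper

def berger_boos_ranges(x1c, nc, x1t, nt, gamma):
    """Return (delta_min, delta_max) and a function giving I_x(delta)."""
    l1, u1 = exact_binomial_ci(x1c, nc, gamma / 2.0)  # A1(x) for pi_c
    l2, u2 = exact_binomial_ci(x1t, nt, gamma / 2.0)  # A2(x) for pi_t
    delta_min, delta_max = l2 - u1, u2 - l1           # (R.177), (R.178)

    def restricted_range(delta):                      # I_x(delta), (R.179)
        return max(l1, l2 - delta), min(u1, u2 - delta)

    return (delta_min, delta_max), restricted_range

def adjusted_level(alpha, gamma):
    """Right hand side of (R.185) and (R.186)."""
    return alpha / 2.0 - gamma / 2.0
```

With these pieces, the supremum of a tail probability would be taken over `restricted_range(delta)` instead of the full range $I(\delta)$, and compared with `adjusted_level(alpha, gamma)` instead of $\alpha/2$.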
To see this observe that

$$\begin{aligned}
\Pr\{\delta_0 \notin (\delta_*, \delta^*)\} &= \Pr\{[\delta_0 \notin (\delta_*, \delta^*)] \cap E\} + \Pr\{[\delta_0 \notin (\delta_*, \delta^*)] \cap E^c\} \\
&\le \Pr\{\delta_0 \notin (\delta_*, \delta^*)\} + \Pr(E^c) && \text{(R.189)} \\
&= \Pr(\delta_* \ge \delta_0) + \Pr(\delta^* \le \delta_0) + \Pr(E^c) \\
&\le \Pr(\delta_* \ge \delta_0) + \Pr(\delta^* \le \delta_0) + \gamma . && \text{(R.190)}
\end{aligned}$$

Inequality (R.189) uses the fact that a probability cannot exceed 1. Since, for $i = 1, 2$, the interval $A_i(x)$ excludes its parameter ($\pi_c$ for $i = 1$, $\pi_t$ for $i = 2$) with probability at most $\gamma/2$, it follows by the Bonferroni inequality that $\Pr(E^c) \le \gamma$. Inequality (R.190) follows. We show next that neither $\Pr(\delta_* \ge \delta_0)$ nor $\Pr(\delta^* \le \delta_0)$ can exceed $\alpha/2 - \gamma/2$.

Define $H_E(\delta_0)$ to be the smallest value of the random variable $D(Y)$ satisfying the inequality

$$Q_{\delta_0|E}(H_E(\delta_0)) \le \alpha/2 - \gamma/2 . \tag{R.191}$$

This definition implies that if $D(x) < H_E(\delta_0)$, then $Q_{\delta_0|E}(D(x)) > \alpha/2 - \gamma/2$. But we know from (R.185) that there is no value of $\delta \le \delta_*$ for which $Q_{\delta|E}(D(x)) > \alpha/2 - \gamma/2$. It follows that $\delta_0 > \delta_*$ whenever $D(x) < H_E(\delta_0)$. Therefore the random event $\{D(Y) < H_E(\delta_0)\}$ is a subset of the random event $\{\delta_0 > \delta_*\}$ and hence

$$\Pr(\delta_0 > \delta_*) \ge \Pr\{D(Y) < H_E(\delta_0)\} . \tag{R.192}$$

Taking complementary probabilities on both sides of (R.192) we have

$$\Pr(\delta_* \ge \delta_0) \le \Pr\{D(Y) \ge H_E(\delta_0)\} = Q_{\pi_c,\delta_0}(H_E(\delta_0)) . \tag{R.193}$$

Next, taking the supremum over all possible values of $\pi_c \in I_x(\delta_0)$, we have

$$Q_{\pi_c,\delta_0}(H_E(\delta_0)) \le \sup\{Q_{\pi_c,\delta_0}(H_E(\delta_0)) : \pi_c \in I_x(\delta_0)\} = Q_{\delta_0|E}(H_E(\delta_0)) \le \alpha/2 - \gamma/2 . \tag{R.194}$$

This establishes that

$$\Pr(\delta_* \ge \delta_0) \le \alpha/2 - \gamma/2 . \tag{R.195}$$

By a similar argument, $\Pr(\delta^* \le \delta_0) \le \alpha/2 - \gamma/2$. Therefore

$$\Pr\{\delta_0 \notin (\delta_*, \delta^*)\} \le 2(\alpha/2 - \gamma/2) + \gamma = \alpha . \tag{R.196}$$

The above modifications generally give shorter confidence intervals than the unmodified approach wherein we search the entire range of the nuisance parameter. One ambiguity about the procedure, however, is the choice of $\gamma$.
The smaller we make $\gamma$, the more the modified method resembles the original method. The choice $\gamma = 0$ corresponds to making no modification to the original approach. At the other extreme, the larger we make $\gamma$, the more we restrict the region in which we search for the supremum, and the more we must compensate for this restriction on the right hand sides of equations (R.185) and (R.186). These equations show that we cannot increase $\gamma$ beyond $\alpha/2$.

The Implementation in East

In East we have set the default value to $\gamma = 0.99 \times 10^{-7}$. This value is very small relative to $\alpha$, which is usually 0.05. Therefore it does not usually affect the right hand sides of equations (R.185) and (R.186) by much. On the other hand, in unbalanced settings it can provide greater stability, narrower confidence intervals, and faster execution times by cutting off regions near the extremes, 0 and 1, of the parameter space. We have observed, empirically, that the functions $P_{\pi_c,\delta}$ and $Q_{\pi_c,\delta}$ can have multiple high peaks at values of $\pi_c$ near 0 or 1. By cutting off these regions from the parameter space we are able to reduce conservatism and add stability to the computation of the supremum.

R.5.9 Noninferiority: Odds Ratio of Proportions

The odds ratio of proportions, denoted by $\Psi$, is defined as
\[ \Psi = \frac{\pi_t (1 - \pi_c)}{\pi_c (1 - \pi_t)} . \]
In a noninferiority trial, we are interested in testing
\[ H_0 : \Psi \ge \Psi_0 \ \text{ against } \ H_1 : \Psi < \Psi_0 \quad \text{if } \Psi_0 > 1 , \]
or
\[ H_0 : \Psi \le \Psi_0 \ \text{ against } \ H_1 : \Psi > \Psi_0 \quad \text{if } \Psi_0 < 1 . \]
The test statistics for the two tests are:

Noninferiority (Wald)
\[ Z = \frac{\ln \hat{\Psi} - \ln \Psi_0}{\sqrt{\dfrac{\hat{\pi}_t}{n_t (1 - \hat{\pi}_t)} + \dfrac{\hat{\pi}_c}{n_c (1 - \hat{\pi}_c)}}} \sim N(0, 1) \tag{R.197} \]

Noninferiority (Score)
\[ Z = \frac{n_c (\hat{\pi}_c - \tilde{\pi}_c)}{SE} \sim N(0, 1) \tag{R.198} \]
where
\[ SE = \left[ \frac{1}{n_t \pi_t (1 - \pi_t)} + \frac{1}{n_c \pi_c (1 - \pi_c)} \right]^{-1} . \tag{R.199} \]
R.5.10 Common Odds Ratio for Stratified 2x2 Tables

Breslow-Day Test for Homogeneity of Odds-Ratios

\[ H_0 : \Psi_i = \Psi , \quad i = 1, 2, \ldots, s . \]
Breslow and Day (1980) statistic:
\[ \chi^2_{BD} = \sum_{i=1}^{s} \frac{[X_i - A_i(\hat{\Psi})]^2}{\mathrm{var}(X_i \mid \hat{\Psi})} , \tag{R.200} \]
where $A_i(\hat{\Psi})$ is the positive root of the quadratic equation
\[ \frac{A_i (N_i - m_i - n_i + A_i)}{(m_i - A_i)(n_i - A_i)} = \hat{\Psi} , \tag{R.201} \]
formed by expressing the $i$th table as
\[ \begin{pmatrix} A_i & m_i - A_i \\ n_i - A_i & N_i - m_i - n_i + A_i \end{pmatrix} , \]
and equating its empirical odds ratio to the Mantel-Haenszel common odds ratio
\[ \hat{\Psi} = \frac{\sum_{i=1}^{s} x_i (N_i - m_i - n_i + x_i)/N_i}{\sum_{i=1}^{s} (n_i - x_i)(m_i - x_i)/N_i} . \tag{R.202} \]
The variance of $X_i$ is estimated by:
\[ \mathrm{var}(X_i \mid \hat{\Psi}) = \left[ \frac{1}{A_i(\hat{\Psi})} + \frac{1}{m_i - A_i(\hat{\Psi})} + \frac{1}{n_i - A_i(\hat{\Psi})} + \frac{1}{N_i - m_i - n_i + A_i(\hat{\Psi})} \right]^{-1} \tag{R.203} \]

Tarone correction for Breslow-Day test
\[ \chi^2_{BDT} = \sum_{i=1}^{s} \frac{[X_i - A_i(\hat{\Psi})]^2}{\mathrm{var}(X_i \mid \hat{\Psi})} - \frac{\left[ \sum_{i=1}^{s} X_i - \sum_{i=1}^{s} A_i(\hat{\Psi}) \right]^2}{\sum_{i=1}^{s} \mathrm{var}(X_i \mid \hat{\Psi})} , \tag{R.204} \]
where $A_i$ and $\mathrm{var}(X_i \mid \hat{\Psi})$ are defined as above. In large samples, both $\chi^2_{BD}$ and $\chi^2_{BDT}$ are chi-squared distributed with $s - 1$ degrees of freedom, and the 2-sided p-values for testing $H_0$ are:
\[ p_{BD} = \Pr(\chi^2_{BD} \ge \chi^2_{0,BD}) \tag{R.205} \]
\[ p_{BDT} = \Pr(\chi^2_{BDT} \ge \chi^2_{0,BDT}) , \tag{R.206} \]
where $\chi^2_{0,BD}$ and $\chi^2_{0,BDT}$ are the observed values of $\chi^2_{BD}$ and $\chi^2_{BDT}$.

Mantel-Haenszel Inference for the Common Odds-Ratio

\[ H_0 : \Psi = 1 . \]
Mantel-Haenszel (1959) test:
\[ \chi^2_{MH} = \left[ \sum_{i=1}^{s} \frac{x_i y_i' - x_i' y_i}{N_i} \right]^2 \Big/ \sum_{i=1}^{s} \frac{m_i m_i' n_i (N_i - n_i)}{(N_i - 1) N_i^2} \tag{R.207} \]
is chi-squared distributed with one degree of freedom. The p-value is
\[ p_{MH} = \Pr(\chi^2_{MH} \ge \chi^2_0) , \tag{R.208} \]
where $\chi^2_0$ is the observed value of $\chi^2_{MH}$. The RBG variance is
\[ \mathrm{var}(\log \hat{\Psi}) = \sum_{i=1}^{s} \left( \frac{a_i c_i}{2 c_+^2} + \frac{a_i d_i + b_i c_i}{2 c_+ d_+} + \frac{b_i d_i}{2 d_+^2} \right) \tag{R.209} \]
where
\[ a_i = \frac{x_i + y_i'}{N_i} , \quad b_i = \frac{x_i' + y_i}{N_i} , \quad c_i = \frac{x_i y_i'}{N_i} , \quad d_i = \frac{x_i' y_i}{N_i} , \quad c_+ = \sum_{i=1}^{s} c_i , \quad d_+ = \sum_{i=1}^{s} d_i . \]
A $100(1 - \alpha)\%$ confidence interval for $\log \Psi$ is
\[ CI_{RBG} = \log \hat{\Psi} \pm z_{\alpha/2} [\mathrm{var}(\log \hat{\Psi})]^{1/2} . \tag{R.210} \]
The 2-sided p-value for testing $H_0 : \Psi = 1$, based on the RBG variance, is
\[ p_{RBG} = 2 \left[ 1 - \Phi\!\left( \frac{|\log \hat{\Psi}|}{\sqrt{\mathrm{var}(\log \hat{\Psi})}} \right) \right] . \tag{R.211} \]

R.5.11 Fisher's Exact Test

As in the Difference of Proportions test, suppose $\pi_t$ and $\pi_c$ denote the proportions of successes for the experimental treatment (T) and the control treatment (C). We wish to test the null hypothesis
\[ H_0 : \pi_t = \pi_c , \tag{R.212} \]
against 1-sided alternatives of the form
\[ H_1 : \pi_t > \pi_c , \tag{R.213} \]
or
\[ H_1' : \pi_t < \pi_c , \tag{R.214} \]
and against 2-sided alternatives of the form
\[ H_2 : \pi_t \ne \pi_c . \tag{R.215} \]
Suppose that $H_0$ is true and let the common probability of success for the two binomial populations be $\pi_t = \pi_c = \pi$. Then the probability of observing the data in Table R.1 is a product of two binomial probabilities, denoted by
\[ f_0(x) = \binom{n_c}{x_{1c}} \binom{n_t}{x_{1t}} \pi^{x_{1c} + x_{1t}} (1 - \pi)^{x_{2c} + x_{2t}} . \tag{R.216} \]
The p-value is defined to be the probability, under $H_0$, of obtaining a $2 \times 2$ table at least as extreme as the observed table, $x$. Let $Y$ denote any generic $2 \times 2$ table that can arise if we take two independent samples, one of size $n_c$ from binomial population c and the other of size $n_t$ from binomial population t. Such a generic $2 \times 2$ table is displayed below in Table R.3.

Table R.3: Any Generic 2x2 Contingency Table, y

Response     Control   Treatment   Row Total
Success      y1c       y1t         y1c + y1t
Failure      y2c       y2t         y2c + y2t
Col Total    nc        nt          N

The probability of observing this table is $f_0(Y)$ which, as shown by equation (R.216), contains an unknown (nuisance) parameter, $\pi$. As long as the probability of observing any generic $2 \times 2$ table depends on $\pi$, exact inference is not possible.
This is because the p-value is based on summing up the probabilities of many such tables, each depending on an unknown parameter. The key to exact inference is getting rid of $\pi$, the nuisance parameter. In Fisher's Exact Test, a conditional approach is used. The sufficient statistic for $\pi$ is $y_{11} + y_{12}$, the sum of successes from the two populations. The observed value of the sufficient statistic is $m_1$. Thus, by the sufficiency principle, if we condition on $y_{11} + y_{12} = m_1$, the probability of any generic $2 \times 2$ table, $Y$, no longer depends on the nuisance parameter $\pi$. To see this let
\[ \Gamma = \Big\{ Y : \sum_{j=1}^{2} y_{ij} = m_i , \ \sum_{i=1}^{2} y_{ij} = n_j \Big\} \tag{R.217} \]
denote a reference set of $2 \times 2$ contingency tables with fixed row and column margins. Since we are dealing with the case of two independent binomial samples, each of size $n_i$, $i = 1, 2$ (taking $n_1 = n_c$ and $n_2 = n_t$), conditioning on $y_{11} + y_{12} = m_1$ is equivalent to conditioning on $Y \in \Gamma$. Let $h(Y)$ denote the probability of observing any $Y \in \Gamma$ under the null hypothesis (R.212). Then
\[ h(Y) = \frac{f_0(Y)}{\sum_{Y \in \Gamma} f_0(Y)} \tag{R.218} \]
which simplifies to
\[ h(Y) = \frac{\binom{n_1}{y_{11}} \binom{n_2}{y_{12}}}{\binom{N}{m_1}} , \tag{R.219} \]
a hypergeometric probability free of $\pi$. Exact inference is thus possible only if we confine our attention to $2 \times 2$ tables in $\Gamma$.

We next turn to the question of how to determine if a $2 \times 2$ contingency table, $Y$, is at least as extreme as the observed table, $x$. Let $D : \Gamma \to \mathbb{R}$ be a function assigning a real number, $D(Y)$, to each $Y \in \Gamma$ in such a way that $Y$ is judged to be at least as extreme as $x$ provided $D(Y) \ge D(x)$. We refer to $D(Y)$ as a discrepancy measure. Fisher's test statistic is given by
\[ D(Y) = -2 \log(\gamma h(Y)) , \tag{R.220} \]
where
\[ \gamma = (2 \pi N^{-3} m_1 m_2 n_1 n_2)^{-1/2} , \tag{R.221} \]
and $\pi$ in (R.221) denotes the mathematical constant, not the nuisance parameter. The exact 2-sided p-value is defined as
\[ p_2 = \Pr(D(Y) \ge D(x)) = \sum_{D(Y) \ge D(x)} h(Y) , \tag{R.222} \]
the sum being taken over all $Y \in \Gamma$ such that $D(Y) \ge D(x)$.
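As an illustration of (R.219) and (R.222), the following minimal Python sketch computes the exact 2-sided p-value. It is not East's implementation. It relies on the observation that, because $D(Y) = -2\log(\gamma h(Y))$ is decreasing in $h(Y)$, the condition $D(Y) \ge D(x)$ is equivalent to $h(Y) \le h(x)$. The function name and argument order are hypothetical, and the numerators of $h$ are compared as integers to avoid floating-point ties.

```python
from math import comb


def fisher_exact_two_sided(x11, n1, n2, m1):
    """Exact 2-sided p-value (R.222) for a 2x2 table with fixed margins.

    x11 : observed successes in population 1
    n1, n2 : the two binomial sample sizes
    m1 : total number of successes (the conditioning statistic)
    """
    total = n1 + n2
    t_min = max(0, m1 - n2)  # same as max(0, n1 - m2) with m2 = N - m1
    t_max = min(m1, n1)
    if not t_min <= x11 <= t_max:
        raise ValueError("x11 is incompatible with the margins")

    def weight(y11):
        # Integer numerator of the hypergeometric probability h(Y) in (R.219).
        return comb(n1, y11) * comb(n2, m1 - y11)

    w_obs = weight(x11)
    # D(Y) >= D(x) holds exactly when h(Y) <= h(x).
    extreme = sum(weight(y) for y in range(t_min, t_max + 1) if weight(y) <= w_obs)
    return extreme / comb(total, m1)


# Example: n1 = n2 = 4, m1 = 4, and 3 successes observed in population 1.
p2 = fisher_exact_two_sided(3, 4, 4, 4)
```

For this example the reference set contains five tables with weights 1, 16, 36, 16 and 1 out of 70, so the tables at least as extreme as the observed one contribute 34/70.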
In large samples the distribution of $D(Y)$ conditional on $Y \in \Gamma$ converges to the chi-square distribution with 1 degree of freedom (Kendall and Stuart (1979)). The asymptotic 2-sided p-value is given by
\[ \tilde{p}_2 = \Pr(\chi^2_1 \ge D(x)) , \tag{R.223} \]
where $\chi^2_1$ is a random variable distributed as chi-square with 1 df.

You can also define the 1-sided exact p-value. It is based on the test statistic
\[ D(Y) = y_{11} . \tag{R.224} \]
Since we have confined our attention only to $2 \times 2$ contingency tables in $\Gamma$, the value of $y_{11}$ suffices to specify the entire $2 \times 2$ table $Y$, and the exact probability of $y_{11}$ is $h(Y)$. Moreover it is easy to see that $y_{11}$ ranges from a minimum of
\[ t_{\min} = \max(0, n_1 - m_2) \tag{R.225} \]
to a maximum of
\[ t_{\max} = \min(m_1, n_1) . \tag{R.226} \]
The exact 1-sided p-value for the Fisher test is then defined as either the right or left tail area of the exact distribution of $y_{11}$ at the observed value, $x_{11}$, based on the location of $x_{11}$ relative to $n_1 m_1 / N$, the mean of $y_{11}$. That is,
\[ p_1 = \begin{cases} \sum_{y_{11} = x_{11}}^{t_{\max}} h(Y) & \text{if } x_{11} > n_1 m_1 / N \\[4pt] \sum_{y_{11} = t_{\min}}^{x_{11}} h(Y) & \text{if } x_{11} \le n_1 m_1 / N \end{cases} \tag{R.227} \]
A small 1-sided p-value furnishes evidence against the 1-sided alternative (R.213) if it is computed as the right tail of the exact distribution of $y_{11}$, and against the 1-sided alternative (R.214) if it is computed as the left tail of the exact distribution of $y_{11}$.

R.6 Many Proportions

R.6.1 Contingency Coefficients

The contingency coefficients are derived from Pearson's chi-square statistic. The Phi contingency coefficient is given by
\[ \phi = \sqrt{\frac{\chi^2(x)}{N}} . \tag{R.228} \]
Pearson's contingency coefficient is given by
\[ CC = \sqrt{\frac{\chi^2(x)}{\chi^2(x) + N}} . \tag{R.229} \]
The Sakoda contingency coefficient is given by
\[ CC_1 = \sqrt{\frac{q \chi^2(x)}{(q - 1)(\chi^2(x) + N)}} . \tag{R.230} \]
The Tschuprov contingency coefficient ranges between 0 and 1, with 0 signifying no association and 1 signifying perfect association. It is given by
\[ CC_2 = \left( \frac{\chi^2(x)}{N \sqrt{(r - 1)(c - 1)}} \right)^{1/2} . \tag{R.231} \]
Finally, Cramer's V coefficient ranges between 0 and 1, with 0 signifying no association and 1 signifying perfect association. It is given by
\[ V = \sqrt{\frac{\chi^2(x)}{N (q - 1)}} . \tag{R.232} \]
The $100 \times (1 - \alpha)\%$ confidence interval for any measure of association $M(x)$ is
\[ CI = M(x) \pm z_{\alpha/2} \times ASE_{MLE} , \tag{R.233} \]
where $z_\beta$ is the value of the $(1 - \beta)$ percentile point of the standard normal distribution.

R.6.2 Wilcoxon Rank Sum Test for Ordered Categories Data

Each response must fall into one of $c$ ordinal categories according to a multinomial distribution. Define the cumulative probabilities
\[ \gamma_{jk} = \sum_{i=1}^{j} \pi_{ik} , \qquad \gamma_{jk}' = \sum_{i=1}^{j} \pi_{ik}' , \]
for $j = 1, 2, \ldots, c$ and $k = 1, 2, \ldots, s$. Then the Wilcoxon test is especially suited to detecting departures from the null of the form
\[ H_1 : \gamma_{jk} \ge \gamma_{jk}' , \]
or
\[ H_1' : \gamma_{jk} \le \gamma_{jk}' , \]
for all $j \in \{1, 2, \ldots, c\}$, $k \in \{1, 2, \ldots, s\}$, with strict inequality for at least one $(j, k)$. The 2-sided alternative hypothesis is that either $H_1$ or $H_1'$ is true; the alternative hypothesis does not specify which of the two possibilities is true, however.

The Wilcoxon Rank Sum test statistic is of the form
\[ T = \sum_{k=1}^{s} \sum_{j=1}^{c} w_j X_{jk} , \tag{R.234} \]
where $w_j$ are Wilcoxon-Mann-Whitney scores, which are the ranks (mid-ranks in the case of tied observations) of the underlying responses.
\[ w_j = n_1 + \cdots + n_{j-1} + (n_j + 1)/2 \tag{R.235} \]
The mean of $T$, under the null hypothesis of no row and column interaction, is given by
\[ E(T) = \frac{m}{N} \sum_{j=1}^{c} w_j n_j \tag{R.236} \]
and the variance is
\[ \sigma^2(T) = \frac{m m'}{N (N - 1)} \sum_{j=1}^{c} \left( w_j - \frac{E(T)}{m} \right)^2 n_j . \tag{R.237} \]
Under $H_0$, $T$ asymptotically follows a Normal distribution with mean $E(T)$ and variance $\sigma^2(T)$.

R.6.3 Trend in R ordered proportions

This test determines whether a trend exists in the unknown proportions of response $\pi_g$, $g = 1, \ldots, K$, of ordered binomially distributed populations, using independent random samples. The test statistic is
\[ T = \sum_{j=1}^{c} w_j Y_j , \tag{R.238} \]
where
\[ w_j = j - 1 . \tag{R.239} \]
The mean of the test statistic is
\[ E(T) = \frac{m}{N} \sum_{j=1}^{c} w_j n_j \tag{R.240} \]
and the variance of the test statistic is
\[ \sigma^2(T) = \frac{m (N - m)}{N (N - 1)} \sum_{j=1}^{c} \left( w_j - \frac{E(T)}{m} \right)^2 n_j . \tag{R.241} \]
Under $H_0$, $Z = \dfrac{T - E(T)}{\sqrt{\mathrm{Var}(T)}}$ follows the $N(0, 1)$ distribution.

R.6.4 Chi-square for R Unordered Binomial Properties

Hypothesis: $H_0 : \pi_{1j} = \pi_{2j} = \cdots = \pi_{Rj}$ for all $j = 1, 2$, versus $H_1$: at least one $\pi_{ij}$ differs, for $i = 1, 2, \ldots, R$ and $j = 1, 2$. Let the $R \times 2$ contingency table displayed in Table R.4 be the one actually observed.

Table R.4: The Observed Rx2 Contingency Table

Rows         Failure   Success   Row Total
Row 1        x11       x12       m1
Row 2        x21       x22       m2
...          ...       ...       ...
Row R        xR1       xR2       mR
Col. Total   n1        n2        N

Test statistic:
\[ \chi^2_{R-1} = \sum_{i=1}^{R} \sum_{j=1}^{2} \frac{(x_{ij} - m_i n_j / N)^2}{m_i n_j / N} \]

R.6.5 Chi-square for R Unordered multinomial Properties

Hypothesis: $H_0 : \pi_{1j} = \pi_{2j} = \cdots = \pi_{Rj}$ for all $j = 1, 2, \ldots, C$, versus $H_1$: at least one $\pi_{ij}$ differs. Let the $R \times C$ contingency table displayed in Table R.5 be the one actually observed.

Table R.5: The Observed RxC Contingency Table

Rows         Col.1   Col.2   Col.3   ...   Col. C   Row Total
Row 1        x11     x12     ...     ...   x1C      m1
Row 2        x21     x22     ...     ...   x2C      m2
...          ...     ...     ...     ...   ...      ...
Row R        xR1     xR2     ...     ...   xRC      mR
Col. Total   n1      n2      ...     ...   nC       N

Test statistic:
\[ \chi^2_{(R-1)(C-1)} = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(x_{ij} - m_i n_j / N)^2}{m_i n_j / N} \]

R.7 Agreement

R.7.1 Cohen's Kappa

Hypothesis: $H_0$: agreement between two raters is purely from random variation, versus $H_1$: agreement between two raters is not purely from random variation (for a 2-sided test). For a 1-sided test, either $H_1$: agreement between two raters is greater than that expected from random variation only, or $H_1$: agreement between two raters is less than that expected from random variation only.

Test statistic for Kappa:
\[ K = \frac{n \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} m_i n_i}{n^2 - \sum_{i=1}^{r} m_i n_i} \sim N(0, 1) . \]
Test statistic for Weighted Kappa:
\[ K_w = \frac{n \sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij} x_{ij} - \sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij} m_i n_j}{n^2 - \sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij} m_i n_j} \sim N(0, 1) \]

R.8 Survival: Two Samples

Let $t_i$, $i = 1, 2, 3, \ldots, M$, be the distinct time points of events on any arm, and define

$d_{i,t}$ = number of events on the treatment arm at time $t_i$
$n_{i,t}$ = number of subjects at risk on the treatment arm just before time $t_i$
$d_{i,c}$ = number of events on the control arm at time $t_i$
$n_{i,c}$ = number of subjects at risk on the control arm just before time $t_i$
$d_i = d_{i,t} + d_{i,c}$
$n_i = n_{i,t} + n_{i,c}$

Assumption: censored observations are considered to be in the risk set if they are tied with a time point at which an event is observed.

For Superiority:
\[ Num_i = d_{i,t} - n_{i,t} \frac{d_i}{n_i} , \qquad Den_i = \frac{n_{i,t} \, n_{i,c} (n_i - d_i) d_i}{n_i^2 (n_i - 1)} \]
For Non-inferiority, with $\delta_0$ the non-inferiority margin:
\[ n_i^* = n_{i,t} + n_{i,c} e^{-\delta_0} , \qquad Num_i = d_{i,t} - n_{i,t} \frac{d_i}{n_i^*} , \qquad Den_i = \frac{n_{i,t} \, n_{i,c} \, d_i \, e^{-\delta_0}}{n_i^* \, n_i^*} \]
The weighted test statistic is defined (for both superiority and non-inferiority) through
\[ Num = \sum_{i=1}^{M} W_i \, Num_i \tag{R.242} \]
\[ Den = \sum_{i=1}^{M} W_i^2 \, Den_i \tag{R.243} \]
\[ TS = \frac{Num}{\sqrt{Den}} \]
where weights are defined as follows.
R.8.1 Logrank Test

$W_i = 1$ for all $i$.

R.8.2 Wilcoxon-Gehan

$W_i = n_i$.

R.8.3 Harrington-Fleming

\[ W_i = \hat{S}_{i-1}^{\,p} \left( 1 - \hat{S}_{i-1} \right)^q , \]
where, for $i = 1$,
\[ W_1 = \begin{cases} 1 & \text{if } q = 0 \\ 0 & \text{if } q > 0 \end{cases} \]
and, for $i > 1$,
\[ W_i = \hat{S}_{i-1}^{\,p} \left( 1 - \hat{S}_{i-1} \right)^q , \quad \text{with} \quad \hat{S}_t = \prod_{t_j \le t} \frac{n_j - d_j}{n_j} . \]

Test Statistic for Stratified Simulations

Let $S$ = number of strata, $Num_j$ = numerator for the $j$th stratum using (R.242), and $Den_j$ = denominator for the $j$th stratum using (R.243). The test statistic is given by
\[ TS = \frac{\sum_{j=1}^{S} Num_j}{\sqrt{\sum_{j=1}^{S} Den_j}} . \tag{R.244} \]

S Theory - Design - Binomial One-Sample Exact Test

This appendix lays out the theory behind East's power and sample size computations in the case of the exact fixed sample test and the exact group sequential test of a proportion $\pi$ being equal to a constant $\pi_0$. Both Schultz et al. (1973) and Fleming (1982) have proposed multi-stage procedures for rejecting the null hypothesis under strict assumptions about the type 1 and type 2 errors. The methods used in East and described below are based on Jennison and Turnbull (2000).

Section (S.1) explains how to calculate the power and the sample size of the exact fixed sample test. Section (S.2) continues by considering the power and sample size of the group sequential test. It also explains how the boundaries of this test are computed.

S.1 Power and Sample Size for the Exact Fixed Sample Test

Consider a clinical trial of fixed sample size $N$. The goal is to test, based on the observed number of successes $S = s$, whether the binary probability $\pi$ of response is equal to some a priori hypothesized value $\pi_0$.
The null hypothesis of interest is
\[ H_0 : \pi = \pi_0 . \]
East computes the power of the exact test of $H_0$ against one of the following one-sided alternatives:
\[ H_1 : \pi = \pi_1 , \ \pi_1 > \pi_0 \qquad \text{or} \qquad H_1 : \pi = \pi_1 , \ \pi_1 < \pi_0 . \]
In what follows, we will assume that interest resides in detecting the former alternative, where $\pi_1 > \pi_0$.

The exact test is based on the binomial probability distribution of the response variable $S$. Recall that for a binomial distribution $Bin(N, \pi)$ the probability that $S = s$ is given by
\[ \Pr(S = s \mid \pi) = \binom{N}{s} \pi^s (1 - \pi)^{N - s} \]
and that the tail probability is the cumulative sum of probabilities
\[ \Pr(S \ge s \mid \pi) = \sum_{i=s}^{N} \binom{N}{i} \pi^i (1 - \pi)^{N - i} . \]
Suppose data from the trial provide an observed number of responses $S = s$. Then a test of the null hypothesis $H_0$ consists in calculating the probability, under the null distribution $Bin(N, \pi_0)$, of observing $s$ or more responses among $N$ subjects, and then comparing this probability to the type 1 error rate $\alpha$. If
\[ \Pr(S \ge s \mid \pi_0) \le \alpha \]
then the null hypothesis that $\pi = \pi_0$ can be rejected in favor of the alternative hypothesis that $\pi = \pi_1$.

S.1.1 Power of the Exact Fixed Sample Test

Since the power and type 1 error of a design are intimately related, and because in an exact test the desired false positive rate is often not attainable, we first consider calculation of the design's type 1 error before moving on to the design's power.

Suppose a type 1 error probability of $\alpha$ has been specified for the study design. Due to the discreteness of the binomial distribution, this false positive rate $\alpha$ will more likely than not be unattainable. Instead the design will attain a type 1 error of $\alpha^* \le \alpha$. Under the null hypothesis $H_0$, the number of responses $S$ follows a $Bin(N, \pi_0)$ distribution.
Define $s_0$ to be the smallest integer such that
\[ \Pr(S \ge s_0 \mid \pi_0) \le \alpha . \]
Then the attained significance level $\alpha^*$ is given by
\[ \alpha^* = \Pr(S \ge s_0 \mid \pi_0) . \]
Once we know the critical value $s_0$ that gives type 1 error $\alpha^*$ under the null distribution $Bin(N, \pi_0)$, we can calculate the exact power of the design by considering the probability distribution under the alternative hypothesis, $Bin(N, \pi_1)$. The exact power of the procedure is given by
\[ (1 - \beta) = \Pr(S \ge s_0 \mid \pi_1) . \]

S.1.2 Sample Size Calculation for the Exact Fixed Sample Design

Calculation of a sample size $N$ for a pre-specified type 1 error $\alpha$ and power $(1 - \beta)$ is complicated by the fact that neither $\alpha$ nor $\beta$ is exactly attainable, given the discreteness of the binomial distribution. It is well known that a plot of the power versus sample size of a design displays a saw-tooth behavior, but this zig-zag is mostly due to the concurrently varying type 1 error of the design. Because of this behavior, multiple sample sizes may be provided in answer to the design problem. The optimal sample size will depend on the priorities given to type 1 and type 2 errors by the investigator.

The sample size $N$ is calculated such that both attained type 1 and type 2 errors, $\alpha^*$ and $\beta^*$, are controlled. A search of the parameter space of $N$ must be performed to find those values satisfying both of the following conditions
\[ \alpha^* = \Pr(S \ge s_0 \mid N, \pi_0) \le \alpha \]
and
\[ \beta^* = \Pr(S < s_0 \mid N, \pi_1) \le \beta \]
while either (1) primarily maximizing the attained type 1 error while maintaining it below $\alpha$, (2) primarily maximizing the attained type 2 error while maintaining it below $\beta$, or (3) optimizing to get $\alpha^*$ and $\beta^*$ as close to $\alpha$ and $\beta$, respectively, as possible. The most practical choice of sample size, however, may be that sample size above which power is guaranteed to be at least $(1 - \beta)$.
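The fixed sample calculations of Sections S.1.1 and S.1.2 can be sketched in a few lines of Python. This is an illustrative sketch, not East's algorithm: the function names are hypothetical, the alternative is assumed to be $\pi_1 > \pi_0$, and the "guaranteed" sample size is only guaranteed up to a user-chosen search limit `n_max`, which is an assumption of this sketch.

```python
from math import comb


def upper_tail(s, n, p):
    """Pr(S >= s) for S ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(s, n + 1))


def critical_value(n, pi0, alpha):
    """Smallest s0 with Pr(S >= s0 | pi0) <= alpha (n + 1 means 'never reject')."""
    s0 = 0
    while upper_tail(s0, n, pi0) > alpha:
        s0 += 1
    return s0


def exact_design(n, pi0, pi1, alpha):
    """Return (s0, attained alpha*, exact power) for the one-sided test."""
    s0 = critical_value(n, pi0, alpha)
    return s0, upper_tail(s0, n, pi0), upper_tail(s0, n, pi1)


def guaranteed_sample_size(pi0, pi1, alpha, beta, n_max):
    """Smallest N such that power >= 1 - beta for every n in [N, n_max].

    Returns None if the power at n_max itself is below 1 - beta.
    """
    failing = [n for n in range(1, n_max + 1)
               if exact_design(n, pi0, pi1, alpha)[2] < 1 - beta]
    if not failing:
        return 1
    if failing[-1] == n_max:
        return None
    return failing[-1] + 1


# Example: N = 10, pi0 = 0.5, pi1 = 0.8, alpha = 0.05.
s0, alpha_star, power = exact_design(10, 0.5, 0.8, 0.05)
```

In the example the critical value is $s_0 = 9$, because $\Pr(S \ge 9 \mid 0.5) = 11/1024$ while $\Pr(S \ge 8 \mid 0.5) = 56/1024$ exceeds 0.05; the attained level is therefore well below the nominal 0.05, which illustrates the conservatism caused by discreteness.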
S.2 Power and Sample Size for the Exact Group Sequential Test

Instead of a fixed sample test of the null hypothesis, let us now consider a group sequential test. This procedure tests the null hypothesis not just once at the end of the trial, but a total of $K$ times, after the data of each group of $n_k$ subjects, $k = 1, \ldots, K$, have been observed. In what follows, we still consider a 1-sided test of the null hypothesis with the alternative hypothesis specified in the direction of $\pi_1 > \pi_0$.

Suppose an error-spending function has been pre-specified to control the type 1 error of the group sequential test. Let $\{\alpha_1, \ldots, \alpha_K\}$ be the fractions of the type 1 error at each stage, such that they sum up to $\alpha$. The efficacy boundary corresponding to this error-spending function is given by the set of critical values $\{c_1, \ldots, c_K\}$. Before considering the construction of the boundary itself and the calculation of the test's power, the probability distribution of the number of responses at stage $k$ must be established.

Define $C_k(s; \pi, N_k)$ to be the probability of observing $s$ responses at stage $k$, where $1 \le k \le K$. Here $N_k$ refers to the cumulative sample size up to and including stage $k$, so that $n_k = N_k - N_{k-1}$ is the sample size for stage $k$ only. Then for the first stage, the probability distribution of response is binomial with
\[ C_1(s; \pi, N_1) = \binom{N_1}{s} \pi^s (1 - \pi)^{N_1 - s} . \]
Thereafter, the probability of $s$ responses at stage $k > 1$ depends on how many of those responses have been observed up to but excluding stage $k$.
This distribution is given by
\[ C_k(s; \pi, N_k) = \sum_{i = a_k(s)}^{b_k(s)} C_{k-1}(i; \pi, N_{k-1}) \, B_k(s - i; \pi, n_k) \]
where
\[ B_k(s; \pi, n_k) = \binom{n_k}{s} \pi^s (1 - \pi)^{n_k - s} \]
and
\[ a_k(s) = \max(0, s - n_k) , \qquad b_k(s) = \min(s, c_{k-1} - 1) . \]

S.2.1 Computing Boundaries for the Exact Group Sequential Test

Given an a priori specified $\alpha$-spending function, the type 1 error fractions used at each stage are provided as $\alpha_1, \ldots, \alpha_K$. However, due to the discreteness of the binomial distribution, those values are not achievable at each step. It is reasonable, however, to carry over the unspent type 1 error at any stage to the subsequent stage. This informs the following calculations and adjustments to the boundary.

At the first interim look, $k = 1$, operating under the null hypothesis, the boundary value is calculated simply by finding the smallest integer $c_1$ such that
\[ \sum_{i = c_1}^{N_1} C_1(i; \pi_0, N_1) \le \alpha_1 . \]
However, the actual tail probability defined by the cut-off value $c_1$ is
\[ \alpha_1^* = \sum_{i = c_1}^{N_1} C_1(i; \pi_0, N_1) . \]
The unused type 1 error $\theta_1 = \alpha_1 - \alpha_1^*$ can be carried over to be spent at stage 2. More generally, define $\theta_0 = 0$; then at stage $k$ the available type 1 error is $\alpha_k + \theta_{k-1}$, where
\[ \theta_{k-1} = (\alpha_{k-1} + \theta_{k-2}) - \alpha_{k-1}^* = \sum_{i=1}^{k-1} (\alpha_i - \alpha_i^*) . \]
The boundary value is then calculated by finding the smallest integer $c_k$ for which
\[ \sum_{i = c_k}^{N_k} C_k(i; \pi_0, N_k) \le \alpha_k + \theta_{k-1} . \]
Repeating this process until the final look $K$ enables the full construction of the efficacy boundary. Note that at the last look, the cumulative and thus overall attained type 1 error of the design will be $\alpha^* \le \alpha$, where
\[ \alpha^* = \sum_{i=1}^{K} \alpha_i^* = \alpha - \theta_K . \]

S.2.2 Power of the Exact Group Sequential Test

Just as in the case of the exact fixed sample test, the power and type 1 error of the group sequential test are intimately tied. The previous section provided calculation of the boundary $c_1, \ldots, c_K$ under the assumption that the null hypothesis was true. These defined an attained overall type 1 error $\alpha^* \le \alpha$ of the group sequential design. Considering the crossing probability of that same boundary under the alternative hypothesis provides the power of the group sequential test. That is,
\[ (1 - \beta) = \sum_{k=1}^{K} \sum_{i = c_k}^{N_k} C_k(i; \pi_1, N_k) . \]

S.2.3 Sample Size Calculation for the Exact Group Sequential Design

As in the case of the exact fixed sample design, calculation of the maximum sample size $N_{\max} = N_K$ for a pre-specified type 1 error $\alpha$ and power $(1 - \beta)$ is complicated by the fact that neither $\alpha$ nor $\beta$ is exactly attainable, given the discreteness of the binomial distribution. As a result, the choice of $N_{\max}$ is not unique. Rather, the decision of which sample size to choose for a particular trial will depend on the priorities given to type 1 and type 2 errors by the investigator.

The sample size $N_{\max}$ is calculated such that both attained type 1 and type 2 errors, $\alpha^*$ and $\beta^*$, are controlled. A search of the parameter space of $N_{\max} = N_K$ must be performed to find those values satisfying both of the following conditions
\[ \alpha^* = \sum_{k=1}^{K} \sum_{i = c_k}^{N_k} C_k(i; \pi_0, N_k) \le \alpha \]
and
\[ \beta^* = 1 - \sum_{k=1}^{K} \sum_{i = c_k}^{N_k} C_k(i; \pi_1, N_k) = \sum_{i=0}^{c_K - 1} C_K(i; \pi_1, N_K) \le \beta \]
while either (1) primarily maximizing the attained type 1 error while maintaining it below $\alpha$, (2) primarily maximizing the attained type 2 error while maintaining it below $\beta$, or (3) optimizing to get $\alpha^*$ and $\beta^*$ as close to $\alpha$ and $\beta$, respectively, as possible. The most practical choice of sample size, however, may be that sample size above which power is guaranteed to be at least $(1 - \beta)$.
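The recursion for $C_k$, the boundary construction with carry-over of unspent type 1 error, and the crossing probability can be sketched as follows. This is a minimal illustration under stated assumptions, not East's numerical algorithm: the function names are hypothetical, the stage-$k$ increment is taken to be $Bin(n_k, \pi)$, and a boundary value of $N_k + 1$ is used to signal that no rejection is possible at look $k$.

```python
from math import comb


def binom_pmf(s, n, p):
    return comb(n, s) * p**s * (1 - p) ** (n - s)


def first_stage(n1, pi):
    """C_1(s; pi, N_1) for s = 0..N_1."""
    return [binom_pmf(s, n1, pi) for s in range(n1 + 1)]


def next_stage(prev, c_prev, n_k, pi):
    """C_k from C_{k-1}: only paths with i <= c_{k-1} - 1 responses continue."""
    n_prev = len(prev) - 1
    cur = [0.0] * (n_prev + n_k + 1)
    for i in range(min(c_prev, n_prev + 1)):
        for j in range(n_k + 1):
            cur[i + j] += prev[i] * binom_pmf(j, n_k, pi)
    return cur


def efficacy_boundary(pi0, cum_n, alpha_k):
    """Return (c_1..c_K, attained alpha*_1..alpha*_K) under pi0.

    cum_n   : cumulative sample sizes N_1 < ... < N_K
    alpha_k : type 1 error fractions alpha_1..alpha_K
    """
    bounds, attained, carry, dist = [], [], 0.0, None
    for k, n_cum in enumerate(cum_n):
        if k == 0:
            dist = first_stage(n_cum, pi0)
        else:
            dist = next_stage(dist, bounds[-1], n_cum - cum_n[k - 1], pi0)
        available = alpha_k[k] + carry  # alpha_k + theta_{k-1}
        c, tail = n_cum + 1, 0.0
        while c > 0 and tail + dist[c - 1] <= available:
            c -= 1
            tail += dist[c]
        bounds.append(c)
        attained.append(tail)
        carry = available - tail  # theta_k
    return bounds, attained


def crossing_probability(pi, cum_n, bounds):
    """Sum over looks of Pr(S_k >= c_k, no earlier crossing) under pi."""
    total, dist = 0.0, None
    for k, n_cum in enumerate(cum_n):
        if k == 0:
            dist = first_stage(n_cum, pi)
        else:
            dist = next_stage(dist, bounds[k - 1], n_cum - cum_n[k - 1], pi)
        total += sum(dist[bounds[k]:])
    return total


# Example: two looks at N_1 = 10 and N_2 = 20, pi0 = 0.5, alpha split 0.01 + 0.04.
bounds, attained = efficacy_boundary(0.5, [10, 20], [0.01, 0.04])
power = crossing_probability(0.8, [10, 20], bounds)
```

Evaluating `crossing_probability` at $\pi_0$ reproduces the overall attained type 1 error $\alpha^*$, and evaluating it at $\pi_1$ gives the power of Section S.2.2.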
T Theory - Design - Binomial Paired-Sample Exact Test

This appendix presents the theory behind the computations of power and sample size for the conditional exact McNemar's test for the difference of proportions arising from paired binomial populations. East implements the methodology and numerical algorithms for the conditional version of McNemar's test, published by Duffy (1984) and Agresti (2002). Methods and algorithms for the unconditional test, used in previous versions of East, have been published by Suissa and Shuster (1991).

Exact conditional methods are considerably faster to execute than the exact unconditional methods. In the paired binomial case, the conditional approach simplifies to a single binomial model, allowing the computation of exact p-values and confidence intervals for arbitrarily large data sets with little difficulty. This is not the case for unconditional methods, where fairly long computing times are to be expected for larger sample sizes. In addition, the theory of exact unconditional inference is more complex and historically has not possessed as extensive a bibliography as the theory of exact conditional inference.

Section (T.1) presents how to calculate the power and the sample size for the exact fixed sample conditional McNemar's test.

T.1 Power and Sample Size for the Exact Conditional Fixed Sample Test: McNemar's Test

Consider a trial in which the investigator's interest is in testing for a difference in success rates between paired binary responses. Such a test is typically used in a repeated measures setting, for example when each subject's response is recorded both before and after treatment. The test then determines if the pre- and post-treatment response rates are equivalent. Another example would be a study involving matched pairs, such as siblings, where each member of the pair is measured for an outcome of interest and the test is for the same probability of response.
Here, the inference is complicated by the fact that the observations are correlated, even though there is independence across the different pairs being studied.

Suppose that two binomial responses are observed on either $N$ individuals (pre and post event), or $N$ matched pairs. Let $y_{11}$ be the count of the number of individuals whose first and second responses are both positive, or in the case of matched pairs, where both responses are positive. In a similar manner let $y_{22}$ be the count where both first and second responses are negative. Let $y_{12}$ be the count of pairs where the first response is positive and the second response is negative, and let $y_{21}$ be the count where the first response is negative and the second response is positive. McNemar's test is based on the $2 \times 2$ table of the form
\[ y = \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix} . \tag{T.1} \]
Again, interest is in the equality of binary response rates from two populations, where the data consist of paired, dependent responses. The tests described here determine if the initial response rate is statistically equivalent to the final response rate.

Let $(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22})$ denote the four cell probabilities for table (T.1). Let $\pi_1$ be the probability that the first response is positive and $\pi_2$ be the probability that the second response is positive. Marginal probabilities can be expressed as
\[ \pi_1 = \pi_{11} + \pi_{12} , \quad \text{and} \quad \pi_2 = \pi_{11} + \pi_{21} . \tag{T.2} \]
Therefore the null hypothesis can be expressed as
\[ H_0 : \pi_1 = \pi_2 , \tag{T.3} \]
versus the alternative
\[ H_1 : \pi_1 \ne \pi_2 . \tag{T.4} \]
Using (T.2), $\pi_1 = \pi_2$ implies that $\pi_{12} = \pi_{21}$. The inference becomes focused on the probabilities of discordant pairs, and subsequent test statistics are all functions of the difference $y_{21} - y_{12}$.

East calculates the power for the exact conditional test of the null hypothesis
\[ H_0 : \pi_{12} = \pi_{21} \]
against the specific alternative
\[ H_1 : \pi_{21} - \pi_{12} = \Delta . \]
In both cases, the user inputs the probability of a discordant pair, which is $\Psi = \pi_{12} + \pi_{21}$, and the difference of interest, $\Delta$. From this information, East determines the original cell probabilities
\[ \pi_{12} = \frac{\Psi - \Delta}{2} , \qquad \pi_{21} = \Psi - \pi_{12} . \]

Exact Conditional Test - Power

The unconditional power for the exact conditional test uses the fact that, conditional on the number of discordant pairs $N_d = y_{12} + y_{21}$, $Y_{12}$ has a binomial distribution with number of trials $N_d$ and success probability $\pi_{12} / (\pi_{12} + \pi_{21})$. Thus, one can use the power calculation for a single binomial proportion to obtain the exact conditional power for McNemar's test.

Let $y_\alpha$ be the cut-off value for rejecting the null hypothesis with a level-$\alpha$ one-sided exact McNemar test conditional on $N_d$. Thus, $y_\alpha$ is the smallest integer such that, under the null hypothesis,
\[ \Pr(Y_{12} \ge y_\alpha \mid N_d, H_0) \le \alpha , \tag{T.5} \]
where
\[ \Pr(Y_{12} = y \mid N_d, H_0) = (0.5)^{N_d} \frac{N_d!}{y! \, (N_d - y)!} . \tag{T.6} \]
The conditional power of the one-sided exact conditional McNemar test is thus
\[ \Pr(Y_{12} \ge y_\alpha \mid N_d, H_1) = \sum_{y \ge y_\alpha} \binom{N_d}{y} \left( \frac{\pi_{12}}{\pi_{12} + \pi_{21}} \right)^{y} \left( \frac{\pi_{21}}{\pi_{12} + \pi_{21}} \right)^{N_d - y} . \tag{T.7} \]

Exact Conditional Test - Sample Size

The exact conditional sample size for fixed parameter and power values is obtained by evaluating the exact conditional power function over a range of sample sizes until an $N$ is found that attains the desired power. Since neither $\alpha$ nor $\beta$ is guaranteed to be attainable, due to the discreteness of the binomial distribution, the solution $N$ of this parameter space search is not unique. The choice of sample size for a particular trial should depend on the priorities given to type 1 and type 2 errors by the investigator. Possible prioritizations include: (1) primarily maximizing the attained type 1 error while maintaining it below $\alpha$; (2) primarily maximizing the attained type 2 error while maintaining it below $\beta$; (3) optimizing to get $\alpha^*$ and $\beta^*$ as close to $\alpha$ and $\beta$ as possible.
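A minimal Python sketch of (T.5) to (T.7) is given below. It is illustrative only and the function names are hypothetical. The last function averages the conditional power over the distribution of the number of discordant pairs, assuming $N_d \sim Bin(N, \pi_{12} + \pi_{21})$ for $N$ independent pairs; this averaging step is our reading of "unconditional power" and is not spelled out in the text above.

```python
from math import comb


def upper_tail(y, n, p):
    """Pr(Y >= y) for Y ~ Bin(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(y, n + 1))


def cutoff(n_d, alpha):
    """Smallest y_alpha with Pr(Y12 >= y_alpha | N_d, H0) <= alpha, as in (T.5)."""
    y = 0
    while upper_tail(y, n_d, 0.5) > alpha:
        y += 1
    return y


def conditional_power(n_d, pi12, pi21, alpha):
    """Conditional power (T.7) given n_d discordant pairs."""
    return upper_tail(cutoff(n_d, alpha), n_d, pi12 / (pi12 + pi21))


def unconditional_power(n, pi12, pi21, alpha):
    """Average of the conditional power over N_d ~ Bin(n, pi12 + pi21).

    Assumption of this sketch, not a formula quoted from the manual.
    """
    psi = pi12 + pi21
    return sum(
        comb(n, d) * psi**d * (1 - psi) ** (n - d)
        * conditional_power(d, pi12, pi21, alpha)
        for d in range(n + 1)
    )


# Example: 10 discordant pairs, pi12 = 0.4, pi21 = 0.1, one-sided alpha = 0.05.
power_given_10 = conditional_power(10, 0.4, 0.1, 0.05)
```

Because the test in (T.5) rejects for large values of $Y_{12}$, the example uses $\pi_{12} > \pi_{21}$, the direction in which this rejection region has power.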
The most practical choice of sample size, however, may be the sample size above which power is guaranteed to be at least $(1 - \beta)$.

U Theory - Design - Simon's Two-Stage Design

In this appendix, we describe the theory behind the two-stage optimal design for phase 2 clinical trials developed by Simon (1989). This design is optimal in the sense that it minimizes the maximum expected sample size under the null hypothesis. It was developed for oncology trials to ensure that patients do not receive a treatment that is clearly inferior to other available options. East also supports Simon's minimax approach as well as an admissible two-stage design, which is a graphical method used to search for an alternative with more favorable features (Jung, et al. 2004).

Simon's Optimal design

Of primary interest is testing the null hypothesis $H_0 : \pi \le \pi_0$ that the true response probability is no greater than some uninteresting level $\pi_0$. If the null hypothesis is indeed true, then the probability of a false positive should be controlled at level $\alpha$. This means that the probability of deciding to carry the drug into later phases of clinical development should be less than $\alpha$. Suppose an alternative hypothesis $H_1 : \pi \ge \pi_1$ is also specified, which claims that the true response probability is at least some desirable target level $\pi_1$. If this hypothesis is true, then the probability of a false negative should be controlled to be less than a pre-specified value $\beta$. Finally, in addition to these two constraints, the design should be optimal in the sense that it minimizes the number of patients treated with a drug of low activity. Define $n_1$ and $n_2$ to be the number of patients studied in the first and second stage of the trial, respectively.
The expected sample size can be computed as

$$E[n] = n_1 + (1 - PET)\, n_2, \qquad \text{where} \qquad PET = \sum_{i=0}^{s_1} \mathrm{Bin}(i; \pi, n_1).$$

Here, $PET$ represents the probability of early termination after the first stage, a decision based on the number of responses observed for the $n_1$ patients in that stage of the trial. Terminating the experiment at the end of the first stage for futility is based on the rule that the treatment is dropped if $s_1$ or fewer responses are observed. At the end of the second stage, the treatment is considered ineffective if a total of $s$ or fewer responses are observed in all $n = n_1 + n_2$ patients of the trial. Thus, the probability of concluding the treatment is ineffective is given by

$$\sum_{i=0}^{s_1} \mathrm{Bin}(i; \pi, n_1) + \sum_{j=s_1+1}^{\min[n_1, s]} \mathrm{Bin}(j; \pi, n_1) \sum_{k=0}^{s-j} \mathrm{Bin}(k; \pi, n_2).$$

To optimally design the trial given parameters $\pi_0$, $\pi_1$, $\alpha$, and $\beta$, this probability statement must be evaluated under the null hypothesis that $\pi = \pi_0$ over all values of $n_1$ and $n_2$ as well as $s_1$ and $s$. Note that early termination of the trial for efficacy is not permitted in this design. If it were, it would be possible to further reduce the expected sample size of the trial. However, the ethical imperative of this type of trial is to terminate early for futility.

East optimizes the two-stage design using exact binomial probabilities. For each value of total sample size $n$ and each value of stage 1 sample size $n_1$ in the range $(1, n - 1)$, integer values $s_1$ and $s$ are found that satisfy the type 1 and type 2 error constraints and minimize the expected sample size when $\pi = \pi_0$. The search occurs over the range $s_1 \in (0, n_1)$. For each value of $s_1$ the maximum value of $s$ satisfying the type 2 error constraint is determined. Next the set of parameters $(n, n_1, s_1, s)$ is examined to see whether it satisfies the type 1 error constraint.
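The quantities evaluated for each candidate $(n_1, n_2, s_1, s)$ can be sketched as follows. This is a minimal illustration of the formulas above rather than East's search code, and the function name is hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Bin(k; p, n): probability of k responses among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simon_characteristics(n1, n2, s1, s, pi):
    """Return (PET, E[n], Pr(conclude ineffective)) at response rate pi."""
    # Probability of early termination: s1 or fewer responses in stage 1.
    pet = sum(binom_pmf(i, n1, pi) for i in range(s1 + 1))
    expected_n = n1 + (1 - pet) * n2
    # Continue to stage 2 with j > s1 responses, then end with s or fewer in total.
    p_ineffective = pet
    for j in range(s1 + 1, min(n1, s) + 1):
        stage2 = sum(binom_pmf(k, n2, pi) for k in range(s - j + 1))
        p_ineffective += binom_pmf(j, n1, pi) * stage2
    return pet, expected_n, p_ineffective
```

Evaluating `1 - p_ineffective` at $\pi = \pi_0$ gives the type 1 error of a candidate design, and evaluating `p_ineffective` at $\pi = \pi_1$ gives its type 2 error, which is how a candidate would be checked against the two constraints.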
If the parameter set $(n, n_1, s_1, s)$ satisfies the type 1 error constraint, then the expected sample size of the corresponding design is compared to the minimum expected sample size previously achieved by the search algorithm. The search continues over the entire range of $s_1$. This is repeated for values in the range of $n_1$ while keeping $n$ fixed. The search over the range of $n$ begins from the lower value of

$$\bar{\pi}(1 - \bar{\pi}) \left[ \frac{z_{1-\alpha} + z_{1-\beta}}{\pi_1 - \pi_0} \right]^2,$$

where $\bar{\pi} = (\pi_0 + \pi_1)/2$. A check must be performed below this starting point to ensure that this is indeed the smallest maximum sample size $n$ for which there is a nontrivial ($n_1, n_2 > 0$) two-stage design satisfying the type 1 and type 2 error constraints. The enumeration procedure then searches upwards from this minimum value of $n$ until it is clear that the optimum has been determined. The minimum expected sample size for fixed $n$ is not a unimodal function of $n$ because of the discreteness of the underlying binomial distributions. Nevertheless, as $n$ increases the values of the local minima eventually increase and it becomes clear that a global minimum has been found.

Simon's Minimax and Admissible designs

In addition to the optimal design, East offers Simon's minimax approach, which minimizes the total sample size while satisfying both the type-I and type-II constraints. The admissible two-stage design (Jung, et al. 2004) employs a graphical method geared to search for an alternative with more favorable features. This approach provides a compromise solution between the minimax and the optimal designs that also satisfies the type-I and type-II constraints. Resulting designs yield the same total sample sizes, as well as having the minimum expected sample size under the null.

V Theory-Design - Binomial Two-Sample Exact Tests

This appendix deals with exact power and sample size computations for comparing two independent binomials.
Exact power and sample size calculations are considered for the two-sided Fisher's test, the unconditional one-sided test of superiority, the non-inferiority test, and the two one-sided tests of equivalence. Exact tests on categorical data are usually computed conditionally, by fixing the margins of the contingency table at their observed values. Corresponding power computations are, however, more useful if they are performed unconditionally, before these table margins have been observed. Only then can they aid in determining if the sample size proposed for the study is adequate. This appendix shows how to obtain exact unconditional power as a weighted sum of exact conditional powers, and applies the results to exact conditional tests on $2 \times 2$ contingency tables. It also covers the exact power and sample size computations for exact unconditional tests of non-inferiority and equivalence of two binomial populations.

The methods used by East to compute power and sample size of these two-sample exact tests are based on Fleiss (1981) for Fisher's exact test and the conditional exact superiority test, Suissa and Shuster (1985) for the unconditional exact superiority test, Chan (1988) for the unconditional exact non-inferiority test, and finally Dunnett and Gent (1977) for the exact equivalence test.

In all that follows, consider sampling from two independent binomial populations. Suppose $x_c$ responses out of $n_c$ subjects are observed in the control group. The mean response rate in this group is denoted $\pi_c$. Similarly define $x_t$, $n_t$, and $\pi_t$ for the treatment group. The observed data may be represented in a $2 \times 2$ contingency table $x$ of the form

             Response    No Response    Total
Control      x_c         n_c - x_c      n_c
Treatment    x_t         n_t - x_t      n_t
Total        m           N - m          N

Section (V.1) explains computation of the power of Fisher's exact test. In section (V.2) the power of Barnard's unconditional test of superiority is described. Section (V.3) continues with the power of the unconditional test of non-inferiority.
Power for the unconditional test of equivalence between two binomial proportions is considered in section (V.4). Finally, section (V.5) briefly describes the computation of sample size for all these tests.

V.1 Fisher's Exact Test

Fisher's exact test is concerned with testing the null hypothesis

$$H_0 : \pi_c = \pi_t \equiv \pi \tag{V.1}$$

versus the two-sided alternative hypothesis

$$H_1 : \pi_c \ne \pi_t \tag{V.2}$$

at fixed sample sizes $n_c$ and $n_t$. As is well known, the exact probability of $x$ under $H_0$, conditional on $x_c + x_t = m$, is given by

$$\Pr(x \mid m, H_0) = \frac{\binom{n_c}{x_c}\binom{n_t}{x_t}}{\binom{N}{m}}. \tag{V.3}$$

Notice that (V.3) does not depend on the common null response probability $\pi$. Thus this probability need not be specified for purposes of calculating power. The two response probabilities $\pi_c$ and $\pi_t$ are, however, needed to evaluate the probability of $x$ under $H_1$. Fisher's exact test is based on the exact distribution of the test statistic

$$T = -\log\left[\frac{\binom{n_c}{x_c}\binom{n_t}{x_t}}{\binom{N}{m}}\right]. \tag{V.4}$$

V.1.1 Exact Unconditional Power for Fisher's Exact Test

Consider first the exact power of level-$\alpha$ tests based on the statistic $T$. Let

$$\Gamma_m = \{x : x_c + x_t = m\} \tag{V.5}$$

and define the critical region

$$\Gamma_m(t) = \{x \in \Gamma_m : T \ge t\}. \tag{V.6}$$

The exact null distribution of $T$ may then be obtained by evaluating

$$\Pr(T \ge t \mid m, H_0) = \sum_{x \in \Gamma_m(t)} \left[\frac{\binom{n_c}{x_c}\binom{n_t}{x_t}}{\binom{N}{m}}\right] \tag{V.7}$$

for each possible value of $t$.

Let $\alpha$ be the maximum allowable type-1 error and $t_\alpha(m)$ be the smallest possible cut-off such that

$$\Pr(T \ge t_\alpha(m) \mid m, H_0) \le \alpha. \tag{V.8}$$

The conditional power of Fisher's exact test is defined as

$$\Pr(T \ge t_\alpha(m) \mid m, H_1) = \sum_{x \in \Gamma_m(t_\alpha(m))} \left[\frac{Q_c Q_t}{\sum_{x \in \Gamma_m} Q_c Q_t}\right], \tag{V.9}$$

where

$$Q_c = \binom{n_c}{x_c} \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c} \tag{V.10}$$

$$Q_t = \binom{n_t}{x_t} \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t} \tag{V.11}$$

Denote this two-sided conditional power by $(1 - \beta(m))$.
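The conditional critical region in (V.6) and (V.8), and the weighting of conditional powers over the margin $m$ described at the start of this appendix, can be sketched as below. This is an illustrative sketch with hypothetical function names, not East's code. It uses the fact that the denominator of (V.9) is the probability of observing margin $m$ under $H_1$, so each weighted conditional power reduces to a sum of $Q_c Q_t$ over the critical region.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def critical_region(m, n_c, n_t, alpha):
    """Values of x_c in the level-alpha conditional critical region for margin m."""
    lo, hi = max(0, m - n_t), min(n_c, m)
    # Integer weights proportional to the null probability (V.3); T = -log(probability).
    weight = {x_c: comb(n_c, x_c) * comb(n_t, m - x_c) for x_c in range(lo, hi + 1)}
    total = comb(n_c + n_t, m)
    region, cumulative = [], 0
    for w in sorted(set(weight.values())):  # smallest probability (largest T) first
        tied = [x_c for x_c, v in weight.items() if v == w]
        if cumulative + w * len(tied) > alpha * total:
            break  # adding these tables would exceed alpha, as in (V.8)
        cumulative += w * len(tied)
        region.extend(tied)
    return region


def fisher_power(n_c, n_t, pi_c, pi_t, alpha):
    """Unconditional power: sum over m of conditional power times Pr(margin = m)."""
    power = 0.0
    for m in range(n_c + n_t + 1):
        for x_c in critical_region(m, n_c, n_t, alpha):
            # Q_c * Q_t from (V.10) and (V.11) with x_t = m - x_c.
            power += binom_pmf(x_c, n_c, pi_c) * binom_pmf(m - x_c, n_t, pi_t)
    return power
```

Tables with equal null probability share the same value of $T$, so they are added to the region together; integer weights are used so that such ties are detected exactly.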
The two-sided unconditional power of Fisher's exact test is then defined as

$$(1 - \beta) = \sum_{m=0}^{N} (1 - \beta(m))\, P(m), \tag{V.12}$$

where

$$P(m) = \Pr(x_c + x_t = m \mid H_1) \tag{V.13}$$

is a convolution of two binomials under $H_1$. It is relatively straightforward to compute equation (V.12) as only $2 \times 2$ tables are involved.

V.2 Power of Unconditional Test of Superiority

V.2.1 Superiority Test: Difference of Proportions

Superiority for Difference of Proportions – Case 1

Suppose it is desired to test $H_0 : \pi_t - \pi_c \le 0$ against the one-sided alternative $H_1 : \pi_t - \pi_c > 0$. Let $\pi_t$ and $\pi_c$ denote the binomial probabilities for the treatment and control arms, respectively. Let $x_t$ and $x_c$ be the observed numbers of responses for the treatment and control arms, respectively. Let $\delta = \pi_t - \pi_c$. It is of interest to test the null hypothesis $H_0 : \delta \le 0$ against the one-sided alternative $H_1 : \delta > 0$. Let $\hat{\pi}_i$ denote the estimate of $\pi_i$ based on $n_i$ observations from treatment $i$. The test statistic can be defined by

$$T(x_t, x_c) = \frac{\hat{\pi}_t - \hat{\pi}_c}{\sqrt{\tilde{\pi}(1 - \tilde{\pi})\left(\frac{1}{n_c} + \frac{1}{n_t}\right)}}, \tag{V.14}$$

where $\hat{\pi}_t$, $\hat{\pi}_c$ and $\tilde{\pi}$ are given by

$$\hat{\pi}_c = \frac{x_c}{n_c}, \qquad \hat{\pi}_t = \frac{x_t}{n_t}, \qquad \tilde{\pi} = \frac{x_t + x_c}{n_t + n_c}. \tag{V.15}$$

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.16}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \sup\{b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) \le \alpha\}. \tag{V.17}$$

This probability $P_{\pi_c}(T(x_t, x_c) < b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$ under the null hypothesis $\pi_c = \pi_t$.
This implies that, for a given $\pi_c$, $b_{\pi_c}$ is defined as

$$\sup\Big\{b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_b(x_t, x_c)\, f_{\pi_c,\pi_c}(x_t, x_c) \le \alpha\Big\}, \tag{V.18}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) < b \\ 0 & \text{otherwise} \end{cases} \tag{V.19}$$

Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \inf\{b_{\pi_c} : \pi_c \in (0, 1)\}$ and suppose that this infimum takes place at $\pi_c^*$.

The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) < b^*$. Since $b^*$ is the infimum of the critical values over the possible range of $\pi_c$, this test guarantees the type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the underlying control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) < b^* \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c^*,\pi_c^*}(x_t, x_c). \tag{V.20}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level $\alpha$. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

Next we will show how the unconditional power of this exact test is calculated. The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test under $\delta = \delta_1$ ($\delta_1 < 0$) and $\pi_c$. Under the alternative $\pi_t = \pi_c + \delta_1$.
The unconditional power is then given by

$$P_{\pi_c}(T(x_t, x_c) < b^* \mid H_1) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c + \delta_1,\pi_c}(x_t, x_c), \tag{V.21}$$

where

$$f_{\pi_c + \delta_1,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c} (\pi_c + \delta_1)^{x_t} (1 - \pi_c - \delta_1)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.22}$$

Superiority for Difference of Proportions – Case 2

Suppose it is desired to test $H_0 : \pi_t - \pi_c \ge 0$ against the one-sided alternative $H_1 : \pi_t - \pi_c < 0$. Let $\pi_t$ and $\pi_c$ denote the binomial probabilities for the treatment and control arms, respectively. Let $x_t$ and $x_c$ be the observed numbers of responses for the treatment and control arms, respectively. Let $\delta = \pi_t - \pi_c$. It is of interest to test the null hypothesis $H_0 : \delta \ge 0$ against the one-sided alternative $H_1 : \delta < 0$. Let $\hat{\pi}_i$ denote the estimate of $\pi_i$ based on $n_i$ observations from treatment $i$. The test statistic can be defined by

$$T(x_t, x_c) = \frac{\hat{\pi}_t - \hat{\pi}_c}{\sqrt{\tilde{\pi}(1 - \tilde{\pi})\left(\frac{1}{n_c} + \frac{1}{n_t}\right)}}, \tag{V.23}$$

where $\hat{\pi}_t$, $\hat{\pi}_c$ and $\tilde{\pi}$ are given by

$$\hat{\pi}_c = \frac{x_c}{n_c}, \qquad \hat{\pi}_t = \frac{x_t}{n_t}, \qquad \tilde{\pi} = \frac{x_t + x_c}{n_t + n_c}. \tag{V.24}$$

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.25}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \inf\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) \le \alpha\}. \tag{V.26}$$

This probability $P_{\pi_c}(T(x_t, x_c) > b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$ under the null hypothesis $\pi_c = \pi_t$.
This implies that, for a given $\pi_c$, $b_{\pi_c}$ is defined as

$$\inf\Big\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_b(x_t, x_c)\, f_{\pi_c,\pi_c}(x_t, x_c) \le \alpha\Big\}, \tag{V.27}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) > b \\ 0 & \text{otherwise} \end{cases} \tag{V.28}$$

Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \sup\{b_{\pi_c} : \pi_c \in (0, 1)\}$ and suppose that this supremum takes place at $\pi_c^*$.

The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) > b^*$. Since $b^*$ is the supremum of the critical values over the possible range of $\pi_c$, this test guarantees the type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the underlying control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) > b^* \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c^*,\pi_c^*}(x_t, x_c). \tag{V.29}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level $\alpha$. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

Next we will show how the unconditional power of this exact test is calculated. The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test under $\delta = \delta_1$ ($\delta_1 > 0$) and $\pi_c$. Under the alternative $\pi_t = \pi_c + \delta_1$.
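The upper-tail rule just described (find $b_{\pi_c}$ for each control rate, take the supremum $b^*$, reject when $T > b^*$) can be sketched numerically as follows. This is a rough illustration under stated assumptions, not East's algorithm: the supremum over $\pi_c \in (0, 1)$ is approximated on a finite grid, $T$ is set to 0 when the pooled estimate is 0 or 1 (where the statistic is otherwise undefined), and all function names are hypothetical. Power at $\pi_t = \pi_c + \delta_1$ is obtained by summing the table probabilities over the rejection region.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def z_stat(x_t, x_c, n_t, n_c):
    """Pooled-variance statistic T(x_t, x_c); rounded so tied values compare equal."""
    pooled = (x_t + x_c) / (n_t + n_c)
    if pooled in (0.0, 1.0):
        return 0.0  # assumption: define T = 0 where the variance estimate vanishes
    se = sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    return round((x_t / n_t - x_c / n_c) / se, 10)


def critical_value(stats, n_t, n_c, pi_c, alpha):
    """inf{b : P(T > b | pi_t = pi_c) <= alpha} for one control rate."""
    prob = {}
    for (x_t, x_c), t in stats.items():
        prob[t] = prob.get(t, 0.0) + binom_pmf(x_t, n_t, pi_c) * binom_pmf(x_c, n_c, pi_c)
    b, upper_tail = None, 0.0
    for t in sorted(prob, reverse=True):
        if upper_tail > alpha:  # upper_tail equals P(T > t) at this point
            break
        b = t
        upper_tail += prob[t]
    return b


def exact_power(n_t, n_c, pi_c, delta1, alpha, grid=99):
    """Return (b_star, power) with b_star maximised over a grid of control rates."""
    stats = {(x_t, x_c): z_stat(x_t, x_c, n_t, n_c)
             for x_t in range(n_t + 1) for x_c in range(n_c + 1)}
    b_star = max(critical_value(stats, n_t, n_c, i / (grid + 1), alpha)
                 for i in range(1, grid + 1))
    power = sum(binom_pmf(x_t, n_t, pi_c + delta1) * binom_pmf(x_c, n_c, pi_c)
                for (x_t, x_c), t in stats.items() if t > b_star)
    return b_star, power
```

Because the grid only approximates the supremum, type I error control is guaranteed at the grid points; a finer grid or a dedicated optimiser would be needed to bound it over the whole interval.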
The unconditional power is then given by

$$P_{\pi_c}(T(x_t, x_c) > b^* \mid H_1) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c + \delta_1,\pi_c}(x_t, x_c), \tag{V.30}$$

where

$$f_{\pi_c + \delta_1,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c} (\pi_c + \delta_1)^{x_t} (1 - \pi_c - \delta_1)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.31}$$

V.2.2 Superiority Test: Ratio of Proportions

Superiority for Ratio of Proportions – Case 1

Suppose that it is desired to test $H_0 : \pi_t/\pi_c \le 1$ against $H_1 : \pi_t/\pi_c > 1$. Let $\pi_t$ and $\pi_c$ denote the binomial probabilities for the treatment and control arms, respectively, and let $\rho = \pi_t/\pi_c$. Let $x_t$ and $x_c$ be the observed numbers of responses for the treatment and control arms, respectively. It is of interest to test the null hypothesis $H_0 : \rho \le 1$ against the one-sided alternative $H_1 : \rho > 1$. Let $\delta = \ln(\pi_t) - \ln(\pi_c)$. Then it is equivalent to test $H_0 : \delta \le 0$ against $H_1 : \delta > 0$. Let $\hat{\pi}_i$ denote the estimate of $\pi_i$ based on $n_i$ observations from treatment $i$. The test statistic is defined by

$$T = \frac{\ln(\hat{\pi}_t) - \ln(\hat{\pi}_c)}{\sqrt{\frac{1 - \tilde{\pi}}{\tilde{\pi}}\left(\frac{1}{n_t} + \frac{1}{n_c}\right)}}, \tag{V.32}$$

where $\hat{\pi}_t$, $\hat{\pi}_c$ and $\tilde{\pi}$ are given by

$$\hat{\pi}_t = \frac{x_t}{n_t}, \qquad \hat{\pi}_c = \frac{x_c}{n_c}, \qquad \tilde{\pi} = \frac{x_t + x_c}{n_t + n_c}. \tag{V.33}$$

Note that $\frac{1 - \tilde{\pi}}{\tilde{\pi}}\left(\frac{1}{n_t} + \frac{1}{n_c}\right)$ is the maximum likelihood estimate of the variance of $\ln(\hat{\pi}_t) - \ln(\hat{\pi}_c)$ restricted under the null hypothesis.

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.34}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \inf\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) \le \alpha\}. \tag{V.35}$$

This probability $P_{\pi_c}(T(x_t, x_c) > b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$.
This implies that, for a given $\pi_c$, $b_{\pi_c}$ is such that

$$\inf\Big\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_b(x_t, x_c)\, f_{\pi_c,\pi_c}(x_t, x_c) \le \alpha\Big\}, \tag{V.36}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) > b \\ 0 & \text{otherwise} \end{cases} \tag{V.37}$$

Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \sup\{b_{\pi_c} : \pi_c \in (0, 1)\}$ and suppose that this supremum takes place at $\pi_c^*$.

The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) > b^*$. Since $b^*$ is the supremum of the critical values over the possible range of $\pi_c$, this test will guarantee the type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) > b^* \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c^*,\pi_c^*}(x_t, x_c). \tag{V.38}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

Next we will show how the unconditional power of this exact test is calculated. The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test at $\rho = \rho_1 > 1$ and $\pi_c$.
Under the alternative, $\pi_t = \rho_1 \pi_c$, and the unconditional power is given by

$$P_{\pi_c}(T(x_t, x_c) > b^* \mid H_1) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\rho_1 \pi_c,\pi_c}(x_t, x_c), \tag{V.39}$$

where

$$f_{\rho_1 \pi_c,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c} (\rho_1 \pi_c)^{x_t} (1 - \rho_1 \pi_c)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.40}$$

Superiority for Ratio of Proportions – Case 2

Suppose that it is desired to test $H_0 : \pi_t/\pi_c \ge 1$ against the one-sided alternative $H_1 : \pi_t/\pi_c < 1$. In this case, we use the same test statistic as in Case 1,

$$T = \frac{\ln(\hat{\pi}_t) - \ln(\hat{\pi}_c)}{\sqrt{\frac{1 - \tilde{\pi}}{\tilde{\pi}}\left(\frac{1}{n_t} + \frac{1}{n_c}\right)}}, \tag{V.41}$$

where $\hat{\pi}_t$, $\hat{\pi}_c$ and $\tilde{\pi}$ are defined in the same way as above.

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the set of all data values that could possibly be observed for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.42}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \sup\{b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) \le \alpha\}. \tag{V.43}$$

This probability $P_{\pi_c}(T(x_t, x_c) < b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$. This implies that, for a given $\pi_c$, $b_{\pi_c}$ is such that

$$\sup\Big\{b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_b(x_t, x_c)\, f_{\pi_c,\pi_c}(x_t, x_c) \le \alpha\Big\}, \tag{V.44}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) < b \\ 0 & \text{otherwise} \end{cases} \tag{V.45}$$

Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \inf\{b_{\pi_c} : \pi_c \in (0, 1)\}$ and suppose that this infimum takes place at $\pi_c^*$.

The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) < b^*$.
Since $b^*$ is the infimum of the critical values over the possible range of $\pi_c$, this test will guarantee the type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) < b^* \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c^*,\pi_c^*}(x_t, x_c). \tag{V.46}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test at $\rho = \rho_1 < 1$ and $\pi_c$. Under the alternative $\rho = \rho_1$, we have $\pi_t = \rho_1 \pi_c$, and the unconditional power is given by

$$P_{\pi_c}(T(x_t, x_c) < b^* \mid H_1) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\rho_1 \pi_c,\pi_c}(x_t, x_c), \tag{V.47}$$

where

$$f_{\rho_1 \pi_c,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c} (\rho_1 \pi_c)^{x_t} (1 - \rho_1 \pi_c)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.48}$$

V.3 Power of the Unconditional Test of Non-Inferiority

V.3.1 Non-inferiority Test: Difference of Proportions

Non-inferiority for Difference of Proportions – Case 1

Suppose it is desired to test $H_0 : \pi_t - \pi_c \le \delta_0$ ($\delta_0 < 0$) against the one-sided alternative $H_1 : \pi_t - \pi_c > \delta_0$. Let $\pi_t$ and $\pi_c$ denote the binomial probabilities for the treatment and control arms, respectively. Let $x_t$ and $x_c$ be the observed numbers of responses for the treatment and control arms, respectively. Let $\delta = \pi_t - \pi_c$. It is of interest to test the null hypothesis $H_0 : \delta \le \delta_0$ against the one-sided alternative $H_1 : \delta > \delta_0$.
Let $\hat{\pi}_i$ denote the estimate of $\pi_i$ based on $n_i$ observations from treatment $i$. The test statistic can be defined by

$$T(x_t, x_c) = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\frac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c} + \frac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t}}}, \tag{V.49}$$

where $\hat{\pi}_t$ and $\hat{\pi}_c$ are given by

$$\hat{\pi}_c = \frac{x_c}{n_c}, \qquad \hat{\pi}_t = \frac{x_t}{n_t}, \tag{V.50}$$

and $\tilde{\pi}_t$ and $\tilde{\pi}_c$ are the maximum likelihood estimates of $\pi_t$ and $\pi_c$, respectively, restricted under the null hypothesis such that $\tilde{\pi}_t - \tilde{\pi}_c = \delta_0$. Miettinen and Nurminen (1985) have shown that one may obtain these restricted maximum likelihood estimates by solving the third degree likelihood equation

$$\sum_{k=0}^{3} L_k \tilde{\pi}_c^{\,k} = 0 \tag{V.51}$$

for $\tilde{\pi}_c$ and setting $\tilde{\pi}_t = \tilde{\pi}_c + \delta_0$, where

$$\begin{aligned} L_3 &= N = n_c + n_t \\ L_2 &= (n_t + 2 n_c)\,\delta_0 - N - x_c - x_t \\ L_1 &= (n_c \delta_0 - N - 2 x_c)\,\delta_0 + x_c + x_t \\ L_0 &= x_c\, \delta_0 (1 - \delta_0) \end{aligned} \tag{V.52}$$

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.53}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \sup\{b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) \le \alpha\}. \tag{V.54}$$

This probability $P_{\pi_c}(T(x_t, x_c) < b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$ under the null hypothesis $\pi_t - \pi_c = \delta_0$, that is, with $\pi_t = \pi_c + \delta_0$. This implies that, for a given $\pi_c$, $b_{\pi_c}$ is defined as

$$\sup\Big\{b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_b(x_t, x_c)\, f_{\pi_c + \delta_0,\pi_c}(x_t, x_c) \le \alpha\Big\}, \tag{V.55}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) < b \\ 0 & \text{otherwise} \end{cases} \tag{V.56}$$
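As an illustration of (V.49) through (V.52), the restricted estimates and the test statistic can be sketched as below. The cubic is solved here by bisection on the admissible interval for $\tilde{\pi}_c$, which is an assumption of this sketch (the source does not say how East solves the equation), and the function names are hypothetical.

```python
from math import sqrt


def restricted_mle(x_t, x_c, n_t, n_c, delta0, tol=1e-12):
    """Return (pi_t_tilde, pi_c_tilde) with pi_t_tilde - pi_c_tilde = delta0."""
    N = n_c + n_t
    L3 = N
    L2 = (n_t + 2 * n_c) * delta0 - N - x_c - x_t
    L1 = (n_c * delta0 - N - 2 * x_c) * delta0 + x_c + x_t
    L0 = x_c * delta0 * (1 - delta0)

    def cubic(p):
        return ((L3 * p + L2) * p + L1) * p + L0  # Horner form of (V.51)

    # Admissible range: both pi_c and pi_c + delta0 must lie in [0, 1].
    lo, hi = max(0.0, -delta0), min(1.0, 1.0 - delta0)
    # On this range the cubic is non-negative at lo and non-positive at hi,
    # so bisection locates the point where it changes sign.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cubic(mid) > 0:
            lo = mid
        else:
            hi = mid
    pi_c = (lo + hi) / 2
    return pi_c + delta0, pi_c


def ni_statistic(x_t, x_c, n_t, n_c, delta0):
    """Test statistic (V.49) using the restricted variance estimate."""
    pi_t, pi_c = restricted_mle(x_t, x_c, n_t, n_c, delta0)
    var = pi_c * (1 - pi_c) / n_c + pi_t * (1 - pi_t) / n_t
    if var <= 0:
        return 0.0  # assumption: define T = 0 when the variance estimate vanishes
    return (x_t / n_t - x_c / n_c - delta0) / sqrt(var)
```

When the observed difference equals $\delta_0$ exactly, the restricted estimates coincide with the observed proportions and the statistic is zero, which gives a quick check of the coefficients.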
Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \inf\{b_{\pi_c} : \pi_c \in (0, 1)\}$ and suppose that this infimum takes place at $\pi_c^*$.

The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) < b^*$. Since $b^*$ is the infimum of the critical values over the possible range of $\pi_c$, this test guarantees the type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the null hypothesis is true and the underlying control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) < b^* \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c^* + \delta_0,\pi_c^*}(x_t, x_c). \tag{V.57}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level $\alpha$. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

Next we will show how the unconditional power of this exact test is calculated. The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test under $\delta = \delta_1$ ($\delta_1 < \delta_0$) and $\pi_c$. Under the alternative we have $\pi_t = \pi_c + \delta_1$. Then the unconditional power is given by

$$P_{\pi_c}(T(x_t, x_c) < b^* \mid H_1) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c + \delta_1,\pi_c}(x_t, x_c), \tag{V.58}$$

where

$$f_{\pi_c + \delta_1,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c} (\pi_c + \delta_1)^{x_t} (1 - \pi_c - \delta_1)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.59}$$

Non-inferiority for Difference of Proportions – Case 2

Suppose it is desired to test $H_0 : \pi_t - \pi_c \ge \delta_0$ ($\delta_0 > 0$) against the one-sided alternative $H_1 : \pi_t - \pi_c < \delta_0$. Let $\pi_t$ and $\pi_c$ denote the binomial probabilities for the treatment and control arms, respectively.
Let $x_t$ and $x_c$ be the observed numbers of responses for the treatment and control arms, respectively. Let $\delta = \pi_t - \pi_c$. It is of interest to test the null hypothesis $H_0 : \delta \ge \delta_0$ against the one-sided alternative $H_1 : \delta < \delta_0$. Let $\hat{\pi}_i$ denote the estimate of $\pi_i$ based on $n_i$ observations from treatment $i$. The test statistic can be defined by

$$T(x_t, x_c) = \frac{\hat{\pi}_t - \hat{\pi}_c - \delta_0}{\sqrt{\frac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c} + \frac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t}}}, \tag{V.60}$$

where $\hat{\pi}_t$, $\hat{\pi}_c$, $\tilde{\pi}_t$ and $\tilde{\pi}_c$ are defined in the same way as in Case 1.

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.61}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \inf\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) \le \alpha\}. \tag{V.62}$$

This probability $P_{\pi_c}(T(x_t, x_c) > b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$ under the null hypothesis $\pi_t - \pi_c = \delta_0$, that is, with $\pi_t = \pi_c + \delta_0$. This implies that, for a given $\pi_c$, $b_{\pi_c}$ is defined as

$$\inf\Big\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_b(x_t, x_c)\, f_{\pi_c + \delta_0,\pi_c}(x_t, x_c) \le \alpha\Big\}, \tag{V.63}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) > b \\ 0 & \text{otherwise} \end{cases} \tag{V.64}$$

Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \sup\{b_{\pi_c} : \pi_c \in (0, 1)\}$ and suppose that this supremum takes place at $\pi_c^*$.

The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) > b^*$.
Since $b^*$ is the supremum of the critical values over the possible range of $\pi_c$, this test guarantees the type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the null hypothesis is true and the underlying control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) > b^* \mid H_0) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c^* + \delta_0,\pi_c^*}(x_t, x_c). \tag{V.65}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level $\alpha$. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

Next we will show how the unconditional power of this exact test is calculated. The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test under $\delta = \delta_1$ ($\delta_1 > \delta_0$) and $\pi_c$. Under the alternative we have $\pi_t = \pi_c + \delta_1$. Then the unconditional power is given by

$$P_{\pi_c}(T(x_t, x_c) > b^* \mid H_1) = \sum_{(x_t, x_c) \in \mathcal{X}} I_{b^*}(x_t, x_c)\, f_{\pi_c + \delta_1,\pi_c}(x_t, x_c), \tag{V.66}$$

where

$$f_{\pi_c + \delta_1,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c} (\pi_c + \delta_1)^{x_t} (1 - \pi_c - \delta_1)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.67}$$

V.3.2 Non-inferiority Test: Ratio of Proportions

Non-inferiority for Ratio of Proportions – Case 1

Suppose it is desired to test $H_0 : \pi_t/\pi_c \le \rho_0$ ($\rho_0 < 1$) against the one-sided alternative $H_1 : \pi_t/\pi_c > \rho_0$. An alternative approach to establishing non-inferiority of an experimental treatment to the control treatment with respect to the ratio of probabilities was proposed by Farrington and Manning (1990). Let $\pi_t$ and $\pi_c$ denote the binomial probabilities for the treatment and control arms, respectively. Let $x_t$ and $x_c$ be the observed numbers of responses for the treatment and control arms, respectively. Let $\rho = \pi_t/\pi_c$.
Suppose that, for some $\rho_0 < 1$, one is interested in testing the null hypothesis $H_0 : \rho \le \rho_0$ against the one-sided alternative $H_1 : \rho > \rho_0$. Let $\hat{\pi}_i$ denote the estimate of $\pi_i$ based on $n_i$ observations from treatment $i$. The test statistic can be defined by

$$T(x_t, x_c) = \frac{\hat{\pi}_t - \rho_0 \hat{\pi}_c}{\sqrt{\frac{\tilde{\pi}_t (1 - \tilde{\pi}_t)}{n_t} + \rho_0^2\, \frac{\tilde{\pi}_c (1 - \tilde{\pi}_c)}{n_c}}}, \tag{V.68}$$

where $\hat{\pi}_t$ and $\hat{\pi}_c$ are given by

$$\hat{\pi}_t = \frac{x_t}{n_t}, \qquad \hat{\pi}_c = \frac{x_c}{n_c}, \tag{V.69}$$

and $\tilde{\pi}_t$ and $\tilde{\pi}_c$ are the maximum likelihood estimates of $\pi_t$ and $\pi_c$, respectively, restricted under the null hypothesis such that $\tilde{\pi}_t/\tilde{\pi}_c = \rho_0$. Miettinen and Nurminen (1985) have shown that one may obtain these restricted maximum likelihood estimates by solving a quadratic likelihood equation. Thus

$$\tilde{\pi}_c = \frac{-B - \sqrt{B^2 - 4AC}}{2A} \tag{V.70}$$

and

$$\tilde{\pi}_t = \rho_0 \tilde{\pi}_c, \tag{V.71}$$

where

$$\begin{aligned} A &= \rho_0 (n_t + n_c) \\ B &= -(\rho_0 n_t + x_t + n_c + \rho_0 x_c) \\ C &= x_c + x_t \end{aligned} \tag{V.72}$$

Let $\mathcal{X} = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the $2 \times 2$ table. Let $f_{\pi_t,\pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in \mathcal{X}$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t,\pi_c}(x_t, x_c) = \binom{n_t}{x_t}\binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c}. \tag{V.73}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \inf\{b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) \le \alpha\}. \tag{V.74}$$

This probability $P_{\pi_c}(T(x_t, x_c) > b \mid H_0)$ is calculated based on the exact distribution of $T(x_t, x_c)$ under the null hypothesis $\pi_t = \rho_0 \pi_c$.
This implies that, for a given $\pi_c$, $b_{\pi_c}$ is given by

$$b_{\pi_c} = \inf \Big\{ b : P_{\pi_c}(T(x_t, x_c) > b \mid H_0) = \sum_{(x_t, x_c) \in X} I_b(x_t, x_c)\, f_{\rho_0 \pi_c,\, \pi_c}(x_t, x_c) \le \alpha \Big\} \tag{V.75}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) > b \\ 0 & \text{otherwise} \end{cases} \tag{V.76}$$

Note that there is a one-to-one correspondence between the critical value $b_{\pi_c}$ and the control rate $\pi_c$. Let $b^* = \sup \{ b_{\pi_c} : \pi_c \in (0, 1) \}$ and suppose that this supremum is attained at $\pi_c^*$. The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) > b^*$. Since $b^*$ is the supremum of the critical values over the possible range of $\pi_c$, this test guarantees type I error control regardless of the underlying true response rate for the control arm. The attained significance level of this test is the exact probability of rejecting the null hypothesis when the underlying control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) > b^* \mid H_0) = \sum_{(x_t, x_c) \in X} I_{b^*}(x_t, x_c)\, f_{\rho_0 \pi_c^*,\, \pi_c^*}(x_t, x_c) \tag{V.77}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level $\alpha$. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$.

Next we will show how the unconditional power of this exact test is calculated. The unconditional power is the probability of rejecting the null hypothesis under the alternative hypothesis. Suppose that one is interested in the power of this test when $\rho = \rho_1$ ($> \rho_0$) and the response rate for the control arm is $\pi_c$. Under the alternative $\rho = \rho_1$, we have $\pi_t = \rho_1 \pi_c$.
Then the unconditional power is given by

$$P_{\pi_c}(T(x_t, x_c) > b^* \mid H_1) = \sum_{(x_t, x_c) \in X} I_{b^*}(x_t, x_c)\, f_{\rho_1 \pi_c,\, \pi_c}(x_t, x_c) \tag{V.78}$$

where

$$f_{\rho_1 \pi_c,\, \pi_c}(x_t, x_c) = \binom{n_t}{x_t} \binom{n_c}{x_c} (\rho_1 \pi_c)^{x_t} (1 - \rho_1 \pi_c)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c} \tag{V.79}$$

Non-inferiority for Ratio of Proportions – Case 2

Suppose it is desired to test $H_0: \pi_t / \pi_c \ge \rho_0$ ($> 1$) against the one-sided alternative $H_1: \pi_t / \pi_c < \rho_0$. In this case, the same test statistic can be used:

$$T(x_t, x_c) = \frac{\hat\pi_t - \rho_0 \hat\pi_c}{\sqrt{\dfrac{\tilde\pi_t (1 - \tilde\pi_t)}{n_t} + \rho_0^2\, \dfrac{\tilde\pi_c (1 - \tilde\pi_c)}{n_c}}} \tag{V.80}$$

where $\hat\pi_t$, $\hat\pi_c$, $\tilde\pi_t$ and $\tilde\pi_c$ are defined in the same way as in Case 1. Let $X = \{(x_t, x_c) : 0 \le x_t \le n_t,\ 0 \le x_c \le n_c\}$ denote the sample space for the 2 × 2 table. Let $f_{\pi_t, \pi_c}(x_t, x_c)$ denote the probability of observing the data $(x_t, x_c) \in X$ when the response rates for the treatment and control arms are $\pi_t$ and $\pi_c$, respectively, which is given by

$$f_{\pi_t, \pi_c}(x_t, x_c) = \binom{n_t}{x_t} \binom{n_c}{x_c}\, \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c} \tag{V.81}$$

For a given $\pi_c$ and nominal significance level $\alpha$, let $b_{\pi_c}$ be defined by

$$b_{\pi_c} = \sup \Big\{ b : P_{\pi_c}(T(x_t, x_c) < b \mid H_0) = \sum_{(x_t, x_c) \in X} I_b(x_t, x_c)\, f_{\rho_0 \pi_c,\, \pi_c}(x_t, x_c) \le \alpha \Big\} \tag{V.82}$$

where the indicator function is defined by

$$I_b(x_t, x_c) = \begin{cases} 1 & \text{if } T(x_t, x_c) < b \\ 0 & \text{otherwise} \end{cases} \tag{V.83}$$

Let $b^* = \inf \{ b_{\pi_c} : \pi_c \in (0, 1) \}$ and suppose that this infimum is attained at $\pi_c^*$. The decision rule of the exact test is to reject $H_0$ if $T(x_t, x_c) < b^*$. Since $b^*$ is the infimum of the critical values over the possible range of $\pi_c$, this test guarantees type I error control regardless of the underlying true response rate for the control arm.
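For concreteness, the Case 1 computations (V.68) to (V.79) can be sketched in Python as follows; Case 2 differs only in the direction of the inequalities. This is a minimal illustration, not East's implementation. The function names, the finite grid used to approximate the supremum over $\pi_c \in (0, 1)$, and the convention $T = 0$ for the table $x_t = x_c = 0$ (where the variance estimate vanishes) are all assumptions made here. The quadratic coefficients are those obtained by differentiating the log-likelihood restricted to $\pi_t = \rho_0 \pi_c$.

```python
import math

def restricted_mle(xt, xc, nt, nc, rho0):
    """MLEs of (pi_t, pi_c) restricted to pi_t = rho0 * pi_c."""
    # Quadratic likelihood equation a*p^2 + b*p + c = 0 in p = pi_c.
    a = rho0 * (nt + nc)
    b = -(rho0 * nt + xt + nc + rho0 * xc)
    c = xt + xc
    disc = max(b * b - 4.0 * a * c, 0.0)          # guard against rounding
    pc = min(max((-b - math.sqrt(disc)) / (2.0 * a), 0.0), 1.0)
    return rho0 * pc, pc

def statistic(xt, xc, nt, nc, rho0):
    """Test statistic T(x_t, x_c) of (V.68)."""
    pt_r, pc_r = restricted_mle(xt, xc, nt, nc, rho0)
    var = pt_r * (1 - pt_r) / nt + rho0 ** 2 * pc_r * (1 - pc_r) / nc
    if var <= 0.0:        # assumed convention for the degenerate table (0, 0)
        return 0.0
    return (xt / nt - rho0 * xc / nc) / math.sqrt(var)

def pmf(xt, xc, nt, nc, pt, pc):
    """Joint binomial probability f_{pi_t, pi_c}(x_t, x_c) of (V.73)."""
    return (math.comb(nt, xt) * pt ** xt * (1 - pt) ** (nt - xt)
            * math.comb(nc, xc) * pc ** xc * (1 - pc) ** (nc - xc))

def reject_prob(tstats, b, nt, nc, pt, pc):
    """P(T > b) at true rates (pt, pc); gives size (V.77) or power (V.78)."""
    return sum(pmf(xt, xc, nt, nc, pt, pc)
               for (xt, xc), t in tstats.items() if t > b)

def critical_value(tstats, nt, nc, rho0, pc, alpha):
    """b_{pi_c} of (V.74): smallest b with P(T > b | H0) <= alpha."""
    pts = sorted((t, pmf(xt, xc, nt, nc, rho0 * pc, pc))
                 for (xt, xc), t in tstats.items())
    tail, b, i = 0.0, pts[-1][0], len(pts) - 1
    while i >= 0 and tail <= alpha:
        b = pts[i][0]                          # here tail == P(T > b)
        while i >= 0 and pts[i][0] == b:       # absorb the mass at T == b
            tail += pts[i][1]
            i -= 1
    return b

def exact_test(nt, nc, rho0, alpha, grid=99):
    """Return all statistics, b* and the grid point pi_c* where it occurs."""
    tstats = {(xt, xc): statistic(xt, xc, nt, nc, rho0)
              for xt in range(nt + 1) for xc in range(nc + 1)}
    pcs = [(k + 1) / (grid + 1) for k in range(grid)]   # grid on (0, 1)
    b_star, pc_star = max(
        (critical_value(tstats, nt, nc, rho0, pc, alpha), pc) for pc in pcs)
    return tstats, b_star, pc_star

# Illustrative inputs (hypothetical, not taken from the manual).
nt = nc = 15
rho0, rho1, alpha = 0.7, 1.0, 0.05
tstats, b_star, pc_star = exact_test(nt, nc, rho0, alpha)
size = reject_prob(tstats, b_star, nt, nc, rho0 * pc_star, pc_star)
power = reject_prob(tstats, b_star, nt, nc, rho1 * 0.8, 0.8)
print(b_star, pc_star, size, power)
```

Because the supremum is taken over a grid only, the guarantee on the attained level holds at the grid points; a finer grid tightens the approximation at the cost of run time.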
The attained significance level of the Case 2 test is the exact probability of rejecting the null hypothesis when the underlying control rate equals $\pi_c^*$, which is given by

$$P_{\pi_c^*}(T(x_t, x_c) < b^* \mid H_0) = \sum_{(x_t, x_c) \in X} I_{b^*}(x_t, x_c)\, f_{\rho_0 \pi_c^*,\, \pi_c^*}(x_t, x_c) \tag{V.84}$$

Note that the attained significance level is the maximum type I error one can actually commit using this test given the desired significance level $\alpha$. Due to the discreteness of the distributions, the attained significance level is always bounded above by $\alpha$. The unconditional power under the specific alternative $\rho = \rho_1$ and $\pi_c$ is given by

$$P_{\pi_c}(T(x_t, x_c) < b^* \mid H_1) = \sum_{(x_t, x_c) \in X} I_{b^*}(x_t, x_c)\, f_{\rho_1 \pi_c,\, \pi_c}(x_t, x_c) \tag{V.85}$$

where

$$f_{\rho_1 \pi_c,\, \pi_c}(x_t, x_c) = \binom{n_t}{x_t} \binom{n_c}{x_c} (\rho_1 \pi_c)^{x_t} (1 - \rho_1 \pi_c)^{n_t - x_t}\, \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c} \tag{V.86}$$

V.4 Power of the Unconditional Test of Equivalence

Equivalence testing usually arises in the context of a clinical trial comparing two treatments in which the goal is to assess whether the two treatments are equally efficacious rather than attempting to assess whether one treatment is more efficacious than the other. This implies an inversion of the conventional formulation of null and alternative hypotheses. The statistical formulation proposed by Dunnett and Gent (1977) is used to describe this procedure. First define the true underlying treatment difference

$$\delta = |\pi_t - \pi_c| \tag{V.87}$$

and specify an equivalence margin, $\delta_0 > 0$, such that if $\delta < \delta_0$ the two treatments are considered equivalent while if $\delta \ge \delta_0$ they are not. Interest resides in testing the null hypothesis

$$H_0: \delta = \delta_0 \tag{V.88}$$

against the alternative hypothesis

$$H_1: \delta < \delta_0. \tag{V.89}$$

The null hypothesis (V.88) really consists of the two possibilities

$$H_{01}: \pi_c - \pi_t = \delta_0 \tag{V.90}$$

and

$$H_{02}: \pi_t - \pi_c = \delta_0. \tag{V.91}$$

In order to cater to both possibilities, two one-sided level-$\alpha$ tests are performed using the test statistics

$$T_1 = \frac{\hat\pi_c - \hat\pi_t - \delta_0}{\sqrt{\dfrac{\tilde\pi_c (1 - \tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t (1 - \tilde\pi_t)}{n_t}}} \tag{V.92}$$

and

$$T_2 = \frac{\hat\pi_t - \hat\pi_c - \delta_0}{\sqrt{\dfrac{\tilde\pi_c (1 - \tilde\pi_c)}{n_c} + \dfrac{\tilde\pi_t (1 - \tilde\pi_t)}{n_t}}}. \tag{V.93}$$

Asymptotically, $T_1 \sim N(0, 1)$ under $H_{01}$ and $T_2 \sim N(0, 1)$ under $H_{02}$. In order to reject the null hypothesis (V.88) and declare equivalence, both $H_{01}$ and $H_{02}$ must be rejected. The rejection region is thus the joint event $\{(T_1 \le z_\alpha) \cap (T_2 \le z_\alpha)\}$. It can be shown that under the null hypothesis (V.88), regardless of whether $H_{01}$ or $H_{02}$ holds,

$$\Pr\{(T_1 \le z_\alpha) \cap (T_2 \le z_\alpha)\} \le \alpha \tag{V.94}$$

thereby preserving the type-1 error.

V.4.1 Exact Unconditional Power for Equivalence Tests

Suppose it is desired to obtain the exact power of the two one-sided equivalence test at specific values of $\pi_c$ and $\pi_t$ with $|\pi_c - \pi_t| = \delta_1$ where $0 \le \delta_1 \le \delta_0$. The exact unconditional power is then readily evaluated as the probability, $\Pr\{(T_1 \le z_\alpha) \cap (T_2 \le z_\alpha) \mid \pi_c, \pi_t\}$, of falling in the rejection region under the alternative hypothesis. Denote this probability by $(1 - \beta)$. Then

$$(1 - \beta) = \sum_{x_c = 0}^{n_c} \sum_{x_t = 0}^{n_t} I_\alpha(x_c, x_t) \binom{n_c}{x_c} \pi_c^{x_c} (1 - \pi_c)^{n_c - x_c} \binom{n_t}{x_t} \pi_t^{x_t} (1 - \pi_t)^{n_t - x_t}, \tag{V.95}$$

where the indicator function, $I_\alpha(x_c, x_t)$, assumes the value 1 if both $T_1 \le z_\alpha$ and $T_2 \le z_\alpha$, and assumes the value 0 otherwise.

V.5 Sample Size Computations

For all tests discussed in this section, the sample size for a fixed unconditional power value is obtained by evaluating the null and alternative power functions over a range of sample sizes until an N is found that gives the desired power. Since neither $\alpha$ nor $\beta$ is guaranteed to be attainable due to the discreteness of the binomial distribution, the solution N to this parameter space search is not unique.
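The power sum (V.95) can be sketched in Python as shown below; repeating the calculation over increasing sample sizes gives the kind of search described for V.5. This is an illustration under stated assumptions rather than East's algorithm: $z_\alpha$ is taken to be the lower $\alpha$ quantile of the standard normal distribution, the restricted estimates $\tilde\pi_c, \tilde\pi_t$ are found numerically (ternary search on the log-likelihood under each one-sided null) instead of by a closed form, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def _loglik(x, n, p):
    """Binomial log-likelihood kernel; p is clipped away from 0 and 1."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return x * math.log(p) + (n - x) * math.log(1.0 - p)

def restricted_mle(xc, xt, nc, nt, shift):
    """Maximise the likelihood over pi_c subject to pi_t = pi_c + shift."""
    lo, hi = max(0.0, -shift), min(1.0, 1.0 - shift)
    for _ in range(60):                       # ternary search, concave target
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        f1 = _loglik(xc, nc, m1) + _loglik(xt, nt, m1 + shift)
        f2 = _loglik(xc, nc, m2) + _loglik(xt, nt, m2 + shift)
        if f1 < f2:
            lo = m1
        else:
            hi = m2
    pc = (lo + hi) / 2.0
    return pc, pc + shift

def rejection_region(nc, nt, delta0, alpha):
    """Tables (x_c, x_t) with T1 <= z_alpha and T2 <= z_alpha."""
    z = NormalDist().inv_cdf(alpha)           # lower quantile, negative
    region = set()
    for xc in range(nc + 1):
        for xt in range(nt + 1):
            d = xt / nt - xc / nc
            reject_both = True
            # sign = -1 gives T1 (null H01: pi_t = pi_c - delta0),
            # sign = +1 gives T2 (null H02: pi_t = pi_c + delta0).
            for sign in (-1, 1):
                pc, pt = restricted_mle(xc, xt, nc, nt, sign * delta0)
                se = math.sqrt(pc * (1 - pc) / nc + pt * (1 - pt) / nt)
                reject_both = reject_both and (sign * d - delta0) / se <= z
            if reject_both:
                region.add((xc, xt))
    return region

def power(region, nc, nt, pic, pit):
    """Exact unconditional power (V.95) at the true rates (pi_c, pi_t)."""
    return sum(math.comb(nc, xc) * pic ** xc * (1 - pic) ** (nc - xc)
               * math.comb(nt, xt) * pit ** xt * (1 - pit) ** (nt - xt)
               for xc, xt in region)

# Illustrative inputs (hypothetical): 30 subjects per arm, margin 0.3.
n, delta0, alpha = 30, 0.3, 0.05
region = rejection_region(n, n, delta0, alpha)
p_equal = power(region, n, n, 0.5, 0.5)       # delta_1 = 0
p_margin = power(region, n, n, 0.5, 0.2)      # delta_1 = delta_0
print(p_equal, p_margin)
```

The rejection region depends only on the design ($n_c$, $n_t$, $\delta_0$, $\alpha$), so it is computed once and then reused for every pair of true rates at which the power is required.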
The choice of sample size for a particular trial should depend on the priorities given to type 1 and 2 errors by the investigator. Possible prioritization may involve (1) primarily maximizing the attained type 1 error while maintaining it below $\alpha$, (2) primarily maximizing the attained type 2 error while maintaining it below $\beta$, or (3) optimizing to get $\alpha^*$ and $\beta^*$ as close to $\alpha$ and $\beta$, respectively, as possible. The most practical choice of sample size, however, may be that sample size above which power is guaranteed to be at least $(1 - \beta)$.

W Classification Table

Under usual notation, the formulas used in computing classification errors are listed below.

$h_{1i}$ = hat diagonal element assuming $y = 1$, $G_i = 1$;
$h_{0i}$ = hat diagonal element assuming $y = 0$, $G_i = 1$;
$V_i = \mathrm{Cov} \times X_i'$
$\beta_{i1} = \beta - \dfrac{(1 - \hat\pi_i)}{1 - h_{1i}} \times V_i$, $\quad \hat\pi_{i1} = X_i \beta_{i1}$
$\beta_{i0} = \beta - \dfrac{(-\hat\pi_i)}{1 - h_{0i}} \times V_i$, $\quad \hat\pi_{i0} = X_i \beta_{i0}$
$P(A \mid \bar{B}) = \dfrac{1}{n_2} \sum_{i \in C_2} I(\hat\pi_{i0} > z)$

Name                      Formula
Prob event                $P_e$
Cut-off prob              $z$
Correct events (CE)       $\sum_{i \in C_1} I(\hat\pi_{i1} \ge z)$
Correct noevents (CN)     $\sum_{i \in C_2} I(\hat\pi_{i0} \le z)$
Incorrect events (IE)     $\sum_{i \in C_2} I(\hat\pi_{i0} > z)$
Incorrect noevents (IN)   $\sum_{i \in C_1} I(\hat\pi_{i1} < z)$
Percent correct           $(CE + CN) / (n_1 + n_2)$
Sensitivity               $CE / n_1$
Specificity               $CN / n_2$
False pos                 $\dfrac{IE}{n_2} \times \dfrac{1 - P_e}{v_1}$
False neg                 $\left(1 - \dfrac{CE}{n_1}\right) \times \dfrac{P_e}{1 - v_1}$

Comment: $v_1 = \dfrac{IE}{n_2} + P_e \times \left(\dfrac{CE}{n_1} - \dfrac{IE}{n_2}\right)$

X Glossary

Accrual rate
The number of subjects entering the study per unit of time.

Adaptive study design
In an adaptive design, estimated treatment differences at interim analyses can be used to make mid-course data-dependent alterations to the trial design (changes in sample size, error spending function, and number and spacing of interim looks) while preserving the type-1 error.
Alpha spending function
The spending function to be used for allocating the type-1 cumulative error probability as a function of the information fraction.

Alpha spent
The cumulative amount of type-1 error probability spent up to and including a given look.

ASN (Average Sample Number) chart
This plot provides a graphical rendition of how the ASN (Average Sample Number, the expected sample size) varies as a function of a range of possible values for the effect size or non-inferiority margin (e.g. standardized difference, difference in proportions, etc.).

Assigned fraction (treatment)
The proportion, r, of subjects assigned (randomized) to the treatment (experimental) arm over the total number of subjects in the trial.

Beta spending function
The spending function to be used for allocating the type-2 cumulative error probability as a function of the information fraction.

Beta spent
The cumulative amount of type-2 error probability spent up to and including a given look.

Binding boundaries
Binding boundaries require the termination of the trial if the test statistic crosses the futility boundary; otherwise the type-1 error might be inflated. Contrariwise, non-binding boundaries produce the desired power and preserve the type-1 error even if the crossing of the futility boundary is overruled.

Bioequivalence
A test formulation of a drug (t) and the control (or reference) formulation of the same drug (c) are considered to be bioequivalent if the rate and extent of absorption are similar. The goal is to establish that the difference or log-ratio of the means of the observations from the test formulation and the control is within a specified equivalence margin.

Boundaries
Boundaries are the generalization to group sequential methods of the critical values of a test, the values beyond which the standardized test statistic supplies enough evidence to reject H0 or H1.
Boundary families allow the user to specify how conservatively or aggressively tests are performed at each analysis point, while preserving the type-1 error, the probability of accepting H1 when H0 is in fact true. Available boundary families are p-value (Haybittle-Peto), Power (Wang-Tsiatis), Published Spending Functions, and Interpolated Spending Functions.

Boundary chart
This plot provides a graphical rendition of the stopping boundaries ("Nominal critical point") corresponding to each look, the latter being indexed by the cumulative information (e.g. sample size, number of events, etc., depending on the endpoint). For the meaning of the various "Boundary Scales" please refer to other sections of the manual.

Boundary family
The boundaries at the design stage can be derived with reference to one of several approaches, depending also on whether early stopping is allowed in favor of the null only, of the alternative only, or of both. The Haybittle-Peto boundaries (p-value family) are specified in terms of a constant p-value for all interim analyses; East will compute the p-value to be used at the final analysis in order to satisfy the desired significance level of the procedure, or the user can specify it and then East computes the achieved significance level of the procedure. The Wang and Tsiatis (early stopping for H0 only) and the Pampallona and Tsiatis (early stopping for H0 or H1) families are direct applications of the respective power boundary families, indexed by a boundary shape parameter, Delta, in the range -0.5 to 0.5: small values of Delta yield boundaries with a small probability of early stopping and a correspondingly low average sample size, and vice versa for large values of Delta. Spending Function Boundaries (published) are defined by published error spending functions (e.g. Lan-DeMets, Rho family, Gamma family).
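As an illustration of what a published error spending function looks like, the sketch below codes the forms usually cited in the literature for the families just named: the two Lan-DeMets functions (O'Brien-Fleming type and Pocock type), the Rho family, and the Gamma family. The function names are hypothetical, and the exact parameterization East applies (for example, how $\alpha$ is split for one-sided designs) is an assumption not confirmed by this glossary.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def ld_obrien_fleming(t, alpha):
    """Lan-DeMets O'Brien-Fleming type: 2 - 2*Phi(z_{1-alpha/2} / sqrt(t))."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))

def ld_pocock(t, alpha):
    """Lan-DeMets Pocock type: alpha * ln(1 + (e - 1) * t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)

def rho_family(t, alpha, rho):
    """Rho family: alpha * t**rho, with rho > 0."""
    return alpha * t ** rho

def gamma_family(t, alpha, gamma):
    """Gamma family: alpha*(1 - exp(-gamma*t))/(1 - exp(-gamma)); linear at 0."""
    if gamma == 0:
        return alpha * t
    return alpha * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))

# Cumulative error spent at four equally spaced looks (t = information fraction).
fractions = [0.25, 0.5, 0.75, 1.0]
for name, fn in [("LD(OF)", lambda t: ld_obrien_fleming(t, 0.05)),
                 ("LD(PK)", lambda t: ld_pocock(t, 0.05)),
                 ("Rho(2)", lambda t: rho_family(t, 0.05, 2.0)),
                 ("Gm(-4)", lambda t: gamma_family(t, 0.05, -4.0))]:
    print(name, [round(fn(t), 5) for t in fractions])
```

Each function is increasing in the information fraction t and spends the full $\alpha$ at t = 1, which is the defining property of a spending function.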
Spending Functions Interpolated are defined by the user by specifying cumulative error probabilities at various looks. When interim looks are different from the design, linear interpolation is used for computing the cumulative error probabilities spent.

Boundary scale
See Boundary chart.

Boundary shape parameter
The Wang-Tsiatis and the Pampallona-Tsiatis power boundaries are indexed by a shape parameter varying between -0.5 and 0.5. Smaller values of the shape parameter correspond to boundaries with reduced probability of early stopping but also to a smaller maximum sample size, and vice versa for larger values of the shape parameter. For designs allowing for early stopping in favor of either H0 or H1, East allows for different shape parameters to govern the boundary for early rejection of H1 (denoted in the East worksheets as "Boundary shape parameter to reject H1") and the boundary for early rejection of H0 (denoted in the East worksheets as "Boundary shape parameter to reject H0").

Coefficient of variation
The coefficient of variation is a summary measure of variability. It is calculated by taking the ratio of the standard deviation to the mean.

Committed accrual (duration or subjects)
The committed number of subjects that can be accrued into the study (or equivalently, since the accrual rate is constant, the maximum accrual duration). In time to event studies, the power of the study is not determined by the number of subjects enrolled but by the number of events observed. Thus, there exists a range of accrual (bounded by the quantities Min and Max), combined with a range of study durations, that would all produce the desired power.
The lower bound of the range (Minimum committed number of subjects to accrue) corresponds to an initial estimate of the number of events to be observed for the study to have the desired power: with such a low accrual, the study will however be very long since all subjects accrued will have to fail before the final analysis can be performed. On the other hand, there is no need for the study to accrue more than the upper bound of the range (Maximum committed number of subjects to accrue), that is, to keep the study open to accrual beyond the point in time when the required number of events has been observed. The user can input values for the accrual within the suggested range, remembering that the larger the accrual the shorter the total study duration.

Conditional power
The conditional power is the probability of rejecting the null at one of the future looks given the data accumulated so far. This quantity can contribute, together with any other relevant information, to the decision to terminate or continue the study with a possible increase of the study's sample size.

Conditional power at ideal next look position (CP at INLP)
The conditional power at ideal next look position is the probability of rejecting the null at the next and final look, given the data accumulated so far, if the next and final look were performed at the recommended "Ideal next look position". This quantity can contribute, together with any other relevant information, to the decision to terminate or continue the study.

Conditional power chart
This plot provides a graphical rendition of how the conditional power of the study at the current look varies as a function of the effect size (e.g. standardized difference, difference in proportions, etc.).

Confidence interval adjusted
The method suggested by Kim and DeMets (1987) is applied to derive the adjusted confidence interval at the end of the study allowing for repeated significance testing.
This method was generalized by Brannath, Mehta and Posch (2008) for parameter estimation in adaptive trials.

Crossover ANOVA sqrt(MSE)
In a crossover design trial, the square root of the Mean Squared Error (MSE) from an ANOVA analysis is an estimate of the standard deviation of the error.

Chen-DeMets-Lan (CDL) method
The method for making sample size modifications to an ongoing trial and then performing the interim monitoring and final analysis with the classical Wald statistic. The method is further extended to a more general setting by Gao, Ware and Mehta (2008).

Cui, Hung, and Wang (CHW) method
The CHW method is a procedure for adaptive sample size modification of an on-going two-arm, K-look group sequential clinical trial. It is based on the examination of data at any interim look L < K, making a sample size modification if required, and continuing with the interim monitoring, using a modified test statistic that combines the standardized treatment effects before and after the modification as a weighted sum, with appropriate weights so as to preserve the type-1 error.

Cumulative accrual
The cumulative number of subjects accrued up to a given look.

Cumulative events
The cumulative number of events observed up to a given look.

Design proportion
When designing a study to compare binomial proportions, the expected value of the difference between the two groups being compared is expressed in terms of the expected proportion of success in the Treatment and in the Control groups respectively. In non-inferiority studies this difference represents the non-inferiority margin (the treatment arm should not be worse than the control arm by more than the non-inferiority margin). A setting of particular importance for binomial studies is the Casagrande, Pike, and Smith (1978) correction factor for the normal approximation to the binomial.
It may be enabled or disabled with the corresponding checkbox located in Settings-Binomial. By default this correction is disabled.

Duration-accrual chart
For time to event endpoints. For a range of values of the committed accrual duration (or committed number of subjects) this chart shows the corresponding total study duration, that is, the expected time by which the number of events needed to satisfy power considerations will be observed.

Effect size
The Information based module is not sensitive to the actual measurement scale in which the parameter of interest is expressed but only to its magnitude, the Effect Size: its value can express a difference in means or in proportions or even the coefficient from a complex regression model.

Equivalence
An equivalence trial aims to determine if two treatments have similar consequences. It aims to reject the null hypothesis that the difference between the two treatments falls outside the pre-specified lower and upper equivalence boundaries in favor of the alternative hypothesis that the difference between the two treatments falls within these boundaries.

Equivalence limits
In an equivalence trial for the difference of two normal means, the goal is to establish that the treatment mean and control mean are within an equivalence range. This range is delimited by the lower and upper equivalence limits δl and δu, which need not be equidistant from the value specified for the difference of means under the alternative hypothesis δ1.

Equivalence margin (δ0)
In an equivalence trial, the goal is to establish that the treatment and control parameters are within a specified value δ0 of each other. This δ0 value is the equivalence margin and is often defined as a proportion, such as 25% of the control mean for the comparison of the means of normal distributions.
Error spending chart
This plot provides a graphical rendition of the error probability spending functions as functions of the cumulative information fraction.

Events/Accruals vs. Time chart
For time to event endpoints, this chart shows how accrual increases (at a constant rate) until the end of the accrual period and how events will accumulate on each treatment arm (depending on the corresponding failure rate) as the study progresses in chronological time (horizontal axis).

Expected values under H0, H1 and H1/2
The probability of stopping the trial at any of the planned looks can be computed under the null (H0), the alternative (H1) or the mid-alternative (H1/2). These probabilities can be used to compute several expected quantities at study termination. The expected accrual, for instance, can be computed as the sum, over all looks, of the probability of stopping at the given look times the accumulated accrual (sample size) at that look.

Fixed sample study information
In the design worksheet of the General module, one of the needed input parameters is the information (e.g. number of subjects) required for the fixed-sample study. This quantity can be obtained from any sample size software and on that basis East generates a group sequential study with the same size and power to detect the same alternative. See also Inflation Factor.

Group sequential designs
Group sequential designs allow the investigator to take early interim looks at the data for evidence of efficacy, harm, and/or futility with the aim of possibly stopping the trial early. The planned number of looks describes the number of time points, including the closing date of the study, at which the investigator plans to analyze the data collected thus far. The value 1 corresponds to a classic fixed-sample-size design with a single look at the end of the study when all data have been collected.
The planned number of looks K can vary from 1 to 10. The number eventually performed may differ from K.

Hypothesis to be rejected
Early stopping can be allowed for in favor of H1 only (early stopping with rejection of H0), in favor of either H0 (futility) or H1, or in favor of H0 only (futility only).

Ideal next look position
After each look East revises the maximum information (e.g. sample size, number of failures, etc., depending on the endpoint) to be achieved for the study to satisfy the desired type-1 and type-2 error probabilities, allowing for the actually adopted schedule of analyses (which may be different from the tentative number and relative spacing assumed at design). This quantity can contribute, together with any other relevant information, to deciding when to perform a further analysis of the accumulating data.

Inflation factor
More information (e.g. number of subjects) is required for a group sequential study than for the corresponding fixed-sample study with the same operating characteristics. This is the penalty associated with repeated significance testing. The inflation factor is the proportionality constant (ratio) relating the information requirements of a group sequential trial to those of its corresponding fixed-sample study. This ratio is independent of the test, the endpoint of interest or the actual magnitude of the effect size of interest. East uses this result in the General module to set up a group sequential study on the basis of the information requirements of a fixed-sample study. See also Fixed Sample Study Information.

Information calculator
The calculator applies to parallel two-arm randomized designs with normal or binomial endpoints.
During interim monitoring of an information based study, the accumulated information up to the current look can be computed on the basis of the current values of the sample size and of the observed sample mean and standard deviation (if the underlying endpoint follows a normal distribution) or number of responses (if the underlying endpoint follows a binomial distribution) in the control and treatment arms respectively. It computes the achieved statistical information, the value of the current test statistic and a new estimate of the maximum sample size required. This latter quantity may differ from the value obtained at design (using the Sample Size Calculator) if the statistical information actually accumulates at a higher or lower pace than anticipated (i.e. for normal data, if the actual standard deviation of the observations is different from the value used at design, and for binomial data, if the observed success rate in the control group is different from the value used at design).

Information fraction
This is defined as the ratio of the information at the current time-point to the maximum information committed to the study. For a large number of studies, including studies with normal and binomial endpoints, the information fraction is simply the ratio of the current sample size to the maximum sample size committed to the study. For time to event endpoints it is the ratio of the current number of events (such as failures) to the total number of events committed to the study. For studies in which the monitoring will be performed on the Fisher information scale, the information is estimated as the inverse of the square of the estimated standard error of the parameter under investigation. Thus the information fraction is the ratio of the current inverse square estimate to the maximum inverse square estimate needed to achieve the goals of the study.
Information fraction is also sometimes referred to as Process time.

Last look logic
When the trial has to come to an end for administrative reasons (i.e. not because one of the boundaries has been crossed or because the maximum information has been reached) the boundary for this last look should be determined by spending the remaining alpha so as to respect the desired size of the testing procedure. In interim monitoring, this is what happens when the Tools-Last Look menu item is selected in East before performing the next look.

Look number
The counter identifying successive analyses of the data.

Maximum accrual
The accrual to be reached if no early stopping occurred (i.e. if the study went on until the last look). This quantity satisfies the desired significance level and power of the design.

Maximum accrual duration
In studies with time to event endpoint, the time required to achieve the necessary maximum accrual.

Maximum events
See Maximum study duration.

Maximum information
The information to be achieved if the study does not stop at any interim analysis. This quantity computed at design is revised during interim monitoring to allow for the actual schedule of looks, since their number and relative spacing may be different from those assumed at design (see Ideal next look position).

Maximum study duration
In studies with time to event endpoint, the study duration and the corresponding number of events needed to satisfy the desired operating characteristics of the study if no early stopping occurs.

Median survival
When designing a study to compare the distributions of the times to event, the expected relative advantage of the treatments being compared is expressed, by default, in terms of the expected median survival in the treatment and in the control groups respectively.
Alternatively, the Design Wizard allows the specification of the relative survival experience in terms of expected percent survival at a specific time or in terms of hazard rates.

Median unbiased estimator (MUE), Adjusted
The method suggested by Kim (1989) is applied to derive the median unbiased estimator of the effect size at the end of the study allowing for repeated significance testing.

Mid-alternative
Studies where early stopping may occur either in favor of the null or of the alternative hypothesis may extend until relatively large stopping times if the alternative has been overestimated. In such cases, the test statistic will tend to fluctuate within the continuation region. East computes the expected quantities (e.g. sample size or accrual) at termination not only under the null and the alternative but also under an intermediate hypothesis. Due to the non-linearity of the transformation linking the scale in which the effect size of interest to the user is expressed and the internal standardized scale used by East, the mid-alternative does not correspond to half of the alternative. The expected quantities computed by East under the mid-alternative, however, express the worst case scenarios.

Muller and Schafer method
In adaptive trials, the Muller and Schafer method aims to preserve the conditional type-1 error computed at the time of the adaptation. It is permissible to make any desired data dependent change to an ongoing group sequential trial, possibly more than once, by the simple process of preserving the conditional type-1 error of the remainder of the trial after each change.

Nominal critical point
A synonym for the boundary value against which the test statistic has to be compared. The nominal critical point is expressed in the same scale as a standard normal deviate in order to facilitate the comparison against the test statistic computed at each look.
This explains the use of the adjective "Nominal". See also Test Statistic and Nominal Significance Level.

Nominal significance level
The probability of values more extreme than the Nominal Critical Point according to a standard normal distribution. See also Nominal Critical Point.

Non-binding boundaries
Non-binding boundaries produce the desired power and preserve the type-1 error even if the crossing of the futility boundary is overruled. Contrariwise, binding boundaries require the termination of the trial if the test statistic crosses the futility boundary; otherwise the type-1 error might be inflated.

Non-inferiority margin
In non-inferiority designs for difference, the non-inferiority margin (δ0) is the magnitude of the difference between the treatment and the control arm that should not be exceeded for the treatment arm to be considered non-inferior to the control arm. In non-inferiority designs for ratio, the non-inferiority margin (ρ0) is the ratio between the treatment response proportion and the control response proportion that should not be exceeded for the treatment arm to be considered non-inferior to the control arm. In non-inferiority designs for odds ratio, the non-inferiority margin (Ψ0) is the odds ratio between the treatment response proportion and the control response proportion that should not be exceeded for the treatment arm to be considered non-inferior to the control arm.

Non-inferiority trial
A non-inferiority trial aims to determine if the outcome of an experimental treatment is no worse than the outcome of the standard treatment. It aims to reject the null hypothesis that the experimental treatment is worse than the standard treatment by more than a pre-specified non-inferiority margin. The amount by which the mean response on the experimental arm is worse than the mean response on the control arm must fall within this non-inferiority margin for the claim of non-inferiority to be sustained.
Nuisance parameters
Nuisance parameters affect the results of mathematical and statistical models, but there may be insufficient information about their magnitudes. In clinical trials, inaccurate initial estimates of these parameters lead to incorrect estimates of the sample size or other resources, and the study will not have the correct operating characteristics. Adaptive trials may estimate nuisance parameters based on early results and then, given these more accurate estimates, the conclusions of the trial may be more accurate than those from traditional trials based on poorly estimated nuisance parameters.

Number of looks (K)
For design purposes, K represents the tentative number of analyses to be performed during the interim monitoring phase, up to and including the last look. The number of analyses eventually performed during interim monitoring of the trial can be different from K.

Pampallona-Tsiatis boundaries
These power boundaries are characterized by two shape parameters: ∆1 for the boundaries that facilitate early stopping for efficacy by rejecting H0; and ∆2 for the boundaries that facilitate early stopping for futility by rejecting H1.

Percent survival at Time t
This option in East specifies the survival curves for the control and treatment arms using their percentages surviving at Time t. Given this information, East will calculate medians, hazard rates, and hazard ratios.

Post-hoc power
The post-hoc power is an a-posteriori characteristic of the actually adopted sequence of analyses: it is the probability of rejecting the null hypothesis using a testing strategy that corresponds to the analyses performed during the trial, up to and including the final one.

Post-hoc power chart
The post-hoc power is an a-posteriori characteristic of the actually adopted sequence of analyses: when computed after each interim analysis, it is the probability of rejecting the null hypothesis using a testing strategy that corresponds to the analyses performed up to and including the current look plus a hypothetical final analysis. This plot provides a graphical rendition of how the post-hoc power varies as a function of the cumulative information (e.g. sample size, number of failures etc., depending on the endpoint) at this hypothetical last look. Two special cases are worth noting: before the first analysis, the post-hoc power curve corresponds to a power curve for a fixed sample study as a function of information rather than of the parameter of interest; after the actual last analysis, the post-hoc power reduces to a single number (displayed in the "Post-Hoc Power" output box of the Interim Monitoring worksheet).

Power (1-beta)
The power of the study (one minus beta, where beta is the type-2 error probability) is the probability of terminating the study with the rejection of the null hypothesis (H0) when the alternative hypothesis (H1) is indeed true. Beta is the probability of not rejecting H0 when it is in fact false. Usual choices of power are 0.9 and 0.8, corresponding to type-2 error probabilities of 10% and 20%, respectively. An underpowered trial is extremely undesirable because it places human subjects at risk with a low probability of reaching a positive scientific conclusion and diverts resources that could be better utilized elsewhere.

Power chart
This plot provides a graphical rendition of how the power of the study varies as a function of the effect size or non-inferiority margin (e.g. standardized difference, difference in proportions, etc.).
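The relationship drawn in a power chart can be sketched for the simplest case. The Python fragment below assumes a single-look (fixed sample) one-sided z-test comparing two means with equal allocation and a known common standard deviation, for which power = Φ(d·√(n/2) − z(1−α)), where d is the standardized difference and n the sample size per arm. It does not reproduce the group sequential computations in East; the function name and the sample size of 86 per arm are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def fixed_sample_power(std_diff, n_per_arm, alpha=0.025):
    """Power of a one-sided, single-look z-test for two means.

    std_diff  : standardized difference (difference in means / common SD)
    n_per_arm : sample size on each arm (equal allocation assumed)
    alpha     : one-sided type-1 error probability
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1.0 - alpha)
    return norm.cdf(std_diff * sqrt(n_per_arm / 2.0) - z_alpha)


# Tabulate power against the standardized difference (hypothetical n = 86).
for d in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"standardized difference {d:.1f}: power {fixed_sample_power(d, 86):.3f}")
```

At a standardized difference of zero the power equals alpha, which is a quick sanity check on any power curve of this kind.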
p-value, adjusted
The method suggested by Fairbanks and Madsen (1982) is applied to derive the overall adjusted p-value at the end of the study, allowing for repeated significance testing.

Repeated confidence interval
The sequence of repeated confidence intervals provided after each look has simultaneous coverage probability of (1 − α)100%. Each interval provides a statistical summary of the information about the parameter of interest, allowing for repeated looks at the accumulating data. This quantity can contribute, together with any other relevant information, to the decision to terminate or continue the study. The coverage probability of the procedure is maintained regardless of how the decision to terminate the study is taken.

Repeated P-value
At the kth analysis, a two-sided repeated P-value for the null hypothesis H0: δ = δ0 is defined as $p_k = \max\{\alpha : \delta_0 \in I_k(\alpha)\}$, where I_k(α) is the current (1 − α)-level Repeated Confidence Interval (RCI). In other words, p_k is that value of α for which the kth (1 − α)-level RCI contains the null value, δ0, as one of its endpoints. The repeated P-value provides protection against the effect of multiple looks.

Repeated significance test
The idea of a "repeated significance test" at a constant nominal significance level to analyze accumulating data at a number of times over the course of a study was developed by Pocock. Subject entry is divided into K equally sized groups containing m subjects on each treatment, and the data are analyzed after each new group of observations has been observed.

Sample size calculator
The calculator applies to parallel two-arm randomized designs with normal or binomial endpoints. For such studies it translates information into a sample size when supplied with the value of the nuisance parameter, namely the known and common standard deviation of the observations (if the underlying endpoint follows a normal distribution) or the success rate in the control group (if the underlying endpoint follows a binomial distribution).

Significance level (alpha)
Alpha (or type-1 error probability) is the probability of terminating the study with the rejection of the null hypothesis (H0) when it is actually true. Usual choices of alpha are 0.05 and 0.10 (corresponding to 5% and 10% type-1 error probability, respectively).

Spacing of looks
Two options are available in East to specify the relative spacing of looks. If "Equal Spacing" is selected, East assumes, at design, that analyses are performed after equal increments of physical resources (e.g. subjects for normal or binomial endpoint, failures for survival) or of statistical information. If "Unequal Spacing" is selected, the user specifies the timing of the analyses in terms of fractions (in the range 0 to 1) of cumulative information. The actual spacing of analyses adopted during the trial can be different from the one tentatively chosen for design purposes.

Spending function
See alpha spending function or beta spending function or the next entry.

Spending Functions, Published (Pub)
These are the single-parameter spending function families ρ (rho) and γ (gamma). ρ = 1 produces boundaries that resemble the Pocock boundaries; ρ = 3 produces boundaries that resemble the more conservative O'Brien-Fleming boundaries. When γ is negative, its convex spending functions increase in conservatism as γ decreases; when γ is positive, its concave spending functions increase in aggressiveness as γ increases. When γ = 0 the type-1 error is spent linearly. When γ = 1 the stopping boundaries resemble the Pocock boundaries.
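The behaviour described for the two families can be illustrated numerically. The Python sketch below assumes that the ρ family is the power family α(t) = α·t^ρ and that the γ family is α(t) = α(1 − e^(−γt))/(1 − e^(−γ)), with α(t) = α·t at γ = 0; these are the forms commonly published for these families and are stated here as an assumption rather than as the exact East implementation. Function names are hypothetical.

```python
from math import exp


def rho_spending(t, alpha, rho):
    """Cumulative error spent at information fraction t, rho (power) family."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction t must lie in [0, 1]")
    if rho <= 0:
        raise ValueError("rho must be positive")
    return alpha * t ** rho


def gamma_spending(t, alpha, gamma):
    """Cumulative error spent at information fraction t, gamma family."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction t must lie in [0, 1]")
    if gamma == 0:
        return alpha * t  # limiting case: error is spent linearly
    return alpha * (1.0 - exp(-gamma * t)) / (1.0 - exp(-gamma))


# Compare how much of a total alpha of 0.05 is spent at each look.
alpha = 0.05
for t in (0.25, 0.50, 0.75, 1.00):
    print(
        f"t = {t:.2f}  "
        f"rho=1: {rho_spending(t, alpha, 1):.5f}  "
        f"rho=3: {rho_spending(t, alpha, 3):.5f}  "
        f"gamma=-4: {gamma_spending(t, alpha, -4):.5f}  "
        f"gamma=1: {gamma_spending(t, alpha, 1):.5f}"
    )
```

Under these assumed forms, every function spends the full alpha at t = 1, and a negative γ or a large ρ holds most of the error back for the later looks.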
Standardized difference
When designing a study to compare the means of normally distributed observations, the quantity of relevance is the expected value of the difference between the means of the two groups being compared, divided by the (common and assumed known) standard deviation of the observations. This quantity is referred to as the standardized difference. In non-inferiority studies this difference represents the non-inferiority margin (the treatment arm should not be worse than the control arm by more than the non-inferiority margin). It can also be expressed as a function of its individual components (the two means and the common standard deviation), or of the difference in means and the standard deviation.

Stopping probabilities
The probability that the test statistic will cross a stopping boundary at a given look. These probabilities differ depending on which hypothesis is assumed to hold (for instance under the null, the alternative or an intermediate hypothesis).

Study duration
In studies with a time-to-event endpoint, the study duration up to and including a given look (actual chronological time of the analysis relative to study start) is computed under various hypotheses.

Superiority trial
A superiority trial aims to determine if the outcome of an experimental treatment is better than the outcome of the standard treatment. It aims to reject the null hypothesis that there is no difference between these two outcomes.

Test statistic
In any of the Interim Monitoring worksheets and in the Direct Monitoring worksheet, the user is requested to input the value of the test statistic observed at the current analysis. This corresponds to the usual deviate, following a standard normal distribution under the null, as provided by statistical analysis packages. See also Nominal Critical Point.
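The deviate described above is typically a Wald-type statistic: the estimated effect divided by its estimated standard error. The Python sketch below shows that computation together with the nominal two-sided p-value from the standard normal distribution. The function names and input values are hypothetical, and the p-value shown is the unadjusted (nominal) one, not a repeated or adjusted p-value.

```python
from statistics import NormalDist


def wald_statistic(delta_hat, se):
    """Standardized deviate: estimated effect over its estimated standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return delta_hat / se


def nominal_two_sided_p(z):
    """Nominal two-sided p-value of z under a standard normal null distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical interim estimate and standard error.
z_obs = wald_statistic(delta_hat=1.2, se=0.5)
p_nominal = nominal_two_sided_p(z_obs)
print(f"z = {z_obs:.2f}, nominal two-sided p = {p_nominal:.4f}")
```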
Test statistic calculator
When, at any of the interim analyses, the value of the effect size of interest (delta) is known as well as its estimated standard error, the calculator computes the corresponding value of the Wald test statistic. The supplied values of delta and its estimated standard error are then used, instead of the design values, to compute the repeated confidence interval at the given look.

Test type
The type of the test can be either one- or two-sided. A one-sided test assumes that under the alternative hypothesis the parameter of interest lies in a single direction away from the null hypothesis H0. A two-sided test assumes that under the alternative hypothesis the parameter of interest lies in either direction away from the null hypothesis H0, and the test searches in both directions for departures of the test statistic from H0.

Time of looks
The time at which the analyses are performed, in terms of the cumulative fraction of the maximum information (in the range 0 to 1). In particular, for Normal and Binomial endpoints the maximum information is given by the maximum accrual. For Survival data, it is given by the maximum number of events.

Traditional study designs
Preference for an experimental treatment can be demonstrated in terms of its improved efficacy with respect to control (a superiority trial), its equivalence to the control treatment (an equivalence trial), or its being not much worse than the control treatment (a non-inferiority trial). In an equivalence trial the goal is to establish equivalence between two treatments rather than the superiority in efficacy of one over the other. In a non-inferiority trial, the experimental treatment should be demonstrated not to be inferior by more than a tolerable non-inferiority margin.
Type of trial
Preference for an experimental treatment can be demonstrated in terms of its improved efficacy with respect to control ("Superiority" trial), its equivalence to the control treatment ("Equivalence" trial), or its being not much worse than the control treatment ("Non-inferiority" trial). In an equivalence trial, the goal is to establish equivalence between two treatments rather than the superiority in efficacy of one over the other. In a non-inferiority trial, the experimental treatment should be demonstrated not to be inferior by more than a tolerable non-inferiority margin.

Type-1 error
The type-1 error probability is the probability of selecting the alternative hypothesis H1 when the null hypothesis H0 is in fact true. The significance level α (alpha) quantifies the strength of the evidence against the null hypothesis H0: µ = µ0. An α = .05 implies that the test of significance would erroneously reject a true null hypothesis only five times in 100 tests (1 time in 20). Commonly used significance levels are: .05, .01 (1 time in 100), .025 (25 times in 1000) or .1 (1 time in 10).

Type-2 error
The type-2 error probability β (beta) is the probability of erroneously accepting the null hypothesis H0 when H1 is in fact true. Commonly used values of β are .10 and .20. The power of the test is defined as 1 − β. It is the probability of correctly rejecting H0 (the null hypothesis) when H1 (the alternative hypothesis) is in fact true.

Wang-Tsiatis boundaries
The Wang-Tsiatis boundaries permit early stopping to reject H0. They are used to stop a trial early for efficacy only (1-sided boundaries), safety only (1-sided boundaries), or to stop early either for efficacy or safety (two-sided boundaries).
Y On validating the East Software

Y.1 Group Sequential and Adaptive Designs

Y.1.1 East 6.4 Validation
Y.1.2 East 6.3 Validation
Y.1.3 East 6.2 Validation
Y.1.4 East 6.0 and 6.1 Validation
Y.1.5 East 5.4 Validation
Y.1.6 East 5.3 Validation
Y.1.7 East 5 and East 4 Validation
Y.1.8 East 3 Validation

Y.1.1 East 6.4 Validation

This section describes the extensive validating procedures carried out on all the features incorporated in East 6.4. East 6.4 will be referred to as East in this subsection. A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure and the columns C1-C8 correspond to the following methods:

C1 column: Validation using East 5.4 - Most of the features implemented in East can be validated using the earlier version of East, version 5. Results from such features are compared and validated against East 5 and their consistency is ensured.
C2 column: Validation using in-house R codes - We have developed and are using independent R scripts to validate results from East. In some cases these R codes validate the intermediate output quantities, and in other cases they validate the complete feature.
C3 column: Validation using published R packages - Some features in East are partially or completely available in published R packages. The results from such features are compared and validated against the results from these R packages.
C4 column: Validation using SAS - Some features in East are partially or completely available in SAS. The results from such features are compared and validated against the results from these SAS procedures.
C5 column: Validation using SiZ 2.0 - Most of the features in East that relate to single-look designs come from SiZ 2.0. Results from such features are compared and validated against SiZ 2.0 and their consistency is ensured. SiZ 2.0 is fully validated released software.
It has been thoroughly validated against external software like nQuery, PASS, SAS and R as well as with in-house validation programs in R/SAS.
C6 column: Using East for Internal Validation and Consistency - All the features in East are validated by applying some internal consistency checks. These checks are generally carried out using different features within East.
C7 column: Validation using StatXact10 - Most of the features in East that relate to single-look designs come from StatXact 11. Results from such features are compared and validated against StatXact 11.
C8 column: Validation using commercial software packages - Features that are also available in other commercial packages like nQuery, PASS and SAS have been validated against those packages.

N 1 2 3 4 5 6 7 8 9 10 East Feature Design-MCP for Survival Endpoint Design-MEP for Discrete Endpoint Analysis-MEP for Discrete, Continuous Endpoint Analysis-MCP for Survival Endpoint Assurance and Bayesian predictive power for Survival Endpoint Dose Escalation Designs Multi-arm Two-stage Designs based on p-value combination MAMS for Continuous Endpoint Predict Procedures IM using Muller-Schafer Method C1 – – – – – C2 1 1 1 1 1 C3 – 1 1 1 – C4 – – – – – C5 – – – – – C6 1 1 1 1 1 C7 – – – – – C8 – – – – – – – 1 1 1 1 – – – – 1 1 – – – – – – – 1 1 1 1 – – – – – – – – 1 1 1 – – – – – –

Y.1.2 East 6.3 Validation

This section describes the extensive validating procedures carried out on all the features incorporated in East 6.3. East 6.3 will be referred to as East in this subsection. A summary table displaying the methods used for each statistical procedure is given below.
Each row of the table corresponds to a statistical procedure and the columns C1-C8 correspond to the following methods:

C1 column: Validation using East 5.4 - Most of the features implemented in East can be validated using the earlier version of East, version 5. Results from such features are compared and validated against East 5 and their consistency is ensured.
C2 column: Validation using in-house R codes - We have developed and are using independent R scripts to validate results from East. In some cases these R codes validate the intermediate output quantities, and in other cases they validate the complete feature.
C3 column: Validation using published R packages - Some features in East are partially or completely available in published R packages. The results from such features are compared and validated against the results from these R packages.
C4 column: Validation using SAS - Some features in East are partially or completely available in SAS. The results from such features are compared and validated against the results from these SAS procedures.
C5 column: Validation using SiZ 2.0 - Most of the features in East that relate to single-look designs come from SiZ 2.0. Results from such features are compared and validated against SiZ 2.0 and their consistency is ensured. SiZ 2.0 is fully validated released software. It has been thoroughly validated against external software like nQuery, PASS, SAS and R as well as with in-house validation programs in R/SAS.
C6 column: Using East for Internal Validation and Consistency - All the features in East are validated by applying some internal consistency checks. These checks are generally carried out using different features within East.
C7 column: Validation using StatXact10 - Most of the features in East that relate to single-look designs come from StatXact 10.1. Results from such features are compared and validated against StatXact 10.1.
C8 column: Validation using commercial software packages - Features that are also available in other commercial packages like nQuery, PASS and SAS have been validated against those packages.

In the table below, the symbol "1" indicates that the method in that column was used for validation of the feature in the corresponding row. The symbol "-" indicates that the method in that column was not applicable for that feature.

N 1 1.1 1.2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 East Feature Fixed Sample Tests Exact Design Module Exact Analysis Module Group Sequential Exact Probability Computation Exact Adjusted Confidence Interval Exact Conditional Power Simon's Two Stage Design Dose Escalation Designs Conditional Simulations Site Info Simulations Parallel Gatekeeping for Multiple Endpoints Muller-Schafer for SSR SSR for Ratio of Proportions Predicted Interval Plots Exact Inference Adaptive (BWCI) Exact Inference Adaptive (RCI) Arbitrary Weights CHW Sample Size / Information Calculator C1 C2 C3 C4 C5 C6 C7 C8 1 – 1 – 1 – – – – – – – 1 – – 1 1 1 1 1 – 1 – – 1 – – – – 1 – – 1 1 – – – – – 1 1 1 1 1 – 1 1 – – 1 – – – – – – – – – – – – 1 – 1 1 1 1 – – – – – – – 1 – – – – 1 – – 1 1 – 1 1 1 1 1 1 1 – – – – – – – – – – – – – – – – – – – – – – 1 1 1 1 1 1 1 – – – – – – – – – – – – – –

Y.1.3 East 6.2 Validation

This section describes the extensive validating procedures carried out on all the features incorporated in East 6.2. East 6.2 will be referred to as East in this subsection.
A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure and the columns C1-C5 correspond to the following methods:

C1 column: Validation using East 5.4 - Most of the features implemented in East can be validated using the earlier version of East, version 5. Results from such features are compared and validated against East 5 and their consistency is ensured.
C2 column: Validation using in-house R codes - We have developed and are using independent R scripts to validate results from East. In some cases these R codes validate the intermediate output quantities, and in other cases they validate the complete feature.
C3 column: Validation using published R packages - Some features in East are partially or completely available in published R packages. The results from such features are compared and validated against the results from these R packages.
C4 column: Using East for Internal Validation and Consistency - All the features in East are validated by applying some internal consistency checks. These checks are generally carried out using different features within East.
C5 column: Validation using commercial software packages - Features that are also available in other commercial packages like nQuery, PASS and SAS have been validated against those packages.

In the table below, the symbol "1" indicates that the method in that column was used for validation of the feature in the corresponding row. The symbol "-" indicates that the method in that column was not applicable for that feature.
N 1 2 3 4 5 East Feature Count Data Designs (Poisson / Negative Binomial) Serial Gatekeeping for Multiple Endpoints CI-based Designs Kaplan-Meier Plots CHW / CDL Methods for SSR C1 – C2 1 C3 – C4 – C5 1 – 1 – – – – – 1 1 1 1 1 – – 1 – 1 – 1 –

Y.1.4 East Architect and East 6.1 Validation

This section describes the extensive validating procedures carried out on all the features incorporated in East Architect as well as East 6.1. East Architect and East 6.1 will be referred to as East in this subsection. A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure and the columns C1-C6 correspond to the following methods:

C1 column: Validation using East 5.4 - Most of the features implemented in East can be validated using the earlier version of East, version 5. Results from such features are compared and validated against East 5 and their consistency is ensured.
C2 column: Validation using in-house R codes - We have developed and are using independent R scripts to validate results from East. In some cases these R codes validate the intermediate output quantities, and in other cases they validate the complete feature.
C3 column: Validation using published R packages - Some features in East are partially or completely available in published R packages. The results from such features are compared and validated against the results from these R packages.
C4 column: Validation using SAS - Some features in East are partially or completely available in SAS. The results from such features are compared and validated against the results from these SAS procedures.
C5 column: Validation using SiZ 2.0 - Most of the features in East that relate to single-look designs come from SiZ 2.0.
Results from such features are compared and validated against SiZ 2.0 and their consistency is ensured. SiZ 2.0 is fully validated released software. It has been thoroughly validated against external software like nQuery, PASS, SAS and R as well as with in-house validation programs in R/SAS.
C6 column: Using East for Internal Validation and Consistency - All the features in East are validated by applying some internal consistency checks. These checks are generally carried out using different features within East.

In the table below, the symbol "1" indicates that the method in that column was used for validation of the feature in the corresponding row. The symbol "-" indicates that the method in that column was not applicable for that feature.

N 1 2 3 3.1 3.2 3.3 4 4.1 4.2 5 6 7 7.1 7.2 7.3 8 9 10 11 12 13 East Feature Response Lag, Accrual, and Dropouts for Continuous and Discrete Endpoints Predictive Power Fixed Sample Tests Design Module Simulation Module Analysis Module Multi-Arm Tests Design Module Analysis Module Group Sequential Probability Computation Rounded Sample Size Flexibility in Setting up Boundaries Efficacy and Futility Missing Boundaries (Standardized) Treatment Scale Futility Boundary Conditional Power Scale for Futility Boundary Haybittle-Peto (p-value Scale) Boundary Computation Adjusted Confidence Interval (ACI) Conditional Power (CP) East 6.1 Features Stratified Simulations Assurance (Probability of Success) Bayesian Predictive Power C1 – C2 1 C3 – C4 – C5 – C6 1 – 1 – – – 1 – – – – – – – – – – – – 1 – 1 1 1 1 – – 1 – 1 1 1 1 1 – 1 – – – – – – 1 – – 1 1 1 1 1 – 1 1 1 – – – – – 1 1 – 1 – – – 1 1 1 – – – 1 1 1 1 1 – – 1 – – – 1 1 1 0 0 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1
Y.1.5 East 5.4 Validation

This section describes the extensive validating procedures carried out on adaptive features incorporated in East 5.4. A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure and the columns C1-C4 correspond to the following methods:

C1 column: Using East for Internal Validation and Consistency - In case of adaptive simulations, the final outcome is the 'Re-estimated Sample Size' and the 'Achieved Conditional Power' for that sample size. To validate these two numbers we can use intermediate parameters like 'Estimate of Delta', 'Standard Error' of that estimate, and the sample size at the adapt look in East designs. The output from CHW IM, such as the repeated p-value, is also verified using the Design level features in East.
C2 column: Use of R code - We have developed and are using independent R scripts to validate results from adaptive features like CHW, CDL Simulations and CHW IM. In case of simulations this code computes the re-estimated sample size and the power achieved. In case of CHW IM, the R code computes weighted statistics, the RCIs, and the repeated p-values.
C3 column: Use of Excel Based Tools - We have developed in-house Excel based tools to validate the results obtained from adaptive features. These tools also require information on the adapt look parameters like 'Delta Estimate' and 'Standard Error' of that estimate. The outcomes validated are the re-estimated sample size and the conditional power achieved.
C4 column: Use of Excel Based Tools - We use an Excel based tool (developed and recommended by Dr. Cyrus Mehta) to verify the preservation of alpha and power in adaptive simulations. We can run the simulations under the null or alternative hypothesis and verify whether the type-I error or design power is indeed preserved. Adequate accuracy is achieved with 100000 or more simulations. To verify whether the simulated rejection probability is actually close to the design alpha or power, we use the Excel based tool, which indicates how confident we can be that the probabilities are preserved. In general, this tool can be used to verify whether an observed number in (0,1) is close to the actual number.

In the table below, the symbol "1" indicates that the method in that column was used for validation of the feature in the corresponding row. The symbol "-" indicates that the method in that column was not applicable for that feature.

All the features in the table below are validated for the two tests under Survival Endpoint: Superiority Trial Two sample Given Accrual Duration and Accrual Rates, and Superiority Trial Two sample Given Accrual Duration and Study Duration.

N   East 5.4 Feature   C1   C2   C3   C4
1   CHW Simulations    1    1    1    1
2   CDL Simulations    1    1    1    1
3   CHW IM             1    1    -    1
4   CP Calculator      1    1    1    1

Y.1.6 East 5.3 Validation

This section describes the extensive validating procedures carried out on adaptive features incorporated in East 5.3. A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure and the columns C1-C5 correspond to the following methods:

C1 column: Using East for Internal Validation and Consistency - In case of adaptive simulations, the final outcome consists of 'Re-estimated Sample Size' and the 'Achieved Conditional Power' for that sample size. To validate these two numbers, we use intermediate parameters like 'Estimate of Delta', 'Standard Error' of that estimate and the sample size at the adapt look.
Output quantities like the weighted test statistic and repeated p-value from the CHW IM sheet are also verified using internal validation.
C2 column: Use of R code - We have developed independent R scripts to validate results from adaptive features like CHW and CDL Simulations as well as the CHW IM sheet. In case of the aforementioned simulations this code computes the re-estimated sample size and the power achieved. In case of CHW IM, it computes weighted statistics, Repeated Confidence Intervals, and repeated p-values. We have utilized R packages like 'ldbounds' and 'Adapt'.
C3 column: Use of Excel Based Tools - We have developed in-house Excel based tools to validate the results obtained from adaptive simulations. These tools also require information on the adapt look parameters like 'Delta Estimate' and 'Standard Error' of that estimate. The outcomes validated are the re-estimated sample size and the conditional power achieved.
C4 column: Use of ADDPLAN - We have compared results from the CHW IM sheet and CP calculator with ADDPLAN.
C5 column: Confidence Interval for Probabilities using Excel - We have used an in-house Excel based tool recommended by Dr. Cyrus Mehta to verify the preservation of alpha and power in adaptive simulations. This tool provides a confidence interval for the simulated probability.

In the table below, the symbol "1" indicates that the method in that column was used for validation of the feature in the corresponding row. The symbol "-" indicates that the method in that column was not applicable for that feature.

All the features in the table below are validated for Normal Endpoint: Superiority Trial Two sample Difference of Means, and Binomial Endpoint: Superiority Trial Two sample Difference of Proportions.

Serial No.   East 5.3 Feature                  C1   C2   C3   C4   C5
1            CHW Simulations                   1    1    1    -    1
2            CDL Simulations                   1    1    1    -    1
3            MS Simulations                    1    -    -    -    1
4            MS-RCI Estimations                1    -    -    -    1
5            MS-SWACI Estimations              1    -    -    -    1
6            MS-RCI Estimation Calculator      1    -    -    -    1
7            MS-SWACI Estimation Calculator    1    -    -    -    1
8            CHW IM                            1    1    -    1    1
9            CP Calculator                     1    -    -    1    -

Y.1.7 East 5 and East 4 Validation

This manual discusses more than one hundred illustrative trial designs with simulation and interim monitoring. We used these designs to validate the internal and external consistencies of East. A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure and the columns C1-C4 correspond to the following comparisons:

C1 column: Comparisons of the sample sizes for single look designs obtained from East 5 with the analogous estimates from the nQuery (2005) and Egret Siz (1997) software. For the repeated measures design, which is not supported by these software packages, we compared the estimates obtained from East 5 with the results reported by Fitzmaurice, Laird and Ware (2004).
C2 column: Comparisons of the design values of significance level and power with the values obtained by simulation in a single look setting.
C3 column: Comparisons of the design values of the probabilities of crossing the stopping boundaries, significance level and power with the values obtained by simulation in a multiple-look setting.
C4 column: Comparisons of the design boundary values with the boundary value estimates generated in the interim monitoring (IM) module.

In the table, the symbol "1" indicates that the comparison was made for the test and the symbol "-" denotes that a comparable test in other software was not available or the comparison was not applicable (e.g.
a check of the boundary crossing probabilities for the East procedures that only support a single look design). 2668 Y.1 Group Sequential and Adaptive Designs – Y.1.7 East 5 and East 4 Validation <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 N Setting Test Name C1 C2 C3 C4 1 2 3 4 5 6 Test Type Normal Superiority Superiority Superiority Superiority Superiority Superiority One Sample One Sample One Sample One Sample Two Samples Two Samples 1 1 1 1 1 1 1 1 1 1 1 1 1 1 – – 1 – 1 1 – – 1 – 7 8 9 10 11 Superiority Superiority Superiority Non-inferiority Non-inferiority Regression Regression Regression Two Samples Two Samples 1 1 1 1 1 – – – 1 1 – – – 1 – – – – 1 – 12 13 14 Equivalence Equivalence Equivalence Two Samples Two Samples Two Samples 1 1 1 1 1 – – – – – – – 15 Equivalence Two Samples Single Mean Paired Means t-Test Paired t-Test Difference of Means Difference of Means (t-Test) Single Slope Two Slopes Repeated Measures Difference of Means Difference of Means (t-test) Difference of Means Log-ratio of Means Difference of Means (Crossover) Log-ratio of Means (Crossover) 1 – – – 16 17 18 Binomial Superiority Superiority Superiority One Sample One Sample Two Samples 19 20 21 22 23 24 25 Y.1 Single Proportion 1 1 1 Matched Pairs – 1 1 Difference of 1 1 1 Proportions Superiority Two Samples Ratio of – 1 1 Proportions Superiority Two Samples Odds ratio of – 1 1 Proportions Superiority Two Samples Stratified 2x2 – 1 1 Tables Superiority Two Samples Fisher Exact Test 1 – – Superiority > 2 Samples Trend in K Ordered 1 – – Proportions Superiority Regression Logistic Regression 1 Non-inferiority Two Samples Difference of 1 1 1 Group Sequential and Adaptive Designs – Y.1.7 East 5 and East 4 Validation Proportions 1 1 1 1 1 1 – – 1 1 2669 <<< Contents Y * Index >>> On validating the East Software N 2670 Setting Test Name C1 C2 C3 C4 26 Test Type Binomial Non-inferiority Two Samples – 1 1 1 27 Non-inferiority Two Samples – 1 1 1 28 Non-inferiority Two Samples 
– 1 1 1
29 Two Samples 1 1 – –
30 31 Equivalence Survival Superiority Superiority Ratio of Proportions (Wald) Ratio of Proportions (Farrington and Manning) Odds Ratio of Proportions Equivalence 1 1 1 1 1 1 1 1
32 Superiority Regression 1 – – 1
33 34 Non-inferiority Non-inferiority Two Samples Two Samples Logrank test Logrank test (Advanced Version) Cox Proportional Hazard Logrank Logrank (Advanced Version) 1 – 1 1 1 1 1 1
35 General Superiority Two Samples Convert Single to Multi look – 1 1 1
36 Information Superiority Two Samples Design and monitor Maximum Information Trials – 1 1 1
37 Nonparametric Superiority Two Samples 1 – – –
38 Superiority Two Samples Wilcoxon, Mann and Whitney Wilcoxon Rank Sum 1 – – – Two Samples Two Samples

Y.1.8 East 3 Validation

The statistical results computed by East 3 have been subjected to rigorous and extensive quality-assurance testing for purposes of validation. A database consisting of a large number of studies has been compiled at Cytel Software Corporation. These studies have been gathered from published articles, from the East-2000 software and from East-2000 beta testers. Several additional studies have been constructed by us since the release of East-2000. We have also constructed studies using the University of Wisconsin software package. We have thereby tested the software across a broad range of possible input values. The results were checked by five different methods.

1. Checks against East-2000 and East-DOS. The results in East 3 have been checked against East-2000, which in turn was tested against East-DOS. Together, the East-2000 and East-DOS software were tested extensively over a period of ten years, both in-house and by end-users at commercial sites, academic sites and the FDA.

2. Checks against Published Tables.
East 3 implements the family of power boundaries proposed by Wang and Tsiatis (1987) and further extended by Pampallona and Tsiatis (1994). Both papers contain extensive tabulations of the constants defining the boundaries and of expected sample numbers for numerous combinations of the various design parameters. East 3 also uses the spending function approach for generating stopping boundaries at the design stage. Tables of boundaries and inflation factors derived from published spending functions have been published by Jennison and Turnbull (2000). We have verified that the numbers in these tables match the corresponding numbers generated by East 3.

3. Checks against Simulation. The East 3 simulation module provides a further way to check some properties of the designs proposed by East 3, up to Monte Carlo accuracy. For any given set of boundaries, several simulated quantities, such as type-I and type-II error probabilities, stopping probabilities and average sample number, have been checked against the theoretical operating characteristics of the chosen design. Specifically, we have verified the following through simulation:

(a) We have simulated studies with varying values for the effect size, ranging all the way from the null hypothesis up to the alternative hypothesis. In every case we have verified that the theoretical power obtained from the design module of East 3 matches the power obtained by simulation.

(b) We have compared the exit probabilities, look by look, between the simulation results and the theoretical results obtained from the design details module of East 3. The exit probabilities match, up to Monte Carlo accuracy.

(c) We have compared the average sample size obtained by simulation with the corresponding average sample size displayed on the design worksheet for H0, H1 and H1/2.
The results match.

4. Logical Checks. Several logical checks have been implemented where the behavior of East 3 can either be predicted with certainty or where a high level of consistency is expected among varying but related situations. Some examples are given below:

(a) East 3 has been extensively tested against published tables and commercial software for fixed-sample size designs. The fixed sample designs in East 3 are special cases of the group sequential designs for which East 3 was primarily developed.

(b) We have designed many studies with a variety of spending functions and with both equal and unequal spacings for the interim looks. We have then invoked the interim monitoring module in East 3 and implemented the monitoring schedule exactly as prescribed in the design stage. We have thereby verified through two independent computation procedures that the error spent and the stopping boundaries produced at the interim monitoring stage are identical to the corresponding values at the design stage.

(c) We have documented (in Appendix C) that the stopping boundaries used at the interim monitoring stage of a Wang-Tsiatis or Pampallona-Tsiatis design are derived from inverting ten-look, equally spaced stopping boundaries generated at the design stage. The design and interim monitoring output have therefore been compared for 10-look designs that were actually monitored with 10 equally spaced looks. The results from these two independent methods of obtaining the output match.

(d) In the interim monitoring module, before the first look is performed, the conditional power chart corresponds to the usual power curve for fixed sample designs. This serves to validate that the power specified in the design module matches the initial estimate of conditional power.

(e) In the interim monitoring module, the suggested optimal look position before any data have been entered into the worksheet must correspond to the sample size requirements of a fixed sample design.
We have verified that this theoretical requirement is satisfied.

(f) The General module can set up and allow monitoring of a group sequential design on the basis of the sample size requirement of the corresponding fixed sample design. Therefore, for any arbitrary group sequential design set up and monitored in any of the Normal, Binomial or Survival modules, it is possible to replicate virtually all the output with the General module, given the sample size requirement of its fixed sample counterpart.

(g) A number of actual clinical applications published in the literature were replicated in East 3. The East 3 results were consistent with the published results. Many of these applications were used as case studies in the earlier East-2000 software.

(h) The exit probabilities under H0, H1 or H1/2 are displayed in the design details worksheet. We have verified that the sum of these exit probabilities, for any of the above hypotheses, is 1.

(i) We have verified that the expected sample sizes under H0, H1 or H1/2, as displayed on the design worksheet, match the corresponding expected sample sizes computed directly from the exit probabilities and cumulative accruals displayed as design details in East 3.

(j) We have verified that the cumulative alpha spent matches the cumulative exit probabilities under H0 from the design details portion of East 3.

(k) We have verified that for studies with H0-only boundaries, the cumulative alpha spent at any intermediate look matches the cumulative exit probability under H0, up to that intermediate look.

(l) We have verified that for 1-sided studies with H1-only boundaries, the cumulative beta spent at any intermediate look matches the cumulative exit probability under H1, up to that intermediate look.
(m) We have verified that there is internal consistency between the final adjusted confidence intervals, computed by the Tsiatis, Rosner and Mehta (1989) stage-wise method, and the final adjusted p-value. That is, the final adjusted confidence interval excludes the parameter of interest if and only if the final stopping boundary is crossed and the final adjusted p-value is less than alpha.

(n) We have verified that there is internal consistency between the repeated confidence intervals of Jennison and Turnbull (1998) and the value of the final test statistic. That is, one extreme of the repeated confidence interval will coincide with zero for superiority trials (or with the non-inferiority margin for non-inferiority trials) if and only if the observed test statistic falls on a boundary value.

(o) We have verified that there is internal consistency between the final adjusted p-value and the final cumulative alpha that was spent when the test statistic coincides with the stopping boundary. These two values are computed independently, but logically they have to be equal.

(p) We have verified that the maximum information obtained from the information based design module of East 3 corresponds to the maximum sample size obtained from the normal or binomial design modules, for studies in which the effect size, power, type-I error, stopping boundaries and spacing of looks are kept the same.

5. Checks against Public Domain Software. Public domain Fortran routines developed at the University of Wisconsin (see Reboussin et al., 2002) can be freely downloaded from http://www.landemets.com. East 3 replicated the results produced by this software for adjusted p-values, confidence intervals and unbiased estimators following sequential monitoring. The stopping boundaries are evaluated differently in the two procedures, which results in small differences.
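
As an illustration only, the simulation comparison in item 3 and logical checks such as 4(h) to 4(k) could be reproduced independently along the following lines. This is a minimal sketch and not East's implementation: it assumes a two-sided design with two equally spaced looks, uses the commonly tabulated Pocock critical value 2.178 for an overall alpha of 0.05, and the function names, the midpoint-rule integration and the maximum sample size `N_MAX` are hypothetical choices made for the example.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def two_look_probs(c1, c2, drift=0.0, steps=4000):
    """Return (cross1, cross2, accept2) for a two-sided design with two
    equally spaced looks and critical values c1, c2 on the Z scale.
    drift is the mean of the final-look Z statistic (0 under H0)."""
    mu = drift / SQRT2  # mean of Z1 and of the second-stage increment
    cross1 = (1.0 - norm_cdf(c1 - mu)) + norm_cdf(-c1 - mu)
    # Integrate over the continuation region |z1| < c1 (midpoint rule).
    # Z2 = (Z1 + X2) / sqrt(2), where X2 ~ N(mu, 1) is independent of Z1.
    h = 2.0 * c1 / steps
    cross2 = 0.0
    accept2 = 0.0
    for i in range(steps):
        z1 = -c1 + (i + 0.5) * h
        w = norm_pdf(z1 - mu) * h
        hi = SQRT2 * c2 - z1 - mu   # standardized upper limit for X2
        lo = -SQRT2 * c2 - z1 - mu  # standardized lower limit for X2
        cross2 += w * ((1.0 - norm_cdf(hi)) + norm_cdf(lo))
        accept2 += w * (norm_cdf(hi) - norm_cdf(lo))
    return cross1, cross2, accept2

def simulate_rejection_rate(c1, c2, drift=0.0, n_sims=100_000, seed=12345):
    """Monte Carlo estimate of the probability of crossing a boundary."""
    rng = random.Random(seed)
    mu = drift / SQRT2
    rejections = 0
    for _ in range(n_sims):
        z1 = rng.gauss(mu, 1.0)
        if abs(z1) >= c1:
            rejections += 1
            continue
        z2 = (z1 + rng.gauss(mu, 1.0)) / SQRT2
        if abs(z2) >= c2:
            rejections += 1
    return rejections / n_sims

# Assumed example: Pocock boundaries, 2 looks, two-sided alpha = 0.05.
C_POCOCK = 2.178
N_MAX = 200  # hypothetical maximum sample size
cross1, cross2, accept2 = two_look_probs(C_POCOCK, C_POCOCK)

exit_probs = [cross1, cross2 + accept2]       # exit at look 1 or at look 2
cumulative_alpha = [cross1, cross1 + cross2]  # boundary crossings under H0
expected_n = exit_probs[0] * (N_MAX / 2) + exit_probs[1] * N_MAX
simulated_alpha = simulate_rejection_rate(C_POCOCK, C_POCOCK)

print("sum of exit probabilities:", round(sum(exit_probs), 6))
print("cumulative alpha by look:", [round(a, 4) for a in cumulative_alpha])
print("expected sample size under H0:", round(expected_n, 2))
print("simulated type-I error:", round(simulated_alpha, 4))
```

In this sketch the exit probabilities are obtained by numerical integration, so checking that they sum to 1 and that the cumulative crossing probability under H0 is close to the nominal alpha is a genuine consistency check; the simulated rejection rate should agree with the integrated value only up to Monte Carlo accuracy.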
A detailed explanation of these differences in the stopping boundaries is provided in Appendix F.

Y.2 Fixed-Sample Designs (FSD)

Y.2.1 Details

The statistical results computed by FSD have been subjected to rigorous and extensive quality-assurance testing for purposes of validation. A summary table displaying the methods used for each statistical procedure is given below. Each row of the table corresponds to a statistical procedure, and the columns C1-C6 correspond to comparison of the FSD result with results from other software, as indicated below.

C1 Column: Comparison with nQuery 7.0.
C2 Column: Comparison with SAS 9.1.
C3 Column: Comparison with independently developed R programs and SAS macros.
C4 Column: Comparison with PASS 2008.
C5 Column: Comparison with StatXact8.
C6 Column: Comparison with East 5.2.

In the following tables, "1" indicates that the comparison was made for the test and results from FSD were comparable to the respective software. "2" indicates that the comparison was made but the results did not match, for reasons indicated at the bottom of the table. "-" denotes that a comparable test in other software was not available or the comparison was not applicable.

Module : Design Sr.
Test Name
1 Continuous: One Mean
Single Mean: Z Test
Single Mean: t Test
Difference of Means for Paired Data: Superiority: Z Test
Difference of Means for Paired Data: Superiority: t Test
Difference of Means for Paired Data: Non-Inferiority: Z Test
Difference of Means for Paired Data: Non-Inferiority: t Test
Difference of Means for Paired Data: Equivalence: t Test
Ratio of Means for Paired Data: Superiority: Z Test
Ratio of Means for Paired Data: Superiority: t Test
Ratio of Means for Paired Data: Non-Inferiority: Z Test
Ratio of Means for Paired Data: Non-Inferiority: t Test
Ratio of Means for Paired Data: Equivalence: t Test
C1 C2 C3 C4 C5 C6
1 - 1 - 1 1 1 - - 1 1 1 - 1 - - - - - 1 - - - 1 - 1 - - - - - 1 - - - - - 1 - - - - - 1 - - - - - 1 - - - - - 1 - - - - - 1 - - -

Module : Design
Sr. Test Name  C1 C2 C3 C4 C5 C6
2 Continuous: Two Means
Difference of Means for Independent Data: Superiority: Z Test  - - 1 - - 1
Difference of Means for Independent Data: Superiority: t Test  1 1 1 - - -
Difference of Means for Independent Data: Non-Inferiority: Z Test  - - 1 - - 1
Difference of Means for Independent Data: Non-Inferiority: t Test  1 1 1 - - -
Difference of Means for Independent Data: Equivalence: t Test  - - 1 - - -
Ratio of Means for Independent Data: Superiority: Z Test  - - 1 - - -
Ratio of Means for Independent Data: Superiority: t Test  1 - 1 - - -
Ratio of Means for Independent Data: Non-Inferiority: Z Test  - - 1 - - -
Ratio of Means for Independent Data: Non-Inferiority: t Test  - - 1 - - -
Ratio of Means for Independent Data: Equivalence: t Test  1 - 1 - - -
Wilcoxon Mann Whitney Test for Independent Data  1 - 1 - - 1
Difference of Means for Crossover Data: Superiority: t Test  1 - 1 - - -
Difference of Means for Crossover Data: Non-Inferiority: t Test  - - 1 - - -
Difference of Means for Crossover Data: Equivalence: t Test  1 - 1 - - -
Ratio of Means for Crossover Data: Superiority: t Test  - - 1 - - -
Ratio of Means for Crossover Data: Non-Inferiority: t Test  - - 1 - - -
Ratio of Means for Crossover Data: Equivalence: t Test  1 - 1 - - -

Module : Design
Sr. Test Name
3 Continuous: Many Means
One Way ANOVA
One Way Contrast
One Way Repeated Measures (Constant Correlation) ANOVA
One Way Repeated Measures Contrast (Constant Correlation)
Two Way ANOVA
4 Continuous: Regression
Linear Regression: Single Slope
Linear Regression for Comparing Two Slopes
Repeated Measures for Comparing Two Slopes
5 Discrete: Single Proportion
Single Proportion (Asymptotic)
Single Proportion (Exact)
McNemars Test for Matched Pairs(*)
6 Discrete: Two Proportion
Difference of Proportions: Superiority
Difference of Proportions: Non-Inferiority
Difference of Proportions: Equivalence
Ratio of Proportions: Superiority
Ratio of Proportions: Non-Inferiority (Wald test)
Ratio of Proportions: Non-Inferiority (Score test)
Odds Ratio of Proportions: Superiority(**)
Odds Ratio of Proportions: Non-Inferiority
Common Odds Ratio for Stratified 2x2 Table
Fisher Exact Test
C1 C2 C3 C4 C5 C6
1 1 1 - 1 1 1 - - - 1 - 1 - - - 1 - 1 - - - 1 1 - - 1 1 1 - - - 1 1 2 - 1 1 1 - - 1 1 1 1 1 - - 1 1 1 1 1 - - 1 1 1 1 1 - - 1 1 - - 2 1 1 - 1 1 1 - - 1 1 1 -

Module : Design Sr.
Test Name
7 Discrete: Many Proportion
Trend in R Ordered Proportions
Chi-square Test for Rx2 Table
Chi-square Test of Specified Proportions in C Categories
Two-Group Chi-square Test Comparing Proportions in C Categories
Chi-square Test of Comparing Proportions in RXC Table
Wilcoxon Rank Sum Test for Ordered Categorical Data
8 Discrete: Regression
Logistic Regression with Single Normal Covariate
Logistic Regression with Single Normal Covariate Adjusted for other Covariates
9 Discrete: Agreement
Cohen’s Kappa(***)
Cohen’s Kappa (C Ratings)
10 Events: Survival
Logrank Test: Superiority
Logrank Test: Non-Inferiority
C1 C2 C3 C4 C5 C6
1 1 1 - 1 1 1 - - - 1 - 1 - - - 1 - 1 - - - 1 - - - - 1 - - 1 1 - - - - 1 1 - - 2 - - 1 1 - - - - - - - - 1 1

Note:
(*) The results for McNemar’s Test for Matched Pairs from FSD do not match those from nQuery because FSD uses the Normal approximation while nQuery uses the Chi-square test.
(**) The formulation of the Odds Ratio of Proportions: Superiority Test is different in FSD and nQuery, which results in the mismatch between their results.
(***) There is a difference in the results from FSD and nQuery for the Cohen’s Kappa Test due to the difference in the techniques followed.

Module : Analysis Sr.
Test Name
1 Continuous: One Mean
Single Mean: Z Test
Single Mean: t Test
Difference of Means for Paired Data: Superiority: Z Test
Difference of Means for Paired Data: Superiority: t Test
Difference of Means for Paired Data: Non-Inferiority: Z Test
Difference of Means for Paired Data: Non-Inferiority: t Test
Difference of Means for Paired Data: Equivalence: t Test
Ratio of Means for Paired Data: Superiority: Z Test
Ratio of Means for Paired Data: Superiority: t Test
Ratio of Means for Paired Data: Non-Inferiority: Z Test
Ratio of Means for Paired Data: Non-Inferiority: t Test
Ratio of Means for Paired Data: Equivalence: t Test
Wilcoxon Signed Rank Test
C1 C2 C3 C4 C5 C6
- 1 - 1 1 - - - - 1 - - - - - - 1 - - - - 1 - - - - - 1 - - - - - - 1 - - - - 1 - - - - - - 1 - - - - 1 - - - - - 1 - - - - - 1 1 - - -

Module : Analysis
Sr. Test Name
2 Continuous: Two Means
Diff of Means for Independent Data: Superiority: Z
Diff of Means for Independent Data: Superiority: t
Diff of Means for Independent Data: NI: Z
Diff of Means for Independent Data: NI: t
Diff of Means for Independent Data: Equivalence: t
Ratio of Means for Independent Data: Superiority: Z
Ratio of Means for Independent Data: Superiority: t
Ratio of Means for Independent Data: NI: Z
Ratio of Means for Independent Data: NI: t
Ratio of Means for Independent Data: Equivalence: t
Wilcoxon Mann Whitney Test for Independent Data
Diff of Means for Crossover Data: Superiority: t
Diff of Means for Crossover Data: NI: t
Diff of Means for Crossover Data: Equivalence: t
Ratio of Means for Crossover Data: Superiority: t
Ratio of Means for Crossover Data: NI: t
Ratio of Means for Crossover Data: Equivalence: t
Wilcoxon Mann Whitney Test: 2x2 Crossover
C1 C2 C3 C4 C5 C6
- - 1 - - - - 1 - - - - - 1 1 1 - - - - - - 1 - - - - 1 - - - - - 1 1 1 - - - - - 1 1 - - - - 1 - - - -
- 1 1 - - - - - 1 - - - - - 1 1 - - - - - 1 1 - - -

Module : Analysis
Sr. Test Name
3 Continuous: Many Means
One way ANOVA
One Way Repeated Measures (Constant Correlation) ANOVA
Two Way ANOVA
4 Continuous: Regression
Multiple Linear Regression
Repeated Regression
Linear Mixed Effects Model: Difference of Means (crossover data)
Linear Mixed Effects Model: Ratio of Means (crossover data)
5 Discrete: Single Proportion
Single Proportion (Asymptotic)
Single Proportion (Exact)
McNemars Test for Matched Pairs
6 Discrete: Two Proportion
Difference of Proportions: Superiority
Difference of Proportions: Non-Inferiority (Wald)
Difference of Proportions: Non-Inferiority (Score)
Difference of Proportions: Equivalence
Ratio of Proportions: Superiority
Ratio of Proportions: Non-Inf (Wald)
Ratio of Proportions: Non-Inf (Score)
Odds Ratio of Proportions: Superiority
Odds Ratio of Proportions: Non-Inf (Wald)
Odds Ratio of Proportions: Non-Inf (Score)
Common Odds Ratio for Stratified 2x2 Table
Fisher Exact Test
C1 C2 C3 C4 C5 C6
- 1 1 1 1 - - - - 1 1 - - - - 1 1 - - 1 - - - 1 - - - - - 1 - - - 1 1 - - - 1 - 1 - - - - - - 1 - - - 1 1 - 1 - 1 1 1 1 1 1 -

Module : Analysis Sr.
Test Name
7 8 9
C1 C2 C3 C4 C5 C6
- - - - 1 1 1 - - - - - 1 - - - - - 1 - - - - - 1 -
Discrete: Regression
Logistic Regression
Probit Regression
Clog Log Regression
- - - - 1 1 1 -
Discrete: Agreement
Cohen’s Kappa
- - - - 1 - - 1 1 1 1 - - -
Discrete: Many Proportion
Trend in R Ordered Proportions
Chi-square Test for Rx2 Table
Chi-square Test of Specified Proportions in C Categories
Two-Group Chi-square Test Comparing Proportions in C Categories
Chi-square Test of Comparing Proportions in RXC Table
Wilcoxon Rank Sum Test for Ordered Categorical Data
10 Events: Survival
Logrank Test: Superiority
Logrank Test: Non-Inferiority

Module: Simulation

Simulation results in FSD are validated by checking their internal consistency. For example, the estimated probability of rejection from simulations was compared with the analytical result obtained from the FSD design procedure.

Module: Data Explorer

The outputs of the Data Explorer tests have been compared with the corresponding Cytel Studio 8 results.

Y.2.2 FSD MC Procedures

The Multiple Comparison procedures implemented in FSD MCP have been validated extensively. Various methods were employed for the statistical validation of these procedures. The following summary table states the methods used for validating each of the Multiple Comparison Procedures.
Each row of the table corresponds to a procedure and the columns C1-C4 correspond to the validation method used, as described below:

C1 Column: Comparison with SAS 9.1
C2 Column: Comparison with R 2.12.1 (Packages used: ’multxpert’, ’mutoss’)
C3 Column: Comparison with PASS 2005
C4 Column: Comparison with independently developed (in-house) R/SAS macros

In the following tables, ’1’ indicates that the comparison was made for the test and results from FSD MCP were comparable to the respective software; ’2’ indicates that the comparison was made but the results either matched partially or did not match, for reasons indicated at the bottom of the table; ’-’ denotes that a comparable test in other software was not available or the comparison was not applicable.

Table Y.1: Module: Design
Sr.# MCP C1 C2 C3 C4
1 Dunnett’s single step (*)
2 Dunnett’s step down
3 Dunnett’s step up
4 Bonferroni
5 Sidak
6 Weighted Bonferroni
7 Holm’s step down
8 Hochberg’s step up
9 Hommel’s step up
10 Fixed sequence
11 Fallback
2 - - 2 - 1 1 1 1 1 1 1 1 1 1 1

Note: (*) The critical value for Dunnett’s single step was available with SAS and hence validated with it. This procedure was also compared with PASS. However, PASS provides a 2-sided test whereas FSD MCP provides a 1-sided test. Hence the results were comparable in scenarios where the treatment means were either all greater than or all less than the control mean. Note that these tests are simulation based and hence cannot be matched exactly with PASS.
Table Y.2: Module: Analysis
Sr.# MCP C1 C2 C3 C4
1 Dunnett’s single step (**)
2 Dunnett’s step down
3 Dunnett’s step up
4 Bonferroni
5 Sidak
6 Weighted Bonferroni
7 Holm’s step down
8 Hochberg’s step up
9 Hommel’s step up
10 Fixed sequence
11 Fallback
2 1 1 1 1 1 - 1 1 1 1 1 1 1 1 1 1 1 - 1 1 1 -

(**) The critical value and Simultaneous CI were available with SAS and hence validated with it.

Z List of East Beta Testers

East 6.3
Dey Jyotirmoy Abbvie
Wang, Xin Abbvie
Dunbar, Martin Abbvie
Zeng, Jiewei Abbvie
Munasinghe, Wijith P Abbvie
Yodit, Seifu Allergan
Matcham, James Astrazeneca
Su, Hong-Lin Astrazeneca
Zhang, Charlie Biomarin
Wu, Xiaoling Celgene
Liu, Kejian Celgene
Isaacson, Jeff Clovis
Wang, Qiang Daiichi
Wang, Yibin Daiichi
Chen, Shuquan Daiichi
Lee, James Daiichi
Bekele, Neby Gilead
Li, Xiaoming Gilead
Zhang, Grace GSK
Anderson, Keaven Merck
Gause, Christine Merck
Tsai, Kuenhi Merck
Huang, Xiaobi (Shelby) Merck
Xu, Jialin Merck
Mehrotra, Devan Merck
Lu, Lin Nektar
Goldwasser, Meredith Novartis
Holmgren, Eric Oncomed
Ro, Sunhee Onyx
Liao, Olivia Onyx
Perevozskaya, Inna Pfizer
Alun Bedding Roche-Genentech
Lin, Jianchang Takeda
Liu, Patrick Takeda
Pickard, Mike Takeda
Wang, Ling Takeda
Liu, Yi Takeda

East Architect
Keaven Anderson Merck
Loic Darchy Sanofi
Yahya Daoud Baylor Health
Bing Gao Amgen
Brenda Gaydos Eli Lilly
Sally Hollis Astrazeneca
Xin Huang Pfizer Inc.
Chris Jennison University of Bath
Sheela Kolluri Pfizer Inc.
Chunming Mark Li Pfizer Inc.
Jianxin Lin Merck
Xun Lin Pfizer Inc.
Jiajun Liu Merck
John Loewy qPharmetra
Richard Manski Abbott Laboratories
Joel Miller Eli Lilly
May Mo Amgen
Yili Pritchett Abbott Laboratories
Bill Prucka Eli Lilly
Natasa Rajicic Pfizer Inc.
Brad Robertson Eli Lilly
Supriya Satwah Unilever
Yue Shentu Merck
Sam Suzuki Amgen
Enayet Talukder Pfizer Inc.
Jie Tang Pfizer Inc.
Qi Tang Abbott Laboratories
Bruce Turnbull Cornell University
Xuan Wang Baylor Health
Jim Ware Harvard University
Bin Yao Amgen
Tianhui Zhou Pfizer Inc.

East 4
Marilyn Agin Pfizer
Robert Chew Pfizer
Loic Darchy Sanofi-Aventis
Andrew Kramar CRLC Val d’Aurelle
Steve Lagakos Harvard School of Public Health
Xiaoming Li Merck
Devan Mehrotra Merck
Mike Smith Pfizer
Thomas Stiger Pfizer
Chau Thach Merck

East 3
Dan Anbar Millenium Biostatistics
Keaven Anderson Centocor
Linda Christie Dana-Farber Cancer Institute
George Cotsonis Emory University
Loic Darchy Sanofi-Synthelabo
Dave DeMets University of Wisconsin
Brenda Gaydos Eli Lilly and Company
Vicki Hertzberg Emory University
Joan Hilton UC San Francisco
Chris Jennison University of Bath
Kyungmann Kim University of Wisconsin
Andrew Kramar CRLC Val d’Aurelle
Peter Lachenbruch FDA CBER
Steve Lagakos Harvard School of Public Health
Robert Lagos New England Research Unit
Elizabeth Ludington Statistics Collaborative
Mike Lynn Emory University
Young Park Wyeth
Heather Ribaudo Harvard School of Public Health
Wasima Rida FDA Center for Biologics Evaluation & Research
Larry Roi Aventis Pharmaceuticals
Roy Tamura Eli Lilly and Company
Butch Tsiatis North Carolina State University
Bruce Turnbull Cornell University
Sue-Jane Wang FDA – Center for Drug Evaluation Research
Wendy Wilson Aventis Pharmaceuticals
Ru-Fang Yeh UC San Francisco
Boguang Zhen FDA/CBER/OBE/DB

East-2000
Mirza Ali Otsuka America Pharmaceuticals
Dan Anbar Millenium Biostatistics
Nupun Andhivarothai TAP Holdings Inc.
Tad Archambault Virtu Stat Ltd.
Peter Armitage University of Oxford
Juergen Berger University of Hamburg
Harry Bushar FDA CDRH
David DeMets University of Wisconsin
Richard Hellmund Zeneca Pharmaceuticals
Jay Herson Applied Logic Associates
Irving Hwang Irving Consulting Group
Allen Izu Chiron Corporation
Chris Jennison University of Bath
Kyungmann Kim University of Wisconsin
Peter Lachenbruch FDA CBER
Steve Lagakos Harvard School of Public Health
Gordon Lan Pfizer Inc.
Gracie Lieberman Genentech Inc.
Scott Maxwell University of Notre Dame
Larry Muenz Independent Consultant
Theophile Niyonsenga University of Sherbrooke
Abdul Sankoh Genetics Institute
Greg Stoddard University of Utah
Judy Sy Genentech Inc.
Peter Thall University of Texas
Bruce Turnbull Cornell University
Duolao Wang London School of Hygiene and Tropical Medicine
Richard Wu RPR Pharmaceuticals
Peter Zhang Otsuka America Pharmaceuticals
Huaqing Zhao The Children’s Hospital of Philadelphia

References

Abad-Santos F et al. (2005). Assessment of sex differences in pharmacokinetics and pharmacodynamics of amlodipine in a bioequivalence study. Pharmacological Research, 51, 445-452.
Agresti A (2002). Categorical Data Analysis (2nd Ed). John Wiley & Sons, New York.
Agresti A, Min Y (2001). On small-sample confidence intervals for parameters in discrete distributions. Biometrics, 57, 963-971.
Andersen EB (1990). The Statistical Analysis of Categorical Data. Springer-Verlag, Berlin-Heidelberg.
Anderson K (2002). Evaluating sponsor responsibilities for interim analysis with DMC’s. Presented at the Clinical Trials Data Monitoring Committees meeting, Philadelphia Barnett International Conference Group, Philadelphia.
Anderson S, Hauck WW (1990). Consideration of individual bioequivalence. J. Pharmacokin. Biopharm, 18, 259-273.
Andrews DF and Herzberg AM (1985). Data. Springer-Verlag, New York.
Armitage P (1955).
Test for linear trend in proportions and frequencies. Biometrics, 11, 375-386.
Armitage P (1957). Restricted sequential procedures. Biometrika, 44, 9-56.
Armitage P (1975). Sequential Medical Trials. Blackwell Scientific Publications, Oxford.
Armitage P, McPherson CK and Rowe BC (1969). Repeated significance tests on accumulating data. J. R. Statist. Soc. A, 132, 232-44.
Arvin AM, Kushner JH, Feldman S, et al. (1982). Human leukocyte interferon for the treatment of varicella in children with cancer. New England Journal of Medicine, 306, 761-765.
Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman (1955). An empirical distribution function for sampling with incomplete information. Ann. Math. Statistics, 26, 641-647.
Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD (1972). Statistical Inference Under Order Restrictions. John Wiley, New York.
Barnard GA (1945). A new test for 2 × 2 tables. Nature, 156, 177.
Belsey D, Kuh E and Welsch R (1980). Regression diagnostics: Identifying influential data and sources of collinearity. Wiley, New York.
Benjamini Y, Hochberg Y (1997). Multiple hypothesis testing with weights. Scandinavian Journal of Statistics, 24, 407-418.
Berger RL, Boos DD (1994). P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association, 89, 1012-1016.
Beta Blocker Heart Attack Trial (1981). Beta Blocker Heart Attack Trial: Design features. Controlled Clinical Trials, 2, 275-85.
Beta-Blocker Heart Attack Trial (1982). A randomized trial of propranolol in patients with acute myocardial infarction. JAMA, 247, 1707-14.
Bickel PJ, Klaasen CAJ, Ritov Y and Wellner JA (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore.
Block DA, Kraemer HC (1989). 2x2 kappa coefficients: measures of agreement or association. Biometrics, 45, 269-287.
Blyth C, Still H (1983). Binomial confidence intervals.
Journal of American Statistical Association, 78, 108-116.
Bofinger E (1987). Step-down procedures for comparison with a control. Australian Journal of Statistics, 29, 348-364.
Bonferroni CE (1935). Il calcolo delle assicurazioni su gruppi di teste. In Studi in onore del Professore Salvatore Ortu Carboni. Rome, Italy, 13-60.
Bonferroni CE (1936). Teoria statistica delle classi e calcolo delle probabilita. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3-62.
Bowerman B, O’Connel R, Dickey D (1986). Linear Statistical Models, an Applied Approach. Duxbury Press, Belmont, California.
Brannath W, Posch M, and Bauer P (2002). Recursive Combination Tests. JASA, 97, 236-244.
Brannath W, Mehta CR, Posch M (2009). Exact Confidence Bounds Following Adaptive Group Sequential Tests. Biometrics, 65(2), 539-546.
Breslow NE, Day NE (1980). The Analysis of Case-Control Studies. IARC Scientific Publications No. 32, Lyon, France.
Breslow NE, Day NE (1987). The Design and Analysis of Cohort Studies. IARC Scientific Publication No. 82, Lyon, France.
Bristol DR (1993a). Probabilities and sample sizes for the two one-sided tests procedure. Communications in Statistics - Theory and Methods, A22(7), 1953-1961.
Bristol DR (1993b). Planning Survival Studies To Compare A Treatment To An Active Control. Journal of Biopharmaceutical Statistics, 3(2), 153-158.
Burgess IF, Brown CM, and Lee PN (2005). Treatment of head louse infestation with 4% dimeticone lotion: randomised controlled equivalence trial. BMJ, 330, 1423.
Cantor AB (1996). Sample size calculation for Cohen’s kappa. Psychological Methods, 1(2), 150-153.
Casagrande JT, Pike MC, and Smith PG (1978). An improved approximate formula for comparing two binomial distributions. Biometrics, 34, 483-86.
Casella G (1986). Refining binomial confidence intervals. Canadian Journal of Statistics, 14, 113-129.
Chambers J, Cleveland W, Kleiner B, Tukey P. (1983). Graphical Methods for Data Analysis. Wadsworth.
Chan ISF (1998). Exact tests of equivalence and efficacy with a non-zero lower bound for comparative studies. Statistics in Medicine 17:1403-1413.
Chan ISF, Zhang Z (1999). Test based exact confidence intervals for the difference of two binomial proportions. Biometrics 55:1201-1209.
Chen JYH, DeMets DL, Lan KKG (2004). Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine, 23(7), 1023-1038.
Chernick MR, Liu CY (2002). The saw-toothed behavior of power versus sample size and software solutions: single binomial proportion using exact methods. The American Statistician, 56: 149-155.
Clopper CJ, Pearson E. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26:404-413.
Chow SC and Liu JP (1992). Design and Analysis of Bioavailability and Bioequivalence Studies. Marcel Dekker, New York.
Chow S, Shao J, Wang H (2003). Sample Size Calculations in Clinical Research. Taylor and Francis, New York.
Cleveland, W (1993). Visualizing Data. Hobart Press.
Cleveland, W (1985). Elements of Graphing Data. Wadsworth.
Cochran WG, Cox GM (1957). Experimental Designs. Second Edition, New York: John Wiley & Sons, Inc.
Coe PR, Tamhane AC (1993). Exact Repeated Confidence Intervals for Bernoulli Parameters in a Group Sequential Clinical Trial. Controlled Clinical Trials 14, 19-29.
Cohen J (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20: 37-46.
Cole JW, Grizzle JE (1966). Applications of Multivariate Analysis of Variance to Repeated Measures Experiments. Biometrics, 22, 810-828.
Collett D (1994). Modelling Survival Data in Medical Research. Chapman & Hall, London.
Collett, D. (2002). Modeling Binary Data, 2nd ed.
Boca Raton, FL: CRC Press.
Collett, D and Jemain, AA (1985). Residuals, outliers and influential observations in regression analysis. Sains Malaysiana, 14, 493-511.
Conover WJ (1980). Practical Nonparametric Statistics, 2nd edition. John Wiley & Sons, New York.
Cook D, Weisberg S (1982). Residuals and Influence in Regression. Chapman and Hall, London.
Cook RD (1979). Influential observations in linear regression. JASA, 74, 169-174.
Cook TD, DeMets DL. (2008). Introduction to statistical methods for clinical trials. Chapman and Hall. 7: 296.
Cook RD and Weisberg S (1982). Residuals and Influence in Regression. Chapman & Hall, London.
Corcoran CD, Mehta CR and Senchaudhuri P (2000). Power Comparisons for Tests of Trend in Dose Response Studies. Statistics in Medicine, 19, 3037-3050.
Cox DR and Snell EJ (1989). Analysis of Binary Data. 2nd Edition. Chapman and Hall, London.
CRASH Trial Collaborators (2004). Effect of intravenous corticosteroids on death within 14 days in 10008 adults with clinically significant head injury. Lancet, 364, 1321-28.
Crowder M and Hand D (1990). Analysis of repeated measures. Chapman & Hall/CRC.
Cui L, Hung HMJ, and Wang S (1999). Modification of sample size in group sequential clinical trials. Biometrics, 55, 853-857.
Davidson MH et al. (1999). Weight control and risk factor reduction in obese subjects treated for 2 years with Orlistat. JAMA, 281, 235-42.
DeMets DL and Gail MH (1985). Use of logrank tests and group sequential methods at fixed calendar times. Biometrics, 41, 1039-44.
DeMets DL, Hardy R, Friedman LM and Lan KKG (1984). Statistical aspects of early termination in the Beta-Blocker Heart Attack Trial. Controlled Clinical Trials, 5, 362-72.
DeMets DL and Lan KKG (1995). The alpha spending function approach to interim data analyses. In: Recent advances in clinical trial design and analysis, Thall PF Ed. Kluwer Academic Publishers, Boston.
DeMets DL and Ware JH (1980).
Group sequential methods for clinical trials with a one-sided hypothesis. Biometrika, 67, 651-60.
Diggle PJ (1988). An Approach to the Analysis of Repeated Measurements. Biometrics, 44, 959-971.
Diletti, E., Hauschke, D. and Steinijans, VW (1991). Sample Size Determination for Bioequivalence Assessment by Means of Confidence Intervals. International Journal of Clinical Pharmacology, Therapy and Toxicology, 29(1), 1-8.
Dixon WJ, Massey FJ (1983). Introduction to Statistical Analysis. Fourth Edition, McGraw-Hill, 14.
Dmitrienko A, Tamhane AC, Bretz F (2010). Multiple Testing Problems in Pharmaceutical Statistics. Chapman & Hall.
Draper NR, Smith H (1966). Applied Regression Analysis. New York: John Wiley & Sons, Inc.
Draper CC, Voller A, Carpenter RG (1972). The Epidemiologic Interpretation of Serologic Data in Malaria. American Journal of Tropical Medicine and Hygiene, 21, 696-703.
Duffy SW (1984). Asymptotic and exact power for the McNemar test and its analogue with R controls per case. Biometrics 40:1005-1015.
Dunnett CW (1980). Pairwise Multiple Comparisons in the Homogeneous Variance, Unequal Sample Size Case. Journal of the American Statistical Association, 75, 789-795.
Dunnett CW (1985). A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association, 50, 1096-1121.
Dunnett CW and Gent M (1977). Significance testing to establish equivalence between treatments, with special reference to data in the form of 2 x 2 tables. Biometrics, 33, 593-602.
Dunnett CW, Tamhane AC (1991). Step-down multiple tests for comparing treatments with a control in unbalanced one-way layouts. Statistics in Medicine, 10, 939-947.
Dunnett CW, Tamhane AC (1992). A step-up multiple test procedure. Journal of the American Statistical Association, 87, 162-170.
Dunnett CW, Tamhane AC (1992).
Comparisons between a new drug and active and placebo controls in efficacy clinical trial. Statistics in Medicine, 11, 1057-1063.
Dunnett CW, Tamhane AC (1995). Step-up multiple testing of parameters with unequally correlated estimates. Biometrics, 51, 217-227.
Dupont, WD. and Plummer, WD., Jr. (1998). Power and Sample Size Calculations for Studies Involving Linear Regression. Controlled Clinical Trials, 19, 589-601.
Du Toit, Steyn, Stumpf (1986). Graphical Exploratory Data Analysis. Springer-Verlag.
Egret Siz (1997). Sample size and power for nonlinear regression models. Version 1. Reference manual. Cytel Software Corporation, Cambridge, MA.
Elashoff, JD. (2005). nQuery Advisor Version 6.0. Statistical Solutions Ltd., Los Angeles, CA.
Everitt BS (1995). The Analysis of Repeated Measures: A Practical Review with Examples. The Statistician, 44, 113-135.
Facey KM (1992). A sequential procedure for a Phase II efficacy trial in hypercholesterolemia. Controlled Clinical Trials, 13, 122-133.
Fairbanks K and Madsen R (1982). P values for tests using a repeated significance test design. Biometrika, 69, 69-74.
Farrington CP and Manning G (1990). Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk. Statistics in Medicine, 9, 1447-1454.
FDA (2002). Bioequivalence Guidance, Guidance for Industry No. 35, Oct. 9, 2002.
FDA (CDER and CBER) (2010). Draft Guidance for Industry: Adaptive Design Clinical Trials for Drugs and Biologics, February 2010.
Feng S, Liang Q, Kinser R, Newland K and Guilbaud R (2006). Testing equivalence between two laboratories or two methods using paired-sample analysis and interval hypothesis testing. Analytical and Bioanalytical Chemistry, 385(5), 975-981.
Fienberg SE (1980). The Analysis of Cross-classified Categorical Data. 2nd Edition. M.I.T. Press, Cambridge, MA.
Firth, D (1993).
Bias reduction of maximum likelihood estimates. Biometrika, 80, 27-38.
Fisher, RA (1935). The Design of Experiments. Oliver and Boyd, Edinburgh.
Fisher R (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.
Fitzmaurice GM, Laird NM, Ware JH (2004). Applied Longitudinal Analysis. John Wiley & Sons, New York.
Flack VF, Afifi AA, Lachenbruch PA (1988). Sample Size Determinations for the two rater kappa statistic. Psychometrika, 53(3): 321-325.
Fleiss JL. (1981). Statistical Methods for Rates and Proportions. John Wiley & Sons, New York.
Fleiss JL, Cohen J (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement 33: 613-619.
Fleiss, JL, Levin, B and Pike, MC (2003). Statistical Methods for Rates and Proportions. John Wiley & Sons, New York.
Fleiss JL, Tytun A, Ury SHK (1980). A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics 36: 343-346.
Fleming, TR (1982). One-Sample Multiple Testing Procedure for Phase II Clinical Trials. Biometrics, 38, 143-151.
Fleming, TR (2008). Current issues in non-inferiority trials. Stat Med, 27(3), 317-32.
Freedman LS (1982). Tables of the number of patients required in clinical trials using the logrank test. Statistics in Medicine, 1: 121-129.
Freireich et al. (1963). The effect of 6-Mercaptopurine on the duration of Steroid induced remissions in acute leukemia: A model for evaluation of other potentially useful therapy. Blood, 21, 699-716.
Friede Tim and Schmidli Heinz (2010). Blinded sample size re-estimation with count data: Methods and applications in multiple sclerosis. Statistics in Medicine, 29, 1145-1156.
Gallo P, Chuang-Stein C, Dragalin V, Gaydos B, Krams M, Pinheiro J (2006).
Adaptive designs in clinical drug development – an executive summary of the PhRMA Working Group. J. Biopharm. Statist., 16, 275-83.
Gao P, Ware J, Mehta C (2008). Sample size re-estimation for adaptive sequential design in clinical trials. J. Biopharm. Statist., 18(6), 1184-96.
Gao P, Liu L, Mehta C (2013). Exact Inference for adaptive group sequential designs. Statistics in Medicine, 32(23):3991-4005.
Gao P, Liu L, Mehta C (2014). Adaptive Sequential Testing for Multiple Comparisons. Journal of Biopharmaceutical Statistics, 24: 1035-1058.
Gart JJ, Nam J (1988). Approximate interval estimation of the ratio of binomial parameters: A review and corrections for skewness. Biometrics 44: 323-338.
Goodman SN, Zahurak ML, and Piantadosi S (1995). Some practical improvements in the continual reassessment method for phase I studies. Statistics in Medicine, 14:1149-1161.
Graubard BI, Korn EL (1987). Choice of column scores for testing independence in ordered 2 × K contingency tables. Biometrics, 43: 471-476.
Greenland S. (1991). On the logical justification of conditional tests for two-by-two contingency tables. The American Statistician, 45, 248-251.
Greenwood, Jr. (1926). The Natural Duration of Cancer. Reports of Public Health and Related Subjects, Vol. 33, HMSO, London.
Gu Kangxia, Ng Hon Keung Tony, Tang Man Lai, and Schucany William R. (2008). Testing the Ratio of Two Poisson Rates. Biometrical Journal, 50(2), 283-298.
Hajek P, Taylor TZ, and Mills P (2002). Brief intervention during hospital admission to help patients to give up smoking after myocardial infarction and bypass surgery: randomised controlled trial. BMJ, 324(7329), 87-89.
Hauck WW, Preston PE and Bois FY (1997). A group sequential approach to crossover trials for average bioequivalence. J. Biopharm. Statist., 7, 87-96.
Hauschke D, Steinijans VW, Diletti E, Burke M (1992).
Sample size determination for bioequivalence assessment using a multiplicative method. J. Pharmacokin. Biopharm., 20, 559-563.
Hauschke D, Kieser M, Diletti E and Burke M (1998). Sample Size Determination for Proving Equivalence Based on the Ratio of Two Means for Normally Distributed Data. Statistics in Medicine, 18, 93-105.
Haybittle JL (1971). Repeated assessment of results in clinical trials of cancer treatment. Brit. J. Radiology, 44, 793-797.
Helen Brown, Robin Prescott (2006). Applied Mixed Models in Medicine. John Wiley & Sons, Ltd.
Heinze G and Schemper M (2002). A solution to the problem of separation in regression. Statistics in Medicine, 21, 2409-2419.
Hochberg Y (1988). A sharper Bonferroni procedure for multiple significance testing. Biometrika, 75, 800-802.
Hochberg Y, Tamhane AC (1987). Multiple Comparison Procedures. Wiley, New York.
Hocking RR (1985). The Analysis of Linear Models. Belmont, CA: Brooks/Cole Publishing Co.
Hodges JL, Lehmann EL (1963). Estimates of location based on rank tests. The Annals of Mathematical Statistics, 34: 598-611.
Hollander M, Wolfe DA (1999). Nonparametric Statistical Methods. Second Ed. John Wiley and Sons, New York.
Holm S (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.
Hommel G (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika, 75, 383-386.
Hommel G (1989). A comparison of two modified Bonferroni procedures. Biometrika, 76, 624-625.
Hosmer DW and Lemeshow S (2000). Applied Logistic Regression. Second Edition. Wiley, New York.
Hsieh FY (1989). Sample size tables for logistic regression. Statistics in Medicine, 8: 795-802.
Hutto C, Parks WP and Lai S (1991). A hospital based prospective study of perinatal infection with HIV-1. J. Pediatr., 118, 347-53.
Hwang IK, Shih WJ, and DeCani JS (1990).
Group sequential designs using a family of type I error probability spending functions. Statistics in Medicine, 9, 1439-1445.
Iezzi R, Cotroneo AR, Giammarino A, Spigonardo F and Storto ML (2011). Low-dose multidetector-row CT-angiography of abdominal aortic aneurysm after endovascular repair. European Journal of Radiology, 79, 21-28.
Jennison C and Turnbull BW (1989). Interim analyses: the repeated confidence interval approach (with discussion). J. Roy. Statist. Soc. B, 51, 305-361.
Jennison C and Turnbull BW (1997). Group sequential analysis incorporating covariate information. JASA, 92, 1330-41.
Jennison C and Turnbull BW (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall/CRC, London.
Jennison, C and Turnbull, BW (2003). Mid-course sample size modification in clinical trial. Statistics in Medicine, 22, 971-993.
Jennison, C and Turnbull, BW (2006). Adaptive and nonadaptive group sequential tests. Biometrika, 93(1), 1-21.
Ji Y, Liu P, Li Y, and Bekele N (2010). A modified toxicity probability interval method for dose finding trials. Clinical Trials, 7:653-656.
Johnson and Wichern (1998). Applied Multivariate Statistical Analysis. 4th Edition, Prentice Hall.
Jones B and Kenward MG (2003). Design and analysis of cross-over trials. Chapman and Hall/CRC, New York.
Kalbfleisch JD and Prentice RL (2002). The Statistical Analysis of Failure Time Data. John Wiley & Sons, New Jersey.
Kangxia Gu, Hon Keung Tony Ng, Man Lai Tang and William R. Schucany (2008). Testing the Ratio of Two Poisson Rates. Biometrical Journal, 50(2), 283-298.
Kapur A, Malik IS, et al. (2005). The coronary artery revascularisation in diabetes (CARDia) trial: Background, aims, and design. Am Heart J, 149, 13-19.
Keene Oliver N., Jones Mark R. K., Lane Peter W., and Anderson Julie (2007). Analysis of exacerbation rates in asthma and chronic obstructive pulmonary disease: example from the TRISTAN study.
Pharmaceutical Statistics, 6, 89-97.
Kemeny N, Reichman B, Geller N and Hollander P (1988). Implementation of the group sequential methodology in a randomized trial in metastatic colorectal carcinoma. Am J Clin Oncol, 11, 66-72.
Kendall MG, Stuart A (1979). The Advanced Theory of Statistics, 4th edition. Macmillan Publishing Co. Inc., New York.
Keselman HJ, Algina J, Kowalchuk RK, Wolfinger RD (1998). A Comparison of Two Approaches for Selecting Covariance Structures in the Analysis of Repeated Measures. Communications in Statistics-Computation and Simulation, 27(3), 591-604.
Kim K (1989). Point estimation following group sequential tests. Biometrics, 45, 613-17.
Kim K and DeMets DL (1987). Confidence intervals following group sequential test in clinical trials. Biometrics, 43, 857-64.
Kim K and Tsiatis AA (1990). Study duration for clinical trials with survival response and early stopping rule. Biometrics, 46, 81-92.
Kimura D and Zenger (1997). Standardizing sablefish (Anoplopoma fimbria) long-line survey abundance indices by modeling the log-ratio of paired comparative fishing cpues. ICES Journal of Marine Science, 54, 48-59.
Kolassa J (1995). A comparison of size and power calculations for the Wilcoxon statistic for ordered categorical data. Statistics in Medicine, 14: 1577-1581.
Kontula KT, Anderson LC, Paavonen T, Myllyla L, Teerenhovi L, and Vuopio P (1980). Glucocorticoid receptors and glucocorticoid sensitivity of human leukemic cells. Int. J. Cancer, 26:177-183.
Kontula KT, Paavonen T, Vuopio P, and Anderson LC (1982). Glucocorticoid receptors in hairy-cell leukemia. Int. J. Cancer, 30:423-426.
Krall JM, Uthoff VA, and Harley JB (1975). A Step-up Procedure for Selecting Variables Associated with Survival. Biometrics, 31: 49-57.
Kreyszig E (1970). Introductory Mathematical Statistics. John Wiley & Sons, Inc., New York.
Kutner M, Nachtsheim C and Neter J (2004).
Applied Linear Regression Models, 4th Edition, IRWIN, Chicago.
Laarman GJ, Suttorp MJ, Dirksen MT, Loek van Heerebeek, Kiemeneij F, Slagboom T, Ron van der Wieken, Tijssen JGP, Rensing BJ, and Patterson M (2006). Paclitaxel-Eluting versus Uncoated Stents in Primary Percutaneous Coronary Intervention. NEJM, 355, 1105-1113.
Lachin JM (1977). Sample size determinations for rxc comparative trials. Biometrics, 33: 315-324.
Lachin, JM (1981). Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials, 2, 93-113.
Lan, K., Hu, P., and Proschan, M. (2009). A conditional power approach to the evaluation of predictive power. Statistics in Biopharmaceutical Research, 1, 131-136.
Lan KKG and DeMets DL (1983). Discrete sequential boundaries for clinical trials. Biometrika, 70, 659-663.
Lan KKG and Wittes J (1988). The B-value: A tool for monitoring data. Biometrics, 44, 579-85.
Lan KKG and Zucker D (1993). Sequential monitoring of clinical trials: the role of information and Brownian motion. Stats. in Med., 12, 753-65.
Landis JR, Koch GG (1977). The Measurement of interrater agreement for categorical data. Biometrics, 33: 159-174.
Laster L and Johnson M (2003). Non-inferiority trials: the ‘at least as good as’ criterion. Stats. in Med., 22, 187-200.
Lee ET (1992). Statistical Methods for Survival Data Analysis. John Wiley & Sons, New York.
Lehmacher W and Wassmer G (1999). Adaptive sample size calculations in group sequential trials. Biometrics, 55, 1286-1290.
Lehmann, EL (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco.
Liebetrau, AM (1983). Measures of association. Sage Publications.
Li G, Shih WJ, Xie T and Lu J (2002). A sample size adjustment procedure for clinical trials based on conditional power. Biostatistics, 3(2), 277-287.
Li Lingling, Evans Scott, Uno Hajime, Wei L.J (2009). Predicted Interval Plots (PIPS): A Graphical Tool for Data Monitoring of Clinical Trials. Statistics in Biopharmaceutical Research, 1(4), 348-355.
Little RJA (1989). Testing the equality of two independent binomial proportions. The American Statistician, 43, 283-288.
Machin, D and Campbell, MJ (1987). Statistical Tables for Design of Clinical Trials. Blackwell Scientific Publications, Oxford.
MacLaren, N.M (1989). The Generation of Multiple Independent Sequences of Pseudorandom Numbers. Applied Statistics, 38:351-359.
Maindonald JH (1984). Statistical Computation. John Wiley & Sons, New York.
Makuch RW, Parks WP (1988). Response of serum antigen level to AZT for the treatment of AIDS. AIDS Research and Human Retroviruses 4: 305-316.
Mander AP and Sweeting MJ (2015). A Product of Independent beta Probabilities dose Escalation (PIPE) design for dual-agent Phase I trials. Statistics in Medicine, 34, 1261-1276.
Mantel N (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep., 50(3):163-70.
Mantel N, Haenszel W (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 22:719-748.
Marcus R (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
Martinez-Martin P, Valldeoriola F, Molinuevo JL, Nobbe FA, Rumia J, and Tolosa E (2000). Pallidotomy and quality of life in patients with Parkinson’s disease: An early study. Movement Disorders, 15(1), 65-70.
Maurer W, Hothorn LA, Lehmacher W (1995). Multiple comparisons in drug clinical trials and preclinical assays: a priori ordered hypotheses. Biometrie in der Chemisch-in-Pharmazeutischen Industrie. 6, Vollman, J. (editor). Fischer-Verlag, Stuttgart, 3-18.
Mehta, CR (2004).
A Note on Standardizing δ̂ for Designs with Binomial Endpoints. In House Technical Report.
Mehta CR, Liu L (2016). An objective re-evaluation of adaptive sample size re-estimation: commentary on ‘Twenty-five years of confirmatory adaptive designs’. Statistics in Medicine, 35(3), 350-358.
Mehta CR, Patel NR (1986). A hybrid algorithm for Fisher’s exact test on unordered r x c contingency tables. Communications in Statistics, 15:387-403.
Mehta CR and Tsiatis AA (2001). Flexible sample size considerations using information-based interim monitoring. Drug Information Journal, 35, 1095-1112.
Mehta CR, Bauer P, Posch M, Brannath W (2007). Repeated confidence intervals for adaptive group sequential trials. Statistics in Medicine, 26(30), 5422-5433.
Mehta CR and Pocock SJ (2011). Adaptive Increase in Sample Size when Interim Results are Promising: A Practical Guide with Examples. Statistics in Medicine, 30(28): 3267-84.
Miettinen OS (1986). On the matched pairs design in the case of all-or-none responses. Biometrics, 24: 339-352.
Miettinen, OS. and Nurminen, M (1985). Comparative Analysis of Two Rates. Statistics in Medicine, 4, 213-226.
Miller AJ (1990). Subset selection in Regression. Chapman & Hall, London.
Montgomery D, Peck E and Vining G (2001). Introduction to Linear Regression Analysis, 3rd Edition, Wiley, New York.
Moseley JB, O’Malley K, Petersen NJ, et al. (2002). A controlled trial of arthroscopic surgery for osteoarthritis of the knee. New England Journal of Medicine, 347, 81-8.
Muller, H-H and Schafer, H (2001). Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and of classical group sequential approaches. Biometrics, 57, 886-891.
Naik UD (1975). Some selection rules for comparing p processes with a standard. Communications in Statistics. Series A. 4, 519-535.
Nam J (1987).
A simple approximation for calculating sample sizes for detecting linear trend in proportions. Biometrics, 43, 701-705.
Neuenschwander B, Branson M, and Gsponer T (2008). Critical aspects of the Bayesian approach to phase I cancer trials. Statistics in Medicine, 27:2420-2439.
Neuenschwander B, Matano A, Tang Z, Roychoudhury S, Wandel S and Bailey S (2015). A Bayesian Industry Approach to Phase I Combination Trials in Oncology. In: Statistical Methods in Drug Combination Studies, Zhao W, Yang H Ed. Chapman Hall/CRC Press, Boca Raton.
Noether GE (1987). Sample size determination for some common nonparametric tests. J. American Statistical Assoc., 82, 645-647.
nQuery Advisor (2005). Software for Sample Size and Power estimation. Statistical Solutions, Saugus.
O’Brien RG, Muller KE (1993). Applied Analysis of Variance in Behavioral Science. Marcel Dekker, New York. 8: 297-344.
O’Brien PC, Fleming TR (1979). A multiple testing procedure for clinical trials. Biometrics, 35, 549-56.
O’Hagan A, Stevens JW, Campbell MJ (2005). Assurance in clinical trial design. Pharmaceutical Statistics, 4(3), 187-201.
Oliver N. Keene, Mark R. K. Jones, Peter W. Lane, Julie Anderson (2007). Analysis of exacerbation rates in asthma and chronic obstructive pulmonary disease: example from the TRISTAN study. Pharmaceutical Statistics, 6, 89-97.
O’Quigley J, Pepe M, and Fisher L (1990). Continual reassessment method: A practical design for phase I clinical trials in cancer. Biometrics, 46:33-48.
Overall JE, Doyle SR (1994). Estimating sample sizes for repeated measures designs. Controlled Clinical Trials, 15: 100-123.
Owen, DB (1965). A Special Case of Bivariate Non-Central t-Distribution. Biometrika, 52(3), 437-446.
Pampallona S and Tsiatis AA (1994).
Group sequential designs for one-sided and two-sided hypothesis testing with provision for early stopping in favor of the null hypothesis. J. Statist. Planning and Inference, 42, 19-35.
Pampallona S, Tsiatis AA and Kim K (1995). Spending functions for type I and type II error probabilities of group sequential trials. Technical report, Dept. of Biostatistics, Harvard School of Public Health, Boston.
Pampallona S, Tsiatis AA and Kim K (2001). Interim monitoring of group sequential trials using spending functions for the type I and type II error probabilities. Drug Information Journal, 35, 1113-1121.
Parker M, Puddey IB, Beilin LJ and Vandongen R. (1990). A 2-way factorial study of alcohol and salt restriction in treated hypertensive men. Hypertension, 16, 398-406.
Patel HI (1983). Use of baseline measurements in the two-period cross-over design. Communications in Statistics-Theory and Methods, 12, 2693-712.
Patterson S, Jones B (2006). Bioequivalence and Statistics in Clinical Pharmacology. Chapman & Hall/CRC, Taylor & Francis Group.
Pearson et al. (2003). Treatment effects of methylphenidate on cognitive functioning in children with mental retardation and ADHD. Journal of the American Academy of Child and Adolescent Psychiatry, 43, 677-685.
Phillips KE (1990). Power of the two one-sided tests procedure in bioequivalence. Journal of Pharmacokinetics and Biopharmaceutics 18.
Pitt B, Zannad F, Remme WJ, Cody R, Castaigne A, Perez A, Palensky J, Wittes J (1999). The effects of spironolactone on morbidity and mortality in patients with severe heart failure. New England Journal of Medicine, 341(10), 709-717.
Pocock SJ (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64, 191-99.
Posch M and Bauer P (1999). Adaptive two stage designs and the conditional error function. Biometrical Journal, 41, 689-696.
Posch Martin, Koenig Franz, Branson Michael, Brannath Werner, Dunger-Baldauf Cornelia and Bauer Peter (2005). Testing and estimation in flexible group sequential designs with adaptive treatment selection. Statistics in Medicine, 24: 3697-3714.
Pregibon D (1981). Logistic Regression Diagnostics. Ann. Statist., 9: 705-724.
Pritchett Y, Jemiai Y, Chang Y, et al. (2011). The use of group sequential, information-based sample size re-estimation in the design of the PRIMO study of chronic kidney disease. Clinical Trials, 8(2), 165-174.
Proschan, MA and Hunsberger, SA (1995). Designed extension of studies based on conditional power. Biometrics, 51, 1315-1324.
Purich E (1980). Bioavailability/Bioequivalency Regulation: An FDA Perspective in Drug Absorption and Disposition (Ed. K.S. Albert), American Statistical Association and Academy of Pharmaceutical Sciences, Washington, D.C., 115-137.
Rabbee N, Coull BA, Mehta CR, Patel NR, Senchaudhuri P (2003). Power and sample size for ordered categorical data. Statistical Methods in Medical Research, 12, 73-84.
Reboussin DM, DeMets DL, Kim K, and Lan KKG (2002). Programs for computing group sequential boundaries using the Lan-DeMets method. SDAC, Dept. of Biostat. and Med. Informatics, University of Wisconsin Medical School.
Rencher AC (1995). Methods of Multivariate Analysis. John Wiley & Sons, New York.
Ribas A, Hauschild A, Kefford R, Punt C, Haanen J, Marmol M, Garbe C, Gomez-Navarro J, Pavlov D and Marshall M (2008). Phase III, open-label, randomized, comparative study of tremelimumab (CP-675,206) and chemotherapy (temozolomide [TMZ] or dacarbazine [DTIC]) in patients with advanced melanoma. Journal of Clinical Oncology, 26(suppl), 485s, abstr LBA9011.
Robins J, Breslow N, Greenland S (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics 42:311-323.
Rothman M, Li N, Chen G, Chi G, Temple R, Tsou HH (2003). Design and analysis of non-inferiority mortality trials in oncology. Stats. in Med., 22, 239-264.
Ryan TP (1997). Modern Regression Methods. John Wiley & Sons, New York.
Sabin T, Matcham J, Bray S, Copas A, and Parmar MKB. (2014). A quantitative process for enhancing end of Phase 2 decisions. Statistics in Biopharmaceutical Research 6:67-77.
Santner TJ and Duffy DE (1989). The Statistical Analysis of Discrete Data. Springer-Verlag, New York.
Santner TJ, Snell MK (1980). Small-sample confidence intervals for p1 − p2 and p1/p2 in 2 × 2 contingency tables. Journal of the American Statistical Association 75:386-394.
Santner TJ, Yamagami S. (1993). Invariant small sample confidence intervals for the difference of two success probabilities. Communications in Statistics, Part B – Simulation and Computation 22:33-59.
Sarkar, S. (1998). Some probability inequalities for ordered MTP2 random variables: a proof of Simes conjecture. Annals of Statistics 26, 494-504.
Sarkar, S., and Chang, C. K. (1997). Simes’ method for multiple hypothesis testing with positively dependent test statistics. Journal of the American Statistical Association, 92, 1601-1608.
Scharfstein DO and Tsiatis AA (1998). The use of simulation and bootstrap in information-based group sequential studies. Stats. in Med., 17, 75-87.
Scharfstein DO, Tsiatis AA, and Robins JM (1997). Semiparametric efficiency and its implication on the design and analysis of group-sequential studies. JASA, 92, 1342-50.
Schoenfeld DA (1981). The asymptotic properties of comparative tests for comparing survival distributions. Biometrika, 68, 316-9.
Schoenfeld DA (1983). Sample-size formula for the proportional-hazards regression model. Biometrics, 39, 499-503.
Schoenfeld DA and Richter, JR (1982). Nomograms for calculating the number of patients needed for a clinical trial with survival as an endpoint. Biometrics, 38, 163-70.
Schuirmann DJ (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokin. Biopharm., 15, 657-680.
Schultz JR, Nichol FR, Elfring GL, and Weed SD (1973). Multiple stage procedure for drug screening. Biometrics, 29, 293-300.
Scott Patterson, Byron Jones (2006). Interdisciplinary statistics Bioequivalence and Statistics in Clinical Pharmacology. Chapman and Hall/CRC.
Self SG, Mauritsen RH (1988). Power and Sample size calculations for generalized linear models. Biometrics, 44, 79-86.
Self SG, Mauritsen RH, Ohara J (1992). Power calculation for likelihood ratio tests in generalized linear models. Biometrics, 48, 31-39.
Shein-Chung Chow, Jen-pei Liu (1998). Design and Analysis of Clinical Trials Concepts and Methodologies. Wiley series in Probability and Statistics.
Shen Y and Fisher L (1999). Statistical inference for self-designing clinical trials with one-sided hypothesis. Biometrics, 55, 190-197.
Sheskin DJ (2004). Handbook of Parametric and Nonparametric Statistical Procedures (3rd ed.). Chapman Hall/CRC Press, Boca Raton, FL.
Sidak Z (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62, 626-633.
Sidik K (2003). Exact unconditional tests for testing non-inferiority in matched-pairs design. Statistics in Medicine 22:265-278.
Siegel S, Castellan NJ (1988). Nonparametric statistics for the behavioral sciences. 2nd edition. McGraw-Hill, New York.
Simon R (1989). Optimal Two-Stage Designs for Phase II Clinical Trials. Controlled Clinical Trials 10:1-10.
Simon R, Rubinstein L, Arbuck SG, Christian MC, Freidlin B and Collins J (1997). Accelerated Titration Designs for Phase I Clinical Trials in Oncology. Journal of the National Cancer Institute, 89, 1138-1147.
Snapinn SM, Small RD (1986).
Tests of significance using regression models for ordered categorical data. Biometrics, 42:583-592. References 2715 <<< Contents * Index >>> References Snedecor GW, Cochran WG (1989), Statistical Methods. 8th Edition, Iowa State University Press, Ames, IA. SPAF III Writing Committee for the Stroke Prevention in Atrial Fibrillation Investigators (1998). Patients With Nonvalvular Atrial Fibrillation at Low Risk of Stroke During Treatment With Aspirin: Stroke Prevention in Atrial Fibrillation III Study.JAMA, 1998;279:1273-1277. Spaulding C, Henry P, Teiger E, Beatt K, Bramucci E, Carrie D, Slama MS, Merkely B, Erglis A, Margheri M, Varenne O, Cebrian A, Stoll HP, DB Snead DB, Bode C (2006). Sirolimus-Eluting versus Uncoated Stents in Acute Myocardial Infarction. NEJM, 355, 1083-1104. Sprent P (1993). Applied Nonparametric Statistical Methods. 2nd edition. Chapman and Hall, New York. Steinijans VW, Hauck WW, Diletti E, Hauschke D, and Anderson S (1992). Effect of changing the bioequivalence range from (0.80, 1.20) to (0.80, 1.25) on the power and sample size. Int J Clin Pharmacol Ther Toxicol, 30, 571-575. Storer BE (1989). Design and analysis of phase I clinical trials. Biometrics, 45:925-937. Stroke Prevention in Atrial Fibrillation Investigators (1996). Adjusted-dose warfarin versus low-intensity, fixed-dose warfarin plus aspirin for high-risk patients with atrial fibrillation: Stroke Prevention in Atrial Fibrillation III randomised clinical trial. Lancet, 348, 633-38. Stout W, Marden J, Travers K. (1999). Statistics: Making Sense of Data. Mobius Communications. Suissa S, Shuster J (1985). Exact unconditional sample sizes for the 2 × 2 binomial trial. Journal of Royal Statistical Society Series A 148:317-327. Suissa S, Shuster J (1991). The 2 × 2 matched-pairs trial: Exact unconditional design and analysis. Biometrics 47:361-372. Sweeting M, Mander A, Sabin T. (2013). bcrm : Bayesian Continual Reassessment Method Designs for Phase I Dose-Finding Trials. 
Journal of Statistical Software 54(13): 1-26. 2716 References <<< Contents * Index >>> R East 6.4 c Cytel Inc. Copyright 2016 Tarone , RE. (1985). On heterogeneity tests based on efficient scores. Biometrika 72(1): 91-95. Thomas RG, Conlon M (1992). Sample size determination based on Fisher’s exact test for use in 2x2 comparative trials with low event rates. Controlled Clinical Trials. 13: 134-147. Tim Friede and Heinz Schmidli (2010). Blinded sample size reestimation with count data: Methods and applications in multiple sclerosis. Statistics in Medicine. 2010, 29 1145–1156 Tsiatis AA (1981). The asymptotic joint distribution of the efficient scores test for the proportional hazards model calculated over time. Biometrika, 68, 311-15. Tsiatis AA (1982). Group sequential methods for survival analysis with staggered entry. In: Survival analysis (eds. Crowley J and Johnson RA), Hayward, California: Institute of Mathematical Statistics, 257-68. Tsiatis AA and Mehta CR (2003). On the inefficiency of the adaptive design for monitoring clinical trials. Biometrika, 90, 367-378. Tsiatis AA, Rosner GL and Mehta CR (1984). Exact confidence intervals following a group sequential test. Biometrics, 40, 797-03. Tukey JW (1977). Exploratory data analysis. Addison-Wesley Publishing, Reading, MA. Upton GJG (1992). Fisher’s exact test. J. R. Statist. Soc. Ser. A, 155, 395-402. Van de Werf F (2006). Drug-Eluting Stents in Acute Myocardial Infarction. NEJM, 355, 1169-1170. Venzon DJ and Moolgavkar SH (1988). A method for computing Profile-Likelihood-Based Confidence Intervals. Applied Statistics, 37; 1, 87-94. Volberding PA, Lagakos SW, et. al. (1990). Zidovudine in asymptomatic human immunodeficiency virus infection. New England Journal of Medicine, 322:14, 941-949. Wald A, Wolfowitz J (1940). On a test whether two samples are from the same References 2717 <<< Contents * Index >>> References population. Ann Math Stat 11:147-162. Walter SD (1976). 
The estimation and interpretation of attributable risk in health research Biometrics 32:829-849. Wang SJ, Hung HMJ, Tsong Y, Cui L (2001). Group sequential strategies for superiority and non-inferiority hypotheses in active controlled clinical trials. Statistics in Medicine, 20, 1903-1912. Wang SK and Tsiatis AA (1987). Approximately optimal one-parameter boundaries for group sequential trials. Biometrics, 43, 193-99. Weight control and risk factor reduction in obese subjects treated for 2 years with Orlisat (1999). JAMA, 281, 235-42. Werner M, Tolls R, Hultin J, Mellecker J (1985). Sex and age dependence of serum calcium, inorganic phosphotrus, total protein, and albumin in a large ambulatory population. Fifth International Congress on Automation, Advances in Automated Analysis, 3, 59-65; Werner, M., Tolls, R. E., Hultin, J. V., and Mellecker, J(1970) Influence of sex and age on the normal range of eleven serum constituents. Z. Klin. Chem. Klin. Biochem. 8, 105-115 (1970). Westfall PH and Krishen A (2001). Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures. Journal of Statistical Planning and Inference, 99, 25-40. Wiens B (2003). A fixed-sequence Bonferroni procedure for testing multiple endpoints. Pharmaceutical Statistics, 2, 211-215. Wiens B, Dmitrienko A (2005). The fallback procedure for evaluating a single family of hypotheses. Journal of Biopharmaceutical Statistics, 15, 929-942. Wilcoxon, F (1945). Individual Comparisons by Ranking Methods. Biometrics, 1, 80-83. 
2718 References <<< Contents * Index >>> A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Index A Accrual Model, 1614, 1627, 1639, 1652 accrual time, 2310 accrual versus study duration chart, 2308 acute coronary syndromes, 1041 adaptive design, 1035 adaptive group sequential design, 1044 adaptive re-design, 1226 adaptive simulation settings, 1063 adding a futility boundary, 1046 adjusted inference, 2328 adjusted confidence interval, 1415, 2328, 2330 adjusted p-value, 427, 2328–2329 adjusted point estimate, 2330 inner wedge, 2331 no adjusted inference, 2332 ordering the sample space, 2328 admissible design, 774 Agreement, 649 allocation ratio equal, 117 unequal, 203 alpha spending function, 1471, 2315 Alzheimer’s disease clinical trial, 242 Analysis-Descriptive Statistics, 1827 analysis, 1810 Analysis of Variance two-way, 1841 analysis Case Data Editor, 1810 crossover, 1919, 1923, 1934, 1939, 1950, 1954 equivalence, 1946, 1950, 1954 non-inferiority, 1929, 1934, 1939 2719 noninferiority, 1901 one-sided test, 1890 paired data, 1901 ratio of means, 1903, 1910, 1915, 1929, 1946 superiority, 1890, 1919, 1923 Wilcoxon Signed Rank Test, 1898 binary endpoint, 2060 ANOVA, 235, 1982 one-way, 2547 one way, 232, 1976 repeated constant correlation, 235, 1982 two-way, 2549 two way, 237, 1985 two-way, 1841 arbitrary error probability, 156 Area, 1861 ASN-average sample number, 2305 assurance, 174, 981 asymmetric two-sided boundaries, 1481, 1484 attained significance level, 726 auto-hide, 11 availability of adaptive features, 1022 B Backward Image Confidence Interval, 1221 example, 1272 balanced randomization, 395 Bar Plot, 1855 Barnard’s unconditional test, 2570 Bayesian, 174, 981, 2151 Bayesian predictive power, 177 benefits of adaptive designs, 1028 Pages Vol 1: 1–70; Vol 2: 71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 819–1018; Vol 7: 1019–1386; Vol 8: 1387–1794; Vol 9: 1795–2266; Vol 10: 2267–2740 <<< Contents * Index >>> Index benefits of simulating 
adaptive design, 1167 Berger-Boos correction, 2589 beta spending function, 1471, 1484, 2294, 2315 binomial design, 2298 pooled variance, 395, 427 pooled vs unpooled, 394, 2298 unpooled variance, 395, 427 binomial distribution, 394 binomial endpoint, 474–475, 751–752, 1359 Binomial Endpoint Analysis, 2069 Non-Inferiority Trial, 2088 Superiority Trial, 2069 Binomial Endpoint Create Multiple Designs, 356 Fixed Sample Design, 350 Group Sequential Design, 352 Interim Monitoring, 359 Simulations, 357 binomial endpoint unknown baseline, 1401, binomial study design, 1401 information based, 1401 Binomial Superiority Regression, 644 binomial equivalence exact test, 767 exact, 714 chi square statistic, 395 Simon’s two-stage design, 774 superiority exact, 714, one-sample, 714 two-sample equivalence test, 767 two-sample exact tests, 736 two-sample tests, 736 bivariate log-normal, 121, 136 bivariate normal, 113, 121, 136 bivariate t, 129, 212 Blyth-Still-Casella intervals, 2565 2720 Index Bonferroni procedure, 252, 577, 2031, 2181 survival, 2243 boundaries, 2286, 2293, 2295 asymmetric, 1481, 1484 early rejection of H0, 2293 early rejection of H0 or H1, 2294–2295 early rejection of H1 only, 2297 Generalized Haybittle-Peto, 2286 Haybittle-Peto, 2286 Pampallona-Tsiatis, 2288 spending function, 2293, 2295 Wang-Tsiatis, 2287 boundary chart, 147 p-value scale, 148 boundary crossing probability, 2352 boundary scale, 148 boundary scales conditional power scale, 484 boundary shape parameter, 2348 O’Brien-Fleming, 145 Pocock, 149 Box Plots, 1863 Bubble Plots, 1864 BWCI, 1221 example, 1272 by-passing test statistic calculator, 2332 C calendar time, 1402, 2280 CAPTURE clinical trial, 23, 395, 1623 case data editor, 55 CDL method, 1021, 1160 acute coronary syndromes example, 1171 alpha preservation, 1177 binomial endpoint example, 1171 comparison of group and adaptive sequential designs, 1176 <<< Contents * Index >>> R East 6.4 c Cytel Inc.Copyright 2016 group sequential design example, 1172 
normal endpoint example, 1162 preservation of type-1 error, 1177 Schizophrenia example, 1162 simulation parameters, 1164 chart of accrual versus study duration, 2308 chart of conditional power, 2322 chart of post-hoc power, 2321 chart of stopping boundaries, 147 Chen DeMets and Lan method, 1021, 1160 chi-square statistic, 394–395, 2300 CHIRAND, 70 choice of variance, 2310 CHW method, 1021, 1055 acute coronary syndromes example, 1093 IM, 1089, 1152 preservation of type-1 error, 1058 repeated confidence intervals, 1058 adaptive design example, 1077 adding a futility boundary, 1105 binomial endpoint example, 1093 calculation of repeated p-values, 1060 comparison of adaptive design to fixed sample design, 1087 conditional power, 1061 fixed sample study example, 1076 interim monitoring sheet, 1089, 1152 normal endpoint example, 1074 operating characteristics of adaptive group sequential design, 1100 repeated p-values, 1058 schizophrenia example, 1074 simulation results by zone, 1085 2721 statistical theory, 1056 CHW simulation Input, 1081, 1099 CHW Simulation assumptions in East, 1062 CHW Statistic, 1058 class of spending functions, 2347 Classification table, 2170–2171, 2179, 2638 coefficient of variation, 110, 137, 199–200, 217, 228, 1915, 1929 Cohen’s Kappa, 649 Collinearity diagnostics, 2552 column functions, 69 combined efficacy and futility, 1444 comparing survival curves, 2219 comparison of adaptive and non-adaptive group sequential designs, 1103 comparison of CHW CDL and conventional Wald, 1213 Comparison of Designs, 25 comparison of multiple comparison procedures Analysis, 2207 comparison of fixed sample and adaptive design, 1039 group sequential and fixed sample design, 1034 Completers Prediction Plot, 1622, 1631 computation of boundaries, 2352 computing boundaries for the exact group sequential test, 2608 computing conditional power for a pre specified sample size, 1361 computing conditional power for specified number of events, 1368 Pages Vol 1: 1–70; Vol 2: 
71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 819–1018; Vol 7: 1019–1386; Vol 8: 1387–1794; Vol 9: 1795–2266; Vol 10: 2267–2740 <<< Contents * Index >>> Index computing conditional power for specified sample size, 1354 computing number of events (overall), 1386 computing number of events for specified conditional power, 1370 computing sample size for desired conditional power, 1357 computing the required sample size increase, 1038 Conditional exact test, 2139 conditional power, 196, 424, 484, 2321–2322, 2324 conditional power at ideal next look position, 2321 conditional power calculator, 1361, conditional power calculator, 1107, 1350 in simulation, 1371 interim monitoring, 1351, 1366 conditional power chart, 2322 conditional power for decision making, 1350, conditional power, 1061 boundary scale, 484 ideal next look position, 2321 informal use of, 1441 one sided tests, 2322 stopping for futility, 2321 target, 1082 two sided, 1061 two sided tests, 2322 conditional rejection probabilities, 1222 confidence interval, 422, 424, 427, 1415 Confidence interval Clopper-Pearson method, 2565 equivalence test, 2576 confidence interval adjusted, 422, 424, 427, 1415 2722 Index Confidence Intervals, 2185, 2188, 2191, 2194, 2197, 2199, 2203, 2206 Plot, 2185, 2188, 2191, 2194, 2197, 2199, 2203, 2206 conjunctive power, 248, 251, 253–255, 259–262, 586, 590, 592, 594–595, 597, 599, 1011, 1014, 1017 survival, 1005, 1007, 1009 conservative futility boundaries, 1453 conservative spending function, 1471, Correlations, 1843 Kendall’s Tau, 1843 Pearson’s Correlation, 1843 Spearman’s Rho, 1843 count data, 790 CP calculator, 1350 Cramer’s V, 2598 create Variable, 56 Cross Tabulation, 1832 crossover data editor, 61 crossover design, 206, 208, 221, 227, 1918, 1933, 1950, 1953, 1965 example, 207–208, 223, 228 ratio of means, 208, 227, 1953 simulation, 230 single look, 223, 229 Crossover Plots, 1879 crossover analysis, 1919, 1923, 1934, 1939, 1950, 1954 equivalence, 1950, 1954 
example, 1919, 1923, 1934, 1939, 1950, 1954 non-inferiority, 1934, 1939 superiority, 1919, 1923 Cui Hung and Wang method, 1021, 1055 <<< Contents * Index >>> R East 6.4 c Cytel Inc.Copyright 2016 cumulative information, 1413,–1414 Cumulative Plot, 1870, 1872 left, 1870 Right, 1872 CV, 200 D Data Exploration Plots, 1854 Data Set SMALLT, 2111 SWINE, 1841 Root, 1852 WERNER, 1848 Data Sets VACCINE, 2097 Data sets BD, 2081 Gupta, 2127 OESOPHAGEAL, 2081 VARI, 1870 data transformation, 63 data Cancer, 2220, 2222, 2224, 2231, 2233 dataset CrossoverCaseData, 1919 Iris, 1941 Methylphenidate, 1890 Olestra, 1901 pkfood, 1934, 1950 decision making, 1350 deleting a design, 117 deleting observations, 1943 departure from design, 2318 Descriptive Statistics central tendency, 1827 coefficient of variation, 1827 count, 1827 dispersion, 1827 geometric mean, 1827 harmonic mean, 1827 2723 kurtosis, 1827 maximum, 1827 mean, 1827 median, 1827 minimum, 1827 mode, 1827 skewness, 1827 standard deviation, 1827 standard error of mean, 1827 sum, 1827 variance, 1827 design menu, 23 design many means, 232, 1976 sample size rounding, 16, 76, 347, 711, 787, 823, 1024, 1390, 1803 information based, 1394 Poisson, 1416, 1423 designing a study given limited data, 1029 designing the primary trial, 1226 difference of means for crossover data, 171 difference of proportions exact tests, 736 disjunctive power, 248, 251, 253–255, 259–262, 586, 589, 591, 594–595, 597, 599, 1011, 1014, 1017 survival, 1005, 1007, 1009 Dose-Finding Hypertension Trial, 2025 dose response curve, 582 linear, 244 logistic, 582 drawback of single group sequential design, 1043 drift parameter, 141, 2275–2278, 2282–2283 Dropout Prediction Plot, 1622, 1632 Pages Vol 1: 1–70; Vol 2: 71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 819–1018; Vol 7: 1019–1386; Vol 8: 1387–1794; Vol 9: 1795–2266; Vol 10: 2267–2740 <<< Contents * Index >>> Index Dropout Predictions Plot, 1726, 1736, 1754, 1762, 1768 dropout rates, 872 
specifying the parameters, 994 dropouts, 872, 994 Dropouts Prediction Plot, 1644, 1657 Dropouts Predictions Plot, 1747, 1779, 1786 Dunnett procedure, 241–242 Duration-Accrual chart, 871 E early rejection H1, 2297 early rejection of H0, 2293 early rejection of H0 or H1, 2294–2295 early stopping, 157, 394, 412 early stopping for efficacy, 1439 early stopping for futility, 1484, 2294 early stopping for benefit or futility, 412 for futility, 1456 east workbook, 28 edit data, 58 edit simulation, 135, 226 editing simulation, 120 effect size, 1393, 1425, 1929 equivalence, 1945 efficacy boundaries, 1444, 1471 efficacy trial, 23, 395 efficiency considerations for CDL, 1213 efficient estimators, 2272 Egret Siz, 1424 Eliminating nuisance parameters, 2570 Enrollment Plan, 1615, 1627, 1639, 1652 Enrollment Prediction Plot, 1620, 1631, 1643, 1656, 1686, 1693, 1700, 1714 enrollment range, 2308 Enrollment/Events Prediction, 1609, 1675 At Design Stage, 1609 At Interim Monitoring Stage, 1658 2724 Index Enrollments Prediction Plot, 1746, 1753, 1778, 1785 equivalence, 128, 211, 1907 Equivalence example binomial difference, 2106, 2108 equivalence limits, 128, 211, 221 equivalence testing of two binomials power of, 2635 equivalence testing of two independent binomials power of, 2617 equivalence analysis, 1946, 1950, 1954 crossover, 1950, 1954 crossover design, 221, 227, 1950, 1953 example, 130, 137, 212, 217, 223, 228, 1908, 1941, 1946, 1950, 1954 normal, 211 paired data, 128, 136, 1907, 1910 power calculation, 129, 212 ratio of means, 136, 217, 227, 1910, 1953 simulation, 134, 139, 216, 220, 230 single look, 213, 218, 223, 229 test of hypothesis, 128, 211, 221 error spending chart, 425 error spending function interim monitoring, 2313 error spending functions, 413 evaluating the BWCI and RCI Methods by Simulation, 1272 Events Prediction Plot, 1643, 1656, 1725, 1735, 1754, 1761, 1767 Exact Conditional Test, 2613 Exact confidence interval difference of binomials, 2576 ratio of binomials, 
2581, 2586 exact power, 2617–2618 <<< Contents * Index >>> R East 6.4 c Cytel Inc.Copyright 2016 Exact test, 2139 exact test integer sample size, 709 exact unconditional power, 2617 exact time limit option, 16, 76, 347, 711, 787, 823, 1024, 1390, 1803 example odds Ratio of proportions, 2103, Example, 2069 example bioequivalence, 217 crossover, 1919, 1923, 1934, 1939, 1950, 1954 crossover design, 228 equivalence, 130, 137, 212, 217, 228, 1908, 1941, 1946, 1950, 1954 kappa, 2218 non-inferiority, 114, 122, 200, 1929, 1934, 1939 noninferiority, 1901 Odds ratio of proportions, 2078 one-sided test, 91, 1891 paired, 102, 114, 130, 1892, 1898, 1908 paired data, 122, 137 ratio of means, 122, 137, 200, 217, 228, 1915, 1923, 1929, 1939, 1946, 1954 repeated measure regression, 2006 superiority, 91, 102, 1890, 1892, 1915, 1919, 1923 Wilcoxon Signed Rank Test, 1898 acute coronary syndromes, 1041 Example Alcohol and oesophageal cancer, 2081 Animal Toxicology Example 1, 2122 2725 binomial test, 2060 Clinical Trial Data, 2069 McNemar test, 2065 example negative symptoms Schizophrenia, 1030 Odds ratio of proportions, 2103 Example Oral lesions data, 2127 pilot study for a new drug, 2060 example two binomials, 1394 Example Voters Preference, 2065 exit probabilities, 1426–1427 exit probability, 2304 expected events, 2305 expected information, 1394, 2304–2305 expected number of events, 2305 Expected response, 2179 expected sample size, 2304–2305 expected stopping time, 2304 binomial, 2304 normal, 2304 survival, 2305 exponential failure, 2308 expression builder, 64 Extended CDL method, 1160 acute coronary syndromes example, 1204 cut-off points for some typical two stage designs, 1197 necessity of CDL criteria, 1208 preservation of type-1 error, 1201, 1207 Schizophrenia example, 1198 extension of CDL method, 1191 F failure-time trials, 820 failure rate, 2308 Pages Vol 1: 1–70; Vol 2: 71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 819–1018; Vol 7: 1019–1386; Vol 8: 
1387–1794; Vol 9: 1795–2266; Vol 10: 2267–2740 <<< Contents * Index >>> Index Fallback, 2204 Fallback Procedure, 2203 fallback procedure, 260, 596, 1015, 2046, 2258 Fallback Proportion of Alpha, 2204 Test Sequence, 2204 favorable zone, 1037 FDA data, 2122 animal toxicology, 2122 FDA Guidance on Adaptive Designs, 1020 filter designs, 27 filter variable, 60 Final look, 1414 first interim analysis, 1428 first interim look, 421 Fisher’s exact test, 2069, 2618 2 X 2 table, 2069 power of, 2617 Fisher exact test, 2597 Fisher information, 2271–2272 fixed accrual, 821 fixed follow-up designs, 2310 fixed sample design, 142, 1031 Fixed Sequence Procedure, 2200 fixed sequence procedure, 593 fixed sequence testing, 258, 2044 survival, 1012, 2256 fixed study duration, 821 flexibility of adaptive approach, 1053 flexible clinical trial, 1027 flexible interim monitoring, 2314 flexible monitoring, 2314 flexible stopping boundaries, 1460 follow-up, 2308, 2310 fixed, 2310 variable, 2308 follow up time, 2310 four parameter logistic curve, 581 Frequency Distribution, 1829 2726 Index futility boundaries, 481, 1471, 2294, 2296–2297 overruling, 2296 futility boundary, 774, 1484 non-binding, 1486 overruling, 1486 futility stopping, 1441 futility stopping boundaries, 1444 conservative, 1453 FWER, 240, 577, 999, 2024, 2180 survival, 2240 G G vs I designs, 2303 gamma spending function, 2291 general design module, 2302 general design Poisson data, 1426–1427 general designs, 1423 general distribution, 2332 interim monitoring, 2332 Generalized Haybittle-Peto boundaries, 2286 geometric mean, 137 getting started with East 6, 7 Analysis Menu, 21 Data Editor menu, 19 Design menu, 23 Home menu, 12 Interim Monitoring, 49 user interface, 12 workflow, 8 global power, 248, 253, 586–587, 590, 592, 1005, 1011 Gm spending function, 2291 Goodman-Kruskal Gamma, 1844 group sequential, 94, 103, 188 group sequential design, 145, 476, 1031, 1042, 1439, 2271 group sequential paired data, 103 <<< Contents * Index 
>>> R East 6.4 c Cytel Inc.Copyright 2016 superiority, 103 guaranteed alpha error, 2315 Gupta data set, 2127 H H0-H1 boundaries, 2294–2295 two-sided, 2296 H0-only boundaries, 2293 H1-only boundaries, 2297 Hat matrix, 2554 Haybittle-Peto boundaries, 1462, 1466, 2286 help panel, 11 hiding help panel, 11 Histogram, 1874 Hochberg’s Step Up, 2194 Analysis, 2194 Hocheberg procedure, 256, 592, 1010, 2039, 2194 survival, 2251 Holm’s Step Down, 2191 Analysis, 2191 Holm step-down procedure, 254, 591, 2038, 2191, 2250 survival, 1007 Hommel’s Step Up, 2197 Analysis, 2197 Hommel procedure, 256, 592, 1010, 2039, 2194, 2251 Homogeneity test examples, 2081 Horizontal Bar Plot, 1857 Horizontal Stacked Bar Plot, 1859 hypergeometric variance, 2280 hypothesis H1/2, 2304 Hypothesis test multiparameter, 2139 hypothesis testing, 1222 I I vs G designs, 2303 2727 IF() function, 67 impact of futility boundary, 1112 implementing the adaptive changes through a secondary trial, 1238 incremental Wald statistics, 1056 independent increments, 2352 inflation factor, 1401, 1423, 2301 Influence statistics, 1847 Influential groups, 2179 informal use of conditional power, 1441 information based design, 1027, 1394–1395, 1401, 1409, 1428, 2302 information based inference, 1394 information based monitoring, 1401, 1428, 2333 sample size re-estimation, 2333 information based Poisson, 1416 stroke study, 1418 information fraction, 1402, 1404–1406, 1413, 1431, 2274, 2320 information measures, 2304 information vs sample size, 2303 inner wedge, 2296 inner wedge boundaries, 2289 inner wedge stopping boundaries, 2331 input multiple values, 116 interim monitoring, 491–492, 504–505, 519–520, 531–532, 727, 947, 1401, 1412, 1419, 1428, 2313 first look, 99, 107, 196 non-inferiority, 195 paired data, 106 second look, 108 superiority, 106 error spending function, 2313 Pages Vol 1: 1–70; Vol 2: 71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 819–1018; Vol 7: 1019–1386; Vol 8: 1387–1794; Vol 9: 1795–2266; 
Vol 10: 2267–2740 <<< Contents * Index >>> Index flexible, 2314 information based, 1394, 2333 Lan-DeMets, 2314 preserving alpha, 2314 theory, 2313 introduction to survival endpoint, 820 invoking CDL simulation, 1163 invoking CHW simulation, 1081, 1099 J Jennison and Turnbull, 1220, 2271 Jennison Turnbull theorem, 2284, 2303 K Kalbfleisch and Prentice, 900 Kendall’s Coefficient of Concordance, 1844 Kendall’s Tau, 2551, 1843 key advantage of adaptive plan, 1040 L Lan-DeMets, 97, 103, 190 Lan-DeMets spending function, 400–401, 1418, 1471, 2291, 2314 O’Brien-Fleming flavor, 400, 2291 Pocock flavor, 401, 2291 Lan-DeMets extension to preserving beta, 2315 interim monitoring, 2314 last look, 2317 optimal placement, 2319–2320 recomputation of boundary, 2318 LD(OF) spending function, 2291 LD(PK) spending function, 2291 Left Cumulative Plot, 1870 Lehmacher and Wassmer, 1055 Likelihood ratio test, 2139 limitation of group sequential design, 1097 limitations of CDL method, 1160 2728 Index linear mixed model ratio of means crossover, 2023 log rank test, 2220, 2222, 2224, 2231, 2233 log transformation, 200, 217, 228, 1915, 1923, 1929, 1938, 1953 log transformed data, 200, 217, 228, 1953 logistic dose response curve, 581 Logistic Regression, 644 logistical issues, 1054 lognormal data, 110, 121, 136, 208, 217, 227, 1903, 1910, 1915, 1929, 1953 logrank score statistic, 2280 logrank statistic, 2280 long-term mortality, 820 loss of efficiency using CHW method, 1160 lower stopping boundary, 2288, 2294 M making adaptive changes to the primary trial, 1234 MAMS Designs, 285 Continuous Endpoint, 285 Many Proportions, 549, 2111 marginal power, 248, 251, 586 maximum events, 2282–2283, 2305 maximum failures, 2305 maximum information, 1394, 1412, 2278, 2282–2283, 2289, 2305 Maximum likelihood, 2138 Maximum likelihood estimates, 2580 Maximum likelihood non-convergence, 2138 maximum sample size, 1412, 2278, 2289, 2305 maximum study duration, 2306 maximum usable sample size, 1082 <<< Contents * 
Index >>> R East 6.4 c Cytel Inc.Copyright 2016 McNemar’s, 729 McNemar’s conditional exact test, 729 McNemar’s test, 2611 power, 2611 McNemar’s Test power, 2612 McNemar’s McNemar’s conditional test, 714 McNemar probabilities, 731 McNemar test, 2566 Measures of agreement, 2216 Cohen’s Kappa, 2216 weighted Kappa, 2216 median unbiased estimate, 2330 menus in East 6, 12 midcourse change in desired power, 1029 minimax design, 774 minimum usable sample size, 1082 missing value code, 70 missing values, 57 MLE non-convergence, 2138 Model Terms item, 2133, 2143 monitoring for general distributions, 2332 monitoring the primary trial, 1228 Monte Carlo accuracy, 164 motivation for adaptive sample size changes, 1027 mulitple comparison Bonferroni procedure (weighted), 577, 588 Dunnett procedure (step-down), 242, 250, 2028 Dunnett procedure, 241 fallback procedure, 596 p-value based procedures, 251, 2030 Sidak procedure, 590 Muller and Schafer method, 1022, 1221 Muller and Schafer Method: Interim Monitoring, 1327 2729 multi-arm trials, 2272 Multicollinearity Criterion, 2553 Multinomial distribution, 2111 Multinomial probabilities, 2111 multiple comparison procedures survival, 1011 multiple comparison survival Hocheberg procedure survival, 2251 multiple comparison Bonferroni procedure (weighted), 252, 2031, 2181 survival, 2243 Bonferroni procedure, 252, 2031, 2181 survival, 2243 Dunnett procedure (step-up), 250, 2029 Dunnett procedure, 242 fallback, 260, 1015, 2046 survival, 2258 fixed sequence, 258, 2044 fixed sequence procedure, 593 fixed sequence survival, 1012, 2256 Hocheberg procedure, 256, 592, 2039, 2194 survival, 1010 Holm step-down procedure, 254, 2038, 2191, 2250 survival, 1007 Hommel procedure, 256, 592, 2039, 2194 survival, 1010, 2251 parametric procedures, 241 Sidak procedure, 252, 2031, 2033, 2181 survival, 2243, 2246 weighted Bonferroni, 2035 survival, 2247 Multiple Designs, 25 Pages Vol 1: 1–70; Vol 2: 71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 
819–1018; Vol 7: 1019–1386; Vol 8: 1387–1794; Vol 9: 1795–2266; Vol 10: 2267–2740 <<< Contents * Index >>> Index Multiple Discrete Endpoints, 601 Multiple Endpoints, 265 Multiple Linear Regression, 1847, 2552 multiple look, 94, 103, 188 paired data, 103 superiority, 103 multiple values in input field, 116 Multivariate Analysis of Variance, 1851 Multivariate statistics, 1851 Mutliple discrete endpoints gatekeeping, 601 Mutliple endpoints gatekeeping, 265 N navigator panel, 8 necessity of CDL constraint, 1170, 1180 negative symptoms Schizophrenia, 1030 new crossover data, 61 no early stopping, 1436 nominal critical point, 424 non-binding futility boundaries, 2296–2297 non-binding futility boundary, 1105, 1486 non-central t, 129, 212 Non-convergence of maximum likelihood, 2138 non-inferiority, 113, 185, 474, 751, 1901, 1926 non-inferiority and survival, 2283 non-inferiority boundaries, 1471 non-inferiority margin, 113, 1901 non-inferiority analysis, 1929, 1934, 1939 crossover, 1934, 1939 crossover design, 206, 208, 1918, 1933, 1965 exact, 752 example, 114, 122, 185, 200, 207–208, 1929, 1934, 1939 2730 Index group sequential, 188 interim monitoring, 195 multiple look, 188 one sample, 1926 paired data, 113, 121, 1901, 1903 ratio of means, 121, 208, 1903, 1929 simulation, 119, 126, 193, 205 single look, 114, 186 t test, 118, 198, 204 test of hypothesis, 113, 1901 binomial, 474, 751–752 normal, 185, 1926 power of, 2617 Non-inferiority Simulation, 884 Non-Proportional Hazards, 884, 887, 894 group sequential, 894 Single-Look, 887 Noninferiority test example binomial difference, 2088, 2093, 2097, 2100 binomial ratio, 2097 noninferiority analysis, 1901 example, 1901 Noninferiority example, 2103 noninferirority survival curves, 2228 Normal Endpoint, 79 normal endpoint, 91, 185, 211, 1890, 1913, 1926 Normal Endpoint Create Multiple Designs, 84 Fixed Sample Design, 79 Group Sequential Design, 81 Interim Monitoring, 88 Simulations, 86 normal response, 91, 141, 185, 211, 1890, 
1913, 1926 NORMRAND, 70 <<< Contents * Index >>> R East 6.4 c Cytel Inc.Copyright 2016 nuisance parameter, 2276, 2278 Nuisance parameter elimination, 2573, 2579, 2585, 2588–2589 nuisance parameters, 2303 O O’Brien-Fleming, 94, 97, 103, 190 O’Brien-Fleming boundaries, 2291 objections against CHW statistics, 1160 observational study, 332 observational vs experimental, 332 odds Ratio of proportions, 2078 Odds ratio of proportions example, 2078 example, 2103 Odds ratio output, 2136 oncology trial, 714 single arm, 714 ONCOX trial, 1647, 1666 one-sided test, 1447 analysis, 1890 example, 91, 1891 one sample, 91, 1890 exact, 714 simulation, 98 superiority, 91, 1890, 2060 t test, 100 one sided Pampallona-Tsiatis boundaries, 2288 one sided simulation, 98 one way ANOVA, 232, 1976 operating characteristics of adaptive design, 1039 operating characteristics of adaptive group sequential design, 1044 operational issues, 1054 optimal design, 774 2731 optimal placement of last look, 2317, 2319 Options odds ratio, 2136 ordering the sample space, 2328 adjusted inference, 2328 stage-wise, 2329 Orlistat Trial, 1609 overruling a futility boundary, 1486 overruling futility boundaries, 2296 P p-value boundaries, 1462, 1466 p-value scale, 148 p-value adjusted, 427 paired data, 101, 110, 113, 121, 128, 136, 1901, 1903, 1907, 1910 analysis, 1901 equivalence, 128, 1907 example, 122, 137 group sequential, 103 interim monitoring, 106 multiple look, 103 non-inferiority, 113, 1901 ratio of means, 110, 121, 136, 1903, 1910 simulation, 105, 119, 126, 134, 139 superiority, 101 t test, 109, 118 test of hypothesis, 101, 113, 128, 1901 paired sample single look, 102, 114 paired example, 102, 114, 130, 1892, 1898, 1908 Pampallona-Tsiatis boundaries, 1469–1470, 2288 one sided, 2288 Pages Vol 1: 1–70; Vol 2: 71–341; Vol 3: 342–707; Vol 4: 708–783; Vol 5: 784–818; Vol 6: 819–1018; Vol 7: 1019–1386; Vol 8: 1387–1794; Vol 9: 1795–2266; Vol 10: 2267–2740 <<< Contents * Index >>> Index Pancreatic cancer Trial, 
1289 parameter estimation, 1223 parameters for group sequential design, 1033 Parkinson’s disease example, 1266 patient accrual, 2271 patient follow-up time, 2280 Pearson’s Contingency Coefficient, 2598 Pearson’s Correlation, 1843 Pearson’s Product-Moment Correlation, 1844 Pearson chi-square test statistic, 394, 2300 Pearson residual, 2179 penultimate look, 2317 optimal placement, 2317 PET, 2615 phase III study, 1020 PhaseII-III Designs Theory, 2394 PI, 70 Pie Plot, 1860 PIP, 1575 binomial data, 1592 continuous outcome data, 1603 normal data, 1603 survival data, 1575 Time to Event Data, 1575 planned number of looks, 143, 478 plot conditional power, 196 general (user defined), 126, 139 power vs delta, 118, 134 power vs sample size, 118, 125, 133, 204, 215 power vs treatment effect, 118, 134 sample size vs sd of log ratio, 126, 139 Plot left cumulative, 1870 Right cumulative, 1872, 2732 Index Plots, 1854 horizontal stacked bar, 1859 PP Normal, 1867 QQ Normal, 1868 Simple bar, 1855 area, 1861 bar, 1855 box, 1863 bubble, 1864 data exploration, 1854 histogram, 1874 horizontal bar, 1857 pie, 1860 predictive interval, 1575 scatter, 1865 stacked bar, 1856 stem and leaf, 1875 step function, 1877 Pocock boundaries, 2291 Poisson, 1416 poisson endpoint, 790 Poisson endpoint, 1416, 1423 poisson one sample, 790 Poisson information based, 1416 risk ratio, 1416 pooled binomial design, 2279 pooled designs, 427 pooled estimate, 394, 2298 pooled variance, 395, 427, 2298 binomial design, 395, 427 pooled vs unpooled, 395, 427 pooled binomial, 394, 2298 Post-fit file classification table and, 2170, 2179 DEVTOX data, 2170 Seropos data, 2179 post-hoc power, 2315, 2317–2318 post-hoc power chart, 2321 <<< Contents * Index >>> R East 6.4 c Cytel Inc.Copyright 2016 post-hoc power calculations, 2318 chart, 2321 power, 2617 power and expected sample size conditioned on promising zone, 1045 power and sample size for the exact fixed sample test, 2605 power and sample size for the exact group 
sequential test, 2607 power boundaries, 1469–1470, 2348 Wang-Tsiatis, 152 Pampallona-Tsiatis, 1469–1470 Wang-Tsiatis, 400–401, 1469–1470 power boundary, 191 power chart, 154 Power of McNemar’s test, 2612 power of sequential procedure, 2318 power of the exact fixed sample test, 2606 power spending function, 2292 power vs sample size, 204 power vs sample size chart, 725, 796 power conjunctive, 248, 251, 253–255, 259–262, 586, 590, 592, 594–595, 597, 599, 1011, 1014, 1017 survival, 1005, 1007, 1009 disjunctive, 248, 251, 253–255, 259–262, 586, 589, 591, 594–595, 597, 599, 1005, 1011, 1014, 1017 survival, 1007, 1009 global, 248, 253, 586–587, 590, 592, 1005, 1011 marginal, 248, 251, 586 binomial equivalence test, 2617, 2635 comparing two binomials, 2617 conditional, 2321 departure from design, 2318 non-inferiority, 2617 two binomials, 2617 unconditional, 2617 PP Normal Plot, 1867 pre-specified weights, 1056 Predictive Interval Plots, 1575 predictive interval plots, 1575 binomial data, 1592 continuous outcome data, 1603 normal data, 1603 survival data, 1575 Time to Event Data, 1575 predictive power, 177, 2324 Pregibon delta beta, 2179 preserve type-1 error, 2315 preserving alpha, 2314 probability of early termination, 2615 probability of success, 174, 981 Proc IM normal endpoint example, 1789 orlistat example, 1789 process time, 1402 Profile Likelihood Based Confidence Intervals, 2157 promising zone, 1037, 1083 Proportional hazard, 2219
Q
QQ Normal Plot, 1868
R
RALES trial, 1633, 1658, 1755, 1768 random number generation, 70 random numbers, 70 randomization, 395 randomization fraction, 395 randomization balanced, 395 unbalanced, 143, 395 range of acceptable sample sizes, 2308 range of interim outcomes for a sample size increase, 1037 ratio of means analysis, 1903,
1910, 1915, 1929, 1946 crossover design, 208, 227, 1953 equivalence, 220 example, 122, 137, 200, 208, 217, 228, 1915, 1923, 1929, 1939, 1946, 1954 non-inferiority, 1929 paired data, 110, 121, 136, 1903, 1910 simulation, 126, 139, 205, 230 single look, 111, 218, 229 superiority, 1915 t test, 204 test of hypotheses, 110, 121, 136, 199, 217, 228, 1903, 1910, 1915, 1929, 1954 ratio of proportions exact, 743 ratio of proportions exact tests, 736 RCI, 1058–1059 RCI method, 1272 re-estimating sample size for a desired power, 1364 recoding a categorical variable, 68 recoding a continuous variable, 69 recompute boundary, 2317 reconstructing a combined trial from the primary and secondary trials, 1243 recursive integration, 2352 recursive integration algorithm, 1223 reducing the sponsor’s risk, 1029 refractory unstable angina, 23, 395 regression, 332 sample size, 332 repeated confidence interval, 1059, 2324 Jennison and Turnbull, 196 Tsiatis-Rosner and Mehta, 198 repeated confidence intervals proof of coverage, 2325 repeated measure regression example, 2006 repeated measures, 338 sample size, 338 repeated p-value, 1059 repeated significance testing, 2328 adjusted confidence interval, 2328 adjusted p-value, 2328 repeated significance tests, 2271 rescuing an underpowered on-going study, 1028 Residuals, 1847 Results window, 2177 scrolling, 2177 rho family, 192 Rho spending function, 2292 rho spending function, 2292 Right Cumulative Plot, 1872 risk ratio, 1416 Poisson, 1416 risk set, 2280 ROC Curve, 2141, 2171, 2179 ROC Curve vs Classification Table, 2151 RPV, 1059 rule for sample size adaptation, 1083
S
safety boundaries, 1471 Sakoda contingency coefficient, 2598 sample-size computation, 2617 sample size, 400, 406, 409, 413, 2278 sample size adjustment, 1021 sample size calculation, 2607 sample size for survival studies, 2310 sample size ranges, 2310 sample size re-estimation, 1394 sample size vs
information, 2303 sample size, 1027 regression, 332 repeated measures, 338 single slope, 332 two slopes, 336 maximum usable, 1082 minimum usable, 1082 re-estimation, 1027 sawtooth chart, 725 poisson, 796 Scatter Plots, 1865 Scharfstein Tsiatis Robins theorem, 2284, 2303 Schuirman’s method for log normal data, 136, 217, 1910 Schuirman’s TOST procedure, 128, 211, 222 score statistic, 2280 Scores test, 2139 Searching for Nuisance Parameters, 2582, 2587 Restricted Range, 2589 second interim analysis, 1432 second interim look, 423 SELECTIF() function, 67 selecting the criteria for an adaptive sample size increase, 1036 semiparametric information, 2272 Sensitivity, 2141 sequential design, 145 Seropositivity example, 2172 shape parameter, 2289 show table, 134 Sidak procedure, 252, 590, 2031, 2181 survival, 2243 significance level, 2315 Simon’s design, 774 Simple Bar Plot, 1855 simulating preservation of type-1 error, 1168 simulation, 163, 489, 504, 518, 2345 simulation tool, 489, 504, 518 simulation crossover design, 225, 230 equivalence, 134, 139, 216, 220, 225, 230 non-inferiority, 119, 126, 193, 205 one sample, 98 one sided, 98 paired data, 105, 119, 126, 134, 139 ratio of means, 126, 139, 205, 220, 230 superiority, 98, 105 enhanced, 2345 Muller and Schafer method, 1245 single-look, 92, 186 single binomial proportion, 2605 exact design, 2605 single look design, 23, 395, 476, 1436 single look crossover design, 223, 229 equivalence, 213, 218, 223, 229 non-inferiority, 114 paired sample, 102, 114 ratio of means, 111, 218, 229 superiority, 102, 111 single mean, 91 single slope sample size, 332 SMALLT Data Set, 2111 Spearman’s Product-Moment Correlation, 1844 Spearman’s Rho, 1843 special functions, 70 special protocol assessment (SPA), 1053 Specificity, 2141 spending
function, 97, 2291, 2348 spending function boundaries, 157, 1471, 2293, 2295 spending function Lan-DeMets, 97, 190, 103 rho family, 192 alpha, 1471 beta, 1471, 1484 gamma, 2291 Gm, 2291 Lan-DeMets, 400–401, 1418, 1471, 2291 LD(OF), 2291 LD(PK), 2291 power, 2292 recompute boundary, 2317 rho, 2292 spending functions interpolated, 156 SQEND, 70 SQNO, 70 Stacked Bar Plot, 1856 stage-wise ordering, 2329 standardized difference, 1458 statistical method:normal and binomial, 1056 Stem and Leaf Plots, 1875 step-down Dunnett procedure, 242, 250, 2028 step-up Dunnett’s procedure, 2029 step-up Dunnett procedure, 250 Step Function Plots, 1877 stop early to reject, 409, 424 stopping boundaries, 96, 147, 189, 1471 flexible, 1460 for early rejection, 409 inner wedge, 2289 meet at last look, 2288, 2294 one sided H0 or H1, 2288 one sided Pampallona-Tsiatis, 2288 preserve alpha, 2289, 2294 preserve beta, 2289, 2294 two sided H0 or H1, 2289 two sided Pampallona-Tsiatis, 2289 upper and lower, 2288 stopping boundary at last look, 2318 stopping for futility conditional power, 2321 stopping probabilities, 95, 104, 1426–1427, 2304 Stratified Simulation, 900 stroke prevention study, 1424 stroke study information based, 1418 study design, 333, 336 study duration versus accrual chart, 2308 Subject Profile Plot, 1882 subsetting a dataset, 1943 Summary Measures, 1827–1828 central tendency, 1827 coefficient of variation, 1827 count, 1827 dispersion, 1827 geometric mean, 1827 harmonic mean, 1827 kurtosis, 1827 maximum, 1827 mean, 1827 median, 1827 minimum, 1827 mode, 1827 skewness, 1827 standard deviation, 1827 standard error of mean, 1827 sum, 1827 variance, 1827 summary of extended CDL method, 1197 Superiority Design, 865, 966 superiority, 91 Superiority, 865, 966 superiority, 1890, 1913, 2060 superiority exact tests, 736 superiority trial, 23, 395 superiority analysis, 1890, 1919, 1923 binomial, 2060 crossover, 1919,
1923 Superiority Drop-Outs, 971 superiority example, 91, 102, 1890, 1894, 1915, 1919, 1923 Superiority Fixed Accrual Duration, 966 Fixed Study Duration, 966 Given Accrual Duration and Study Duration, 966 superiority group sequential, 94, 103 interim monitoring, 99, 106 multiple look, 93–94, 103 normal, 91, 1890, 1913 one sample, 91, 1890, 1913, 2060 paired data, 101, 110 ratio of means, 110, 1915, 1929 simulation, 98, 105 single look, 92, 102, 111 t test, 100, 109, 112 test of hypothesis, 91, 101, 1890 Superiority Drop-Outs, 872 Non-Constant accrual, 874, 972 Piecewise Constant Hazard, 876 Simulation, 877, 882, 972–973, 1119, 1183 Simulation with fixed study duration, 882–883, 973–974 Variable accrual, 874, 972 survival, 1014–1015, 1017, 2305 survival and non-inferiority, 2283 survival endpoint, 820 Survival Endpoint, 826 Compare Multiple Designs, 844 Fixed Sample Design, 826 Group Sequential Design, 830 Interim Monitoring, 855 R Integration, 859 Simulations, 852 survival endpoint: Lung Cancer Trial, 1112 Survival Endpoint: Pancreatic cancer Trial, 1289 Survival Simulation, 877, 946, 972, 992, 1119, 1183 survival simulations, 821 survival studies, 2308 enrollment range, 2308 sample size, 2310 sample size range, 2308 survival choice of variance, 2310 expected stopping time, 2305 single look, 1436 survival:statistical method, 1071
T
t-test, 1837 paired samples, 1838 independent samples, 1837 t test, 198, 204 non-inferiority, 118, 198, 204 one sample, 100 paired data, 109, 118 ratio of means, 204 superiority, 100, 109, 112 target conditional power, 1082 ten-look boundaries, 2292 inverted, 2292 test for mean one sample, 91, 1890 test of hypotheses ratio of means, 110, 199, 217, 228, 1915, 1929, 1954 test of hypothesis non-inferiority, 113, 1901 paired data, 101,
113, 1901 superiority, 101 test statistic, 400, 411, 413, 422, 424 test statistic calculator, 2332 by-passing, 2332 the problem of overruns, 1034 theory, 2274, 2614 theory of interim monitoring, 2313 theory McNemar’s conditional exact test, 2611 binomial response designs, 2276 exact test, 2611 normal response design, 2274 paired binomial, 2611 Simon’s design, 2614 Simon’s minimax design, 2614 Simon’s test, 2614 Simon’s two-stage optimal design, 2614 time-to-event response designs, 2280 two binomials, 2617 third interim look, 425 time-to-event trials, 820 time, 2280 time points, 155 time to event outcomes, 2219, 2228 time calendar, 2280 failure, 2280 patient follow-up, 2280 TOST procedure, 128, 211, 222 transform variable, 63 Treatment by Period Plot, 1884 trial design, 23, 395, 1416 trial simulation, 489, 504, 518 triangular continuation region, 2288, 2294 Tschuprov contingency coefficient, 2598 Tsiatis and Mehta, 1220 Tutorial HIV data, 2132 two-sided H0-H1 boundaries, 2296 two-sided test, 1445 two-sided tests asymmetric, 1481, 1484 Two-stage multi-arm design Continuous Endpoint, 309 Discrete Endpoint, 621 Two-stage treatment selection design Continuous Endpoint, 309 Discrete Endpoint, 621 two binomials, 1394 Two binomials equivalence test, 2574 exact one-sided p-value, 2578, 2587 non-inferiority test, 2571, 2583 unconditional exact test, 2570 two binomials equivalence testing, 2617, 2635 Two binomials Fisher’s exact test, 2597 two binomials Fisher’s exact test, 2617 power, 2617 unknown baseline, 1394 two independent binomials, 2617 equivalence testing, 2617 Fisher’s exact test, 2617 power, 2617 two one sided tests, 211, 222 Two Ordered Multinomials - Wilcoxon-Mann-Whitney, 2117 two sample exact test, 751 two sample non-inferiority, 1926 superiority, 1913 two sided Pampallona-Tsiatis boundaries, 2289 two slopes sample size, 336 two way ANOVA, 237, 1985 type-1 error preserved, 2315
U
unbalanced data, 2029 unbalanced randomization, 143, 395 unconditional type-1 error, 1261 underlying theory for extension of CDL method, 1192 underpowered studies, 1027 unequally spaced analysis, 155 unfavorable zone, 1037 Uniform random numbers, 70 unknown binomial rate, 1401 unknown variance, 1409 unplanned analyses, 2314 unpooled estimate, 394, 2298 unpooled variance, 395, 427 binomial design, 395, 427 unpooled vs pooled, 395, 427 unpooled binomial, 394, 2298 unweighted Wald statistic, 1058 upper and lower stopping boundaries, 422, 424 upper limit to the sample size increase, 1038 upper stopping boundary, 2288, 2294
V
VACCINE data set, 2097 VARI data set, 1870 variable, 57 variable follow-up designs, 2307–2308 variable transform, 63 variable types, 57 variable binary, 57 categorical, 57 integer, 57 numeric, 57 string, 57 variance, 2310 variance in survival studies, 2310 variance null or alternative, 2310 pooled, 578 unpooled, 578
W
Wald statistic, 1058 weighted, 1058 Wald test, 1847, 2139 Wang-Tsiatis boundaries, 1469–1470, 2287 Wang-Tsiatis power boundaries, 152, 400–401 Wang-Tsiatis power boundary, 191 Weighted Bonferroni, 2188 weighted Bonferroni procedure, 252, 588, 2031, 2181 survival, 2243 Weighted Bonferroni Analysis, 2188 weighted Wald statistic, 1058 weights: pre-specified or actual, 1160 Wilcoxon-Mann-Whitney test, 179, 1956 example, 180 Wilcoxon-Mann-Whitney Two Ordered Multinomials, 2117 Wilcoxon scores, 2599 Wilcoxon Signed Rank Test analysis, 1898 example, 1898
Z
zoom, 37
Source Exif Data:
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.5
Linearized : Yes
Create Date : 2016:05:22 15:13:36-04:00
Modify Date : 2018:03:25 20:27:21-04:00
PTEX Fullbanner : This is MiKTeX-pdfTeX 2.9.4902 (1.40.14)
XMP Toolkit : Adobe XMP Core 5.6-c015 84.159810, 2016/09/10-02:41:30
Format : application/pdf
Creator Tool : Amy Hendrickson, TeXnology Inc., http:www.texnology.com, amyh@texnology.com
Metadata Date : 2018:03:25 20:27:21-04:00
Producer : pdfTeX-1.40.14
Trapped : False
Document ID : uuid:5e6615d2-cc44-48e1-b045-d6a21502ade7
Instance ID : uuid:b392ed72-5d49-45eb-9da8-2de682763eea
Page Mode : UseOutlines
Page Count : 2767