
Imaging, Diagnosis, Prognosis
Authors' Affiliations: Departments of 1 Breast Medical Oncology and 2 Biostatistics and Applied Mathematics, the University of Texas M. D. Anderson Cancer Center, Houston, Texas
Requests for reprints: Lajos Pusztai, Department of Breast Medical Oncology, The University of Texas M. D. Anderson Cancer Center, Unit 1354, P.O. Box 301439, Houston, TX 77230-1439. Phone: 713-792-2817; Fax: 713-794-4385; E-mail: lpusztai{at}mdanderson.org.
| Abstract |
|---|
Experimental Design: Gene expression data from 132 newly diagnosed breast cancers were used to simulate 50,000 single-agent phase II trastuzumab studies. True HER-2 amplification was assessed by fluorescence in situ hybridization.
Results: Only 3.67% of the simulated studies yielded HER-2 as the top predictor; >96% of the individual "studies" picked a different gene as the most predictive of trastuzumab response. HER-2 was included in the top 10 gene list 9.73% of the time. When HER-2 was a priori defined as a potential predictor, 99.6% of the simulated studies confirmed overexpression among responders. Candidate marker testing may be more efficient than de novo predictor discovery in phase II trials. We describe a tandem, two-step phase II trial design for rapid marker assessment that combines two optimal two-stage phase II trials into a single study. In the first stage, unselected patients are treated, and if insufficient responses are seen, the trial remains open for marker-positive patients only and a second two-stage trial commences.
Conclusions: The probability of successful discovery of drug-specific pharmacogenomic response markers in a typical phase II study is small. The evaluation of predefined predictors using tandem two-step phase II design has the advantages of estimating response rates in both unselected and marker-selected patient populations and allows for simultaneous screening of multiple different predictors for the same drug and several distinct predictor-drug pairs in a single, parallel multiarm trial.
ER-negative, high-grade breast cancers are more sensitive to many different types of chemotherapies compared with ER-positive and low-grade tumors. Comparison of transcriptional profiles of tumors that responded to chemotherapy with those that did not could reveal many differentially expressed genes. However, most of these genes will reflect the gene expression differences that underlie the phenotypic differences between the two response groups. The resulting pharmacogenomic response predictor may represent, to a large extent, a predictor of phenotype (i.e., high-grade, ER-negative cancers versus low-grade, ER-positive cancers). Several predictive gene signatures have been reported, but it remains unknown to what extent these signatures include genes that are predictive of response to a particular drug as opposed to being dominated by phenotype-associated genes that are predictive of general chemotherapy sensitivity. It is possible to adjust for or stratify cases by known clinical variables during the marker selection process, but the limited sample size often makes this difficult to do.
The probability that the supervised pharmacogenomic discovery approach, in which responders are compared with nonresponders, could lead to regimen-specific predictors depends on (a) the extent to which the response groups are balanced for strong phenotypic markers and (b) the extent of the molecular differences that determine drug-specific response. If drug sensitivity is influenced by the 2- to 3-fold higher or lower expression of a few dozen genes, these differences might not be readily discovered through supervised pharmacogenomic analysis of data from a typical phase II trial including 30 to 60 patients. These modest gene expression differences between responders and nonresponders can easily be masked by the larger scale molecular differences due to any phenotypic imbalance between the response groups. The technical noise of microarray experiments can also obscure small-scale gene expression differences. A typical gene chip includes 15,000 to 25,000 probe sets, and even with a high level of reproducibility, hundreds of genes could show several-fold differences in expression values in simple replicate experiments (i.e., the same RNA profiled twice). For example, when 24,000 measurements are done (e.g., Affymetrix U133A gene chip) and the overall concordance is 97.98% in a technical replicate, 1.31% of all measurements could have 2-fold or greater variation. This means that 314 genes could be 2-fold or more decreased or increased from one experiment to another due to technical noise alone (10).
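The 314-gene figure follows directly from the two numbers quoted above. A minimal check, using illustrative variable names that do not come from the article:

```python
# Expected number of measurements that vary 2-fold from technical noise alone,
# using the figures quoted in the text (24,000 measurements, 1.31% affected).
n_measurements = 24_000
fraction_two_fold = 0.0131

expected_noisy_genes = n_measurements * fraction_two_fold
print(f"{expected_noisy_genes:.1f} measurements expected to vary 2-fold by noise")
```

Rounded to the nearest whole gene, this reproduces the count given in the text.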
In this article, we examined whether we could have discovered HER-2 mRNA overexpression as a single-gene predictor of response to trastuzumab through supervised analysis of pharmacogenomic data from simulated phase II trials using real breast cancer gene expression data. Our results suggest that the probability of successful drug-specific pharmacogenomic response marker discovery from a typical phase II study can be small. We suggest that prospective testing of a priori defined candidate markers may be more efficient than de novo predictor discovery in phase II trials. We describe a tandem, two-step phase II clinical trial design for rapid assessment of candidate response predictors. The design combines two classic optimal two-stage phase II trials into a single study and can estimate response rates in both unselected and marker-selected patient populations. It also allows for simultaneous screening of multiple different predictors for the same drug and several distinct predictor-drug pairs in a single, parallel multiarm trial.
| Patients and Methods |
|---|
30% to 35% of HER-2–amplified cases respond to single-agent trastuzumab therapy and HER-2–nonamplified breast cancers do not respond to this treatment (11, 12). Currently, there are two clinically routine methods to select patients for trastuzumab therapy: detection of HER-2 protein overexpression with immunohistochemistry and HER-2 gene amplification assessed by in situ hybridization methods. We randomly selected 45 HER-2 normal (non–gene-amplified) and 15 HER-2 gene–amplified cases based on real fluorescence in situ hybridization results to simulate a 60-patient phase II study population. Five (33%) of the 15 HER-2–amplified cases were randomly assigned to the "responder" category because true response was unavailable. The remaining 10 HER-2–amplified cases, together with the 45 HER-2 normal cases, were considered "nonresponders"; this corresponds to an overall response rate of 8.3% (5 of 60) for the whole study population and 33% for the HER-2–amplified population. The gene expression profiles of the two groups were compared using an unequal variance t test to identify differentially expressed genes. This is one of the most commonly used approaches in the literature to identify informative genes for predictive marker discovery. We did this analysis 50,000 times, randomly picking different sets of cases from the larger patient pool of 132, and randomly assigning 1/3 of the HER-2–amplified cases to the "responder" category. The goal was to examine how often HER-2 was ranked by its P value as the most differentially expressed gene in these 50,000 iterations. Each iteration could be considered as a single 60-patient clinical trial, and the analysis follows the commonly used supervised approach to discovering molecular predictors of response from pharmacogenomic data.
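The resampling procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the expression values are synthetic Gaussian noise, the pool is assumed to contain 30 amplified cases (the article does not give this number), the gene label `HER2` stands in for the real probe set, and genes are ranked by the absolute unequal-variance t statistic as a simple stand-in for ranking by P value.

```python
import math
import random


def welch_t(a, b):
    """Unequal-variance (Welch) t statistic for two samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def simulate_trial(expr, amplified, rng, target="HER2",
                   n_amp=15, n_normal=45, n_resp=5):
    """Return the rank of `target` (1 = top gene) in one simulated 60-patient trial.

    expr: dict mapping gene name -> list of expression values, one per pool patient
    amplified: set of pool indices for HER-2 amplified cases
    """
    n_pool = len(expr[target])
    normal = [i for i in range(n_pool) if i not in amplified]
    amp_cases = rng.sample(sorted(amplified), n_amp)
    normal_cases = rng.sample(normal, n_normal)
    # One third of the amplified cases are labeled responders at random.
    responders = rng.sample(amp_cases, n_resp)
    nonresponders = [i for i in amp_cases if i not in responders] + normal_cases
    score = {
        gene: abs(welch_t([vals[i] for i in responders],
                          [vals[i] for i in nonresponders]))
        for gene, vals in expr.items()
    }
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked.index(target) + 1


# Synthetic stand-in data (assumption): 132 patients, 100 noise genes plus HER2.
rng = random.Random(0)
n_pool, n_pool_amplified, n_noise_genes = 132, 30, 100
amplified = set(range(n_pool_amplified))
expr = {f"gene{j}": [rng.gauss(0, 1) for _ in range(n_pool)]
        for j in range(n_noise_genes)}
expr["HER2"] = [rng.gauss(3 if i in amplified else 0, 1) for i in range(n_pool)]

ranks = [simulate_trial(expr, amplified, rng) for _ in range(200)]
top1 = sum(r == 1 for r in ranks) / len(ranks)
print(f"HER2 ranked first in {top1:.1%} of {len(ranks)} simulated trials")
```

Because 10 of the 55 "nonresponders" are themselves HER-2 amplified, the nonresponder group has a wide spread of HER-2 values, which is one reason a noise gene can outrank the true marker in any single simulated trial.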
A more complex clinical trial modeling process could have been designed that allows for a variable response rate in each simulated study, with response rates following a normal distribution with a mean of 35%. However, this more realistic modeling would have made it even less likely for HER-2 to be identified as a predictor of response in more than a small percentage of individual studies. Our fixed response rate biases the simulations towards higher power to detect HER-2 overexpression as a marker of response. We also examined a complementary scenario. Preclinical data are usually available to propose potential predictors; the amount of the drug target itself and measures of its functional activity are the two most obvious candidates. Therefore, we tested the hypothesis that increased HER-2 mRNA expression is a marker of trastuzumab response based on results from preclinical studies. We did the same unequal variance two-sample t test on the transcriptional profile data to test the hypothesis that HER-2 mRNA expression is higher in responders than in nonresponders. The probe closest to the 3' end (216836_s_at) was selected to represent HER-2 mRNA expression (13). Because in this analysis we test a single hypothesis and a single gene, we considered P < 0.05 to be statistically significant. We plotted how often HER-2 expression was significantly higher in responders in the 50,000 iterations.
| Results |
|---|
2, P = 0.022). These results suggest that it is rather unlikely that HER-2 could have been discovered as a single gene predictor of trastuzumab response from supervised pharmacogenomic analysis of a single phase II clinical trial. In fact, there is an 80% chance that it would not have been included in the top 50 gene list of any individual study.
Tandem, two-step phase II trial design for predictive marker evaluation. These observations have important consequences for the design of clinical trials. The results show that the use of broad pharmacogenomic screening to identify molecular predictors for drugs that show low overall response rate (8-10%) is a high-risk strategy for marker discovery in a typical phase II study. Using this approach, we could not identify the only currently known single gene predictor of response to trastuzumab, HER-2 overexpression. Because there is no known gene signature that predicts response to this drug, we could not directly test whether a multigene signature could have been detected by our simulation studies. However, because we could not identify the most informative single gene, it is rather unlikely that a statistically robust multigene signature could have been identified.
Our results also show that it is a more productive strategy to prospectively test an a priori defined predictor in pharmacogenomic data obtained during a phase II study. Enough is known about the mechanism of action of most drugs that one could rationally propose one or more potential response predictors. These could include the expression levels of a single gene, complex gene signatures, or any other molecular measurement (14–16). One promising strategy is to identify predictors in cell line models (17). How best to define the response predictor based on preclinical data or on results from archived specimens will vary from case to case and is not the subject of this article. The predictor could be a single gene or a complex gene signature and could be measured at the mRNA or protein level. However, the predictor must be fully defined, including cutoff values for positivity and negativity, prior to evaluating its value in a prospective clinical trial. Conceptually, testing a rationally designed response predictor in a prospective clinical trial is no different from testing a candidate drug in a therapeutic study.
The two-stage phase II trial design has been used for several decades to identify drugs with promising clinical activity and quickly discard those with low activity. The goal of phase II clinical trials is to determine whether a new drug has enough clinical activity to warrant larger scale evaluation. During the first stage of a classic two-stage phase II study, "n1" patients are entered into the trial, and if fewer than "r1" responses are observed, accrual terminates for lack of activity. Otherwise, accrual continues to a total of "n" evaluable patients. At the end of this second stage, the drug is recommended for further evaluation if the final response rate is at least "r" (18). In order to calculate sample size, investigators must first specify a drug activity level of interest and probability variables for early stopping. The design can be easily modified to include interim efficacy monitoring using a Bayesian approach (19).
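The behavior of such a two-stage rule at a given true response rate can be computed exactly from binomial probabilities. The sketch below is illustrative only: it assumes that accrual stops when fewer than `r1` responses are seen among the first `n1` patients and that the drug is recommended when at least `r` responses are seen among all `n` patients. The design values in the usage lines (9, 1, 24, 3) and the response rates (5% and 25%) are hypothetical and are not taken from the article.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients at response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_stage_characteristics(n1, r1, n, r, p):
    """Operating characteristics of a two-stage design at true response rate p.

    Assumed convention: stop after stage 1 if fewer than r1 responses are seen
    among n1 patients; recommend the drug if at least r responses are seen
    among all n patients.
    Returns (probability of early termination, probability of recommending
    the drug, expected sample size).
    """
    n2 = n - n1
    early_stop = sum(binom_pmf(x, n1, p) for x in range(r1))
    recommend = 0.0
    for x1 in range(r1, n1 + 1):
        still_needed = max(r - x1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(still_needed, n2 + 1))
        recommend += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - early_stop) * n2
    return early_stop, recommend, expected_n


# Hypothetical design: 9 patients in stage 1, 24 in total.
for rate in (0.05, 0.25):
    pet, rec, en = two_stage_characteristics(n1=9, r1=1, n=24, r=3, p=rate)
    print(f"p={rate:.2f}: early stop {pet:.3f}, recommend {rec:.3f}, E[N] {en:.1f}")
```

Evaluating the function at an uninteresting response rate gives the false-positive rate of the design, and evaluating it at the activity level of interest gives its power.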
Similar phase II trial designs could be applied to prospectively evaluate putative response markers. Assume that a drug has completed phase I evaluation and a dose has been selected for phase II testing, that at least one, but preferably more, putative predictive markers are available, and that the response rate in unselected patients is still unknown. A tandem, two-stage, phase II clinical trial design could be applied to test the drug and the predictors simultaneously. The goal of the study is to determine whether the drug is likely to have a certain level of activity in unselected patients and, if activity is below the level of interest, whether a particular patient selection method can enrich the responding population enough to meet the targeted level of activity in the molecularly selected group.
The concept of this design is illustrated in Fig. 3. The study starts out as a two-stage phase II trial for unselected patients with an early stopping rule for futility. If sufficient numbers of events (e.g., clinical benefit rate or response rate) are observed during the first stage, the study proceeds to the second stage to establish the benefit rate more precisely in unselected patients. However, if an insufficient number of events are seen during the initial stage, instead of stopping accrual for futility, the trial remains open for response marker–positive patients only and a second optimal two-stage trial commences. This second stage is introduced because it is very unlikely that the small group of patients who participated in the first phase (typically n1 < 25) included a sufficient number of marker-positive cases to draw a conclusion about the activity of the drug in this molecularly defined subset. If an insufficient number of events occur after accruing "n2" marker-positive cases during the second step of the study, the trial is discontinued following the early stopping rules and the marker is rejected. Otherwise, the study proceeds to complete accrual of additional marker-positive patients in order to estimate the benefit rate more precisely.
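The branching logic of the tandem design can be written as a small decision function. This is a sketch of the flow described above, not a trial-monitoring tool; the names `Action` and `next_action` and the threshold argument `min_events` are hypothetical, and the thresholds themselves would come from the chosen two-stage designs.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_UNSELECTED = "complete accrual in unselected patients"
    OPEN_MARKER_STEP = "restrict accrual to marker-positive patients (step 2)"
    CONTINUE_MARKER = "complete accrual in marker-positive patients"
    REJECT_MARKER = "stop accrual and reject the marker"


def next_action(step, events, min_events):
    """Decide what follows the first stage of either step of the tandem design.

    step: "unselected" (step 1) or "marker_positive" (step 2)
    events: responses or clinical-benefit events seen in the first stage
    min_events: fewest events needed to continue accrual within the same step
    """
    if step == "unselected":
        if events >= min_events:
            return Action.CONTINUE_UNSELECTED
        # Too few events: do not stop for futility, open the marker-selected step.
        return Action.OPEN_MARKER_STEP
    if step == "marker_positive":
        if events >= min_events:
            return Action.CONTINUE_MARKER
        return Action.REJECT_MARKER
    raise ValueError(f"unknown step: {step!r}")


print(next_action("unselected", events=0, min_events=1).value)
print(next_action("marker_positive", events=3, min_events=2).value)
```

The only departure from a classic two-stage trial is the second branch of step 1, where a futility result in unselected patients triggers the marker-selected step instead of closing the study.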
If there are multiple, nonoverlapping candidate response markers for the same drug, all of these could be tested simultaneously in a single, parallel, multiarm study. Accrual to each marker arm could occur simultaneously, but results are analyzed separately for each arm. If the predictors capture overlapping patient populations (i.e., the same patient is positive for several of the markers), more complex adaptive randomization designs could be applied to preferentially randomize patients into better performing marker arms. It is expected that only a small fraction of the total patient population will be positive for any one of the markers because the sensitive population is small (assuming that the marker is reasonably sensitive and specific). Therefore, this design implies that a large number of patients will be screened during the second step of the study to find marker-positive individuals who are eligible for therapy. The exact number needed to screen will depend on marker prevalence. To maximize eligibility for treatment among the screened patients, it is desirable to test multiple different predictors simultaneously on each case. Patients who are positive for a particular marker will receive treatment in parallel treatment arms. Several different drug and marker pairs could also be evaluated simultaneously in a single study.
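The dependence of the screening burden on marker prevalence is simple to quantify. In the sketch below, the function name and the target of 24 marker-positive patients are hypothetical; the 25% prevalence mirrors the 15 amplified cases among 60 patients used in the simulated study population and is not an epidemiologic estimate.

```python
import math


def expected_number_to_screen(n_marker_positive, prevalence):
    """Expected number of patients to screen to find n marker-positive cases."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    # The small offset guards against floating-point results just above an integer.
    return math.ceil(n_marker_positive / prevalence - 1e-9)


needed = expected_number_to_screen(n_marker_positive=24, prevalence=0.25)
print(f"screen about {needed} patients to enroll 24 marker-positive cases")
```

Halving the prevalence doubles the expected number to screen, which is why testing several predictors on each screened patient makes the second step more practical.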
| Discussion |
|---|
Technically, the two steps of the study could be separated and run as independent studies, one for unselected patients and another for marker-positive patients. However, there are several reasons why keeping them together might yield a more seamless trial. Running two separate studies takes more time and is more expensive. Because any particular predictor will define only a relatively small subset of patients who are eligible for therapy, it is appealing to simultaneously test several predictors so that fewer patients are turned away as marker negative and therefore ineligible for treatment. Multiple distinct predictors for the same drug as well as several different drug and predictor pairs could be tested simultaneously in a single, parallel, multiarm trial.
The study design that we propose evaluates the candidate response marker based on its positive predictive value (i.e., how often response is seen among marker-positive cases). A high enough positive predictive value is necessary for clinical utility, but alone, it is not sufficient to make a marker clinically useful. Sensitivity, which is reduced by false-negative cases (i.e., patients who respond but are marker-negative), also has to be considered. A crude estimate of sensitivity may be made by using information from the first step of the tandem design, when unselected patients are included, but to define the sensitivity and specificity of the marker more precisely, a separate study is needed. However, response markers with low positive predictive values are not clinically useful and need not be investigated further.
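The distinction between positive predictive value and sensitivity can be made concrete with the counts from the simulated 60-patient population described in Patients and Methods (5 amplified responders, 10 amplified nonresponders, 45 nonamplified nonresponders, and no nonamplified responders). The function name is illustrative, and the labels in that simulation were assigned by design, so the perfect sensitivity below is an artifact of the setup and not an estimate.

```python
def marker_performance(tp, fp, fn, tn):
    """Standard 2x2 summary measures for a binary response marker.

    tp: marker-positive responders      fp: marker-positive nonresponders
    fn: marker-negative responders      tn: marker-negative nonresponders
    """
    return {
        "ppv": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


perf = marker_performance(tp=5, fp=10, fn=0, tn=45)
for name, value in perf.items():
    print(f"{name}: {value:.1%}")
```

Only the first of these measures can be estimated from the marker-positive step of the tandem design; the others require outcome data from marker-negative patients as well.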
Any clinical study that prospectively evaluates patient selection methods will require a tissue biopsy. Early single-agent phase II studies are often conducted in the metastatic patient population, and therefore, deep tissue and organ biopsies might be required for marker evaluation. Invasive procedures to obtain biopsies for correlative science studies have traditionally been avoided for considerations of patient discomfort, fear of adverse events, and cost. However, serious complication rates from fine-needle aspirations of abdominal organs or body cavity lymph nodes are substantially lower than serious, grade 3 to 4 treatment-related adverse events during most investigational chemotherapy trials. For example, a study described complication rates encountered during 10,766 ultrasonographically guided abdominal fine-needle aspirations and reported 0.18% (n = 22) major complications including peritoneal bleeding and 0.018% death rate (n = 2; ref. 20). Well-informed patients may elect to take these risks to participate in studies that evaluate the value of personalizing treatment.
There are other clinical trial strategies to prospectively discover and validate molecular predictors of response. An adaptive clinical trial design was recently described that incorporates pharmacogenomic predictor discovery and validation into a traditional randomized phase III study (21). This design includes (a) identification of the sensitive patients during the initial phase of the phase III trial (i.e., predictor discovery), (b) a statistical test for treatment effect for the marker-positive patients who are accrued in the remainder of the trial (i.e., validation set), and (c) a properly powered statistical test for overall treatment effect using results from all randomized patients (i.e., traditional phase III end point). This is an appealing strategy because it incorporates marker discovery and validation into a single randomized study without compromising the ability to detect an overall treatment effect by traditional criteria. Unfortunately, it requires a large randomized trial. The current article describes an alternative strategy that can be incorporated into phase II testing of novel drugs.
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
3 http://bioinformatics.mdanderson.org/pubdata.html
Received 4/6/07; revised 6/13/07; accepted 7/9/07.