## Abstract

Among various design approaches to phase III clinical trials with a predictive biomarker, the marker-stratified all-comers design is advantageous because it allows for establishing the utility of both treatment and biomarker, but it is often criticized for requiring large sample sizes, as the design includes both marker-positive and marker-negative patients. In this article, we propose a simple but flexible subgroup-focused design for marker-stratified trials that allows both sequential assessment across marker-defined subgroups and adaptive subgroup selection while retaining an assessment using the entire patient cohort at the final analysis stage, possibly using established marker-based multiple testing procedures. Numerical evaluations indicate that the proposed marker-stratified design has a robustness property in preserving statistical power for detecting various profiles of treatment effects across the subgroups while effectively reducing the number of randomized patients in the marker-negative subgroup with presumably limited treatment efficacy. In contrast, the traditional all-comers and sequential enrichment designs could suffer from low statistical power for some possible profiles of treatment effects. The latter also needs long study durations and a large number of marker-screened patients. We also provide an application to SWOG S0819, a trial to assess the role of cetuximab in treating non–small cell lung cancers. These evaluations indicate that the proposed subgroup-focused approach can enhance the efficiency of the marker-stratified design for definitive evaluation of treatment and biomarker in phase III clinical trials. *Clin Cancer Res; 24(5); 994–1001. ©2017 AACR*.

## Introduction

With the establishment of the molecular heterogeneity of histologically defined cancers, clinical development of cancer treatments has been faced with a paradigm shift toward precision medicine, with codevelopment of molecularly targeted agents and companion predictive biomarkers. The gold standard of clinical development, however, will continue to be based on appropriately designed prospective, randomized clinical trials (1). In recent years, streamlining earlier exploratory phases of the codevelopment of treatment and biomarker, particularly, through establishing platform trials (e.g., ref. 2), has attracted much attention. Meanwhile, confirmatory phase III trials to establish the clinical utility of new treatments with the aid of biomarkers remain critical. Because the process needed for marker development and validation is complex, candidate markers may have differing levels of credibility at the design stage of phase III trials.

In cases where biological evidence strongly suggests that treatment efficacy is limited to marker-positive patients, enrichment designs (which randomize only these patients) may be the best approach from both efficiency and ethical perspectives (1–9). However, in many cases, imperfections in measuring the molecular target, possible alternative cutoff points that could better define marker positives (especially for graded or continuous markers), and possible off-target effects of the treatment would not permit us to rule out the possibility that at least some of the remaining marker-negative patients could also benefit from the treatment (4). The major limitation of the enrichment design is that it does not provide data to evaluate treatment efficacy in marker-negative patients.

To overcome this issue in enrichment trials, some authors have proposed a sequential enrichment approach that conducts a second trial for marker-negative patients when an initial enrichment trial for marker-positive patients demonstrates treatment efficacy (9). It is believed that this approach allows for quicker evaluation of the treatment for the patient population that is considered most likely to benefit from the treatment. However, this tandem approach may require long periods of clinical development as a whole, and also it might be hard to do the second trial given the presumption in the first trial that the marker-negative patients might not benefit.

Another approach is the concurrent evaluation of both marker-positive and marker-negative patients using a marker-stratified, randomize-all or all-comers design. In this approach, multiple testing procedures to evaluate treatment efficacy in marker-defined subgroups and the overall population have been established (10–14). A major advantage of this approach is that it permits establishing the utility of both treatment and marker. However, the limitation is the involvement of marker-negative patients and the resultant large sample sizes. In particular, exposing a large number of marker-negative patients, who are unlikely to benefit, to the tested treatment could be an ethical concern.

An interim analysis for adaptive subgroup selection with curtailment of patients' enrollment in a subgroup can reduce the size of marker-stratified all-comers trials (15–18). However, for time-to-event outcomes typically evaluated in the primary analysis of phase III oncology trials, we would have a specific difficulty in assuring the independence from any subsidiary information from censored patients at the time of the interim analysis (16, 19). This may require prespecification of the adaptation rules based only on the primary test statistic, which is essentially equivalent to the traditional group sequential analysis.

For a marker-stratified trial, superiority and futility stopping rules have been considered for marker-defined subpopulations or the overall population (8, 20). However, the inclusive relationship between these populations could complicate the sequential assessment. For example, in a coprimary analysis to test treatment efficacy in both overall and marker-positive populations, we can have early stopping for superiority in marker positives but no early stopping for futility in marker negatives. We may have to adjust for the significance level of the overall test at the final analysis depending on the status of early stopping in marker positives. In case of early futility stopping in marker negatives but no superiority stopping in marker positives, it would no longer be valid to conduct the overall test at the final analysis. If the overall test is abandoned in this case, an additional adjustment for the test in marker positives at the final analysis may or may not be considered. A simple approach to avoid these complexities is to close the entire study at the time of superiority stopping in the marker-positive subgroup (12).

In this article, we propose a simple but flexible subgroup-focused design for marker-stratified trials that allows for both sequential assessment across the subgroups and adaptive subgroup selection while retaining an assessment using the entire patient data at the final analysis stage, possibly using established marker-based multiple testing procedures.

## Methods

We consider a phase III trial design based on a dichotomized predictive marker to compare a new molecularly targeted treatment with a standard treatment based on time-to-event outcomes. We assume a reliable marker hypothesis where the treatment is more effective in the marker-positive than in the marker-negative patients. One-sided statistical tests are used.

The proposed design approach is summarized in Fig. 1. This can be viewed as concurrent subgroup-focused trials with a futility stopping rule in the marker-negative subgroup and a superiority stopping rule in the marker-positive subgroup. In case I, both boundaries are crossed, and the trial is stopped with a conclusion of efficacy in the marker-positive subgroup. In case II, only the superiority boundary is crossed, and there is sequential testing in the marker-negative subgroup. In cases III and IV, the marker-positive subgroup or the overall population is adaptively selected for the final analysis depending on whether the futility boundary is crossed in the marker negatives. In case IV, the subgroup data are combined for the final analysis. Thus, the possible complexities in performing an overall test at the final analysis in case of early stopping in some subgroup are avoided by restricting the analysis using all patient data to the case with no early stopping in either subgroup. Extension to multiple interim looks is possible, but we suppose a single interim analysis within subgroups for ease of presentation and practical application.
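The four cases can be read as a simple decision table on the two interim outcomes. The following is a minimal sketch of that logic, not the authors' implementation: the function name `interim_case` and its arguments are hypothetical, the default boundaries are the example values quoted later in the Methods (−3.090 for the standardized statistic in marker positives, −0.113 for the estimated log HR in marker negatives), and the direction of the futility comparison (estimated log HR above the boundary) is an assumption based on the description of the futility rule.

```python
def interim_case(z_pos, loghr_neg, c_pos=-3.090, c_neg=-0.113):
    """Classify the interim outcome into cases I-IV (illustrative only).

    z_pos     : standardized test statistic in the marker-positive subgroup
    loghr_neg : estimated log HR in the marker-negative subgroup
    c_pos     : superiority boundary (assumed example value)
    c_neg     : futility boundary on the log HR scale (assumed example value)
    """
    superiority = z_pos <= c_pos   # strong early evidence in marker positives
    futility = loghr_neg > c_neg   # too little benefit in marker negatives

    if superiority and futility:
        return "I", "stop the trial; conclude efficacy in marker positives"
    if superiority:
        return "II", "conclude efficacy in marker positives; keep testing marker negatives"
    if futility:
        return "III", "stop marker negatives; final analysis in marker positives only"
    return "IV", "no early stopping; final analysis on the combined data"


if __name__ == "__main__":
    for z, lhr in [(-3.5, 0.0), (-3.5, -0.3), (-1.0, 0.0), (-1.0, -0.3)]:
        print(z, lhr, interim_case(z, lhr))
```

Only case IV reaches an analysis of all patient data, which is what keeps the final multiple testing procedure free of adjustments for early stopping.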

The interim analysis for superiority in the marker-positive patients, deemed most likely to benefit from the treatment, is to detect substantially large treatment effects and to quickly deliver the treatment to such patients. Although futility stopping rules can also be introduced in this subgroup (12), we propose no specification of such rules and no adjustment on the final analysis. In any case, futility stopping for marker positives would lead to the termination of the trial under the marker hypothesis (8, 19).

On the other hand, for marker-negative patients, a futility stopping rule would be warranted from an ethical perspective due to presumably limited treatment efficacy in marker negatives under the marker hypothesis (3). We propose a monitoring plan that accounts for the two possible errors: (i) futility stopping even when treatment has, in truth, a minimum effect size of clinical importance and (ii) continuing the trial for the marker negatives even when there is no treatment efficacy. In addition, we could introduce a superiority stopping rule, but we do not consider this option because large treatment effects are generally implausible for marker negatives under the marker hypothesis.

When there is not sufficient evidence for early stopping in either subgroup (case IV in Fig. 1), an overall test is a simple but most effective choice in detecting an average treatment effect in the overall population at the final analysis. Alternatively, when the marker hypothesis is deemed strong, hierarchical tests may be used, such as a fixed-sequence procedure that first tests treatment efficacy in the marker positives, followed by testing in the marker negatives if the first test is significant (3, 8, 10). Otherwise, a split-alpha procedure that allocates the alpha to be spent between a test in the marker-positive subgroup and one in the overall population may be a reasonable choice (3, 8, 10, 12).

The significance levels of all statistical tests are determined to preserve a study-wise alpha level of 0.025 based on the joint null distribution of the test statistics for the marker-positive and marker-negative subgroups and the overall population across different analysis stages, that is, the global null hypothesis (see Supplementary Material S1). We do not consider an alpha control under another possible null hypothesis, where the treatment is efficacious in marker positives, but not in marker negatives (see Supplementary Material S1 for further discussion).

### Subgroup-focused design

Although marker-stratified trials can be designed as a random patient sampling from the general population without regard to the marker, like the traditional all-comers design, separate designs for subgroups can clarify the role of each subgroup-focused analysis in our approach. This also facilitates comparison with the enrichment and sequential enrichment designs with separate subgroup evaluation. The values of all the design parameters could be further tuned on the basis of the operating characteristics of the trial as a whole. In this design, each subgroup is sized and monitored independently (see Results and Supplementary Material S2 for an example). See Table 1 for a glossary of mathematical symbols used in this section.

The marker-positive cohort is designed as if it were an enrichment trial. This is sized for large but slightly conservative effects for the new treatment. In terms of HRs relative to the control treatment, HR^{(+)} of 0.55–0.70 is typically specified in determining sample sizes in enrichment trials (21–28). For example, with HR^{(+)} = 0.65, approximately *E*_{2}^{(+)} = 227 events are needed to preserve a power of 0.9 of a log-rank test with alpha of 0.025 in the marker positives (without regard to the analysis in the marker negatives). Regarding the interim analysis for superiority stopping, again, we aim to detect larger effects, such as HR^{(+)} of 0.3–0.6, as observed in many enrichment trials (21–27). For example, to detect HR^{(+)} = 0.5 with a power of 0.9 in a log-rank test with reduced significance level *α*_{1}^{(+)} = 0.001, we consider a rule for superiority stopping if the standardized test statistic *Z*_{1}^{(+)} ≤ *c*_{1}^{(+)} with *c*_{1}^{(+)} = –3.090, when *E*_{1}^{(+)} = 160 events (of the maximum *E*_{2}^{(+)} = 227) are observed. As such, the marker-stratified trial, in effect, incorporates an enrichment trial, and enrollment of additional marker-positive patients may be unnecessary in the case of futility stopping in the marker-negative patients.

The marker-negative cohort is designed as if it were a second trial in the sequential enrichment approach. This is because the chance to evaluate this cohort solely when the treatment effect is significant in marker-positive patients is also embedded in our approach, not sequentially, but concurrently (case II in Fig. 1). Provided that treatment efficacy is demonstrated in marker positives, a larger treatment effect in marker negatives, say HR^{(–)} = 0.70, than that typically set in traditional designs without regard to the marker to detect overall treatment efficacy, say HR = 0.75, could be reasonably specified. This may yield approximately *E*_{2}^{(–)} = 331 events for power of 0.9 and alpha of 0.025. We should note that the maximum number of events in the trial is 558 (= 227 + 331), which is only about a 10% increase from 508 calculated under HR = 0.75 typically set in the traditional designs.
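The event numbers quoted for the two cohorts are consistent with the usual normal approximation for the log-rank test under 1:1 randomization (often attributed to Schoenfeld), E = 4(z_α + z_β)² / (log HR)². The text does not name the formula used, so the sketch below is an assumption that happens to reproduce the stated figures; the function name `required_events` is hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hr, alpha=0.025, power=0.9):
    """Approximate number of events for a one-sided log-rank test.

    Assumes 1:1 randomization and the normal approximation
    E = 4 * (z_alpha + z_beta)^2 / (log HR)^2, rounded up.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)  # upper alpha quantile
    z_beta = std_normal.inv_cdf(power)       # quantile for the target power
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2)


if __name__ == "__main__":
    e_pos = required_events(0.65)                    # marker-positive cohort
    e_neg = required_events(0.70)                    # marker-negative cohort
    e_trad = required_events(0.75)                   # traditional all-comers
    e_interim = required_events(0.50, alpha=0.001)   # superiority interim look
    print(e_pos, e_neg, e_pos + e_neg, e_trad, e_interim)
```

Under these assumptions the maximum for the stratified trial is the sum of the two subgroup targets, which is how the comparison with the traditional design's target arises.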

For the futility stopping rule for marker-negative patients, we calculate a Bayesian posterior probability (29, 30) that the treatment has at least a minimum effect size of clinical importance, say HR^{(–)} = 0.8, or the true log HR < *δ*, where *δ* = log(0.8). The motivation behind the use of the Bayesian posterior probability on the HR or effect size (rather than a standardized test statistic) is mainly from the ease of interpretation for clinicians in considering the possibility of futility stopping when there is a minimum effect size of clinical importance in marker negatives. With a noninformative prior on effect size, a futility stopping rule when this posterior probability is less than a small value, γ = 0.1–0.3, translates to

$$\hat{\theta}_{1}^{(-)} > c_{1}^{(-)} = \delta + z_{\gamma}\sqrt{V_{1}^{(-)}} \tag{A}$$

for an estimated log HR *θ̂*_{1}^{(–)}, where *V*_{1}^{(–)} = 4/*E*_{1}^{(–)} and *E*_{1}^{(–)} is the observed number of events at the interim analysis, and *z*_{γ} is a quantity such that the probability of being greater than *z*_{γ} (upper probability) in the standard normal distribution is γ [see Table 1 for derivation of Equation (Eq.) A]. That is, if Eq. A holds, a futility stopping would be suggested for the marker-negative subgroup. In considering the boundary *c*_{1}^{(–)}, we also incorporate the probability of continuing the trial for marker negatives even when there is no treatment effect, that is, HR^{(–)} = 1.0, denoted by *ψ*. Table 2 summarizes these probabilities for various values of *E*_{1}^{(–)}. As is expected, the two probabilities, γ and *ψ*, are in a trade-off. We generally recommend taking a balance between them. For example, with *ψ* = γ = 0.25, we plan to use a futility stopping rule, *θ̂*_{1}^{(–)} > *c*_{1}^{(–)}, with *c*_{1}^{(–)} = −0.113 (0.89 in HR), when *E*_{1}^{(–)} = 150. If a more aggressive futility stopping is preferred, for example, under situations with strong confidence on the biomarker, toxic treatments, and so on, a larger value for γ (or smaller value for *ψ*) might be considered.
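The example boundary can be checked numerically. The sketch below assumes the futility rule has the form "estimated log HR > *c* = *δ* + *z*_{γ}√(4/*E*)", which follows from a normal posterior centered at the estimate with variance 4/*E*, and it assumes that *ψ* is the sampling probability of not crossing that boundary when the true HR is 1.0. Both function names are hypothetical, and the paper's Table 2 may define or round *ψ* differently.

```python
import math
from statistics import NormalDist


def futility_boundary(events, gamma, min_hr=0.8):
    """Boundary c on the log HR scale: stop for futility if estimate > c."""
    delta = math.log(min_hr)                   # minimum clinically important effect
    z_gamma = NormalDist().inv_cdf(1 - gamma)  # upper-gamma standard normal point
    return delta + z_gamma * math.sqrt(4 / events)


def prob_continue_under_null(events, gamma, min_hr=0.8):
    """Assumed definition of psi: P(estimate <= c) when the true log HR is 0."""
    c = futility_boundary(events, gamma, min_hr)
    return NormalDist().cdf(c / math.sqrt(4 / events))


if __name__ == "__main__":
    c = futility_boundary(150, 0.25)
    print(round(c, 3), round(math.exp(c), 2))
    print(prob_continue_under_null(150, 0.25))
```

With 150 events and γ = 0.25 the boundary agrees with the quoted −0.113 (about 0.89 on the HR scale), and the computed continuation probability under no effect comes out close to 0.25, which is consistent with the balanced choice described in the text.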

As such, the subgroup-specific design explicitly protects randomized patients from unnecessary follow-up within subgroups, allows earlier decisions on treatment change, and also reduces the number of randomized patients and shortens study duration.

## Results

We conducted simulation studies to compare the various design approaches in terms of the power, study duration, and numbers of patients randomized and screened for the marker. The aforementioned subgroup-focused designs provided the basis for a fair comparison. We considered an enrichment trial with the maximum number of events *E*_{2}^{(+)} (= 227) and an interim test for superiority using significance level *α*_{1}^{(+)} = 0.001 when *E*_{1}^{(+)} (= 160) events were observed. In the sequential enrichment approach, a second trial with the maximum number of events of *E*_{2}^{(–)} (= 331) for marker negatives commenced immediately when the first enrichment trial demonstrated superiority of the treatment. For the marker negatives, the futility stopping rule (Eq. A) with *c*_{1}^{(–)} = –0.113 (γ = *ψ* = 0.25) was used when *E*_{1}^{(–)} (= 150) events were observed. We also considered a fixed marker-stratified design that performed the fixed-sequence or split-alpha procedures for *E*_{2}^{(+)} and *E*_{2}^{(–)} events from the marker-positive and marker-negative subgroups, respectively, only at the final analysis. In the proposed marker-stratified design, we also applied these tests for case IV. In the split-alpha procedure, 60% of the alpha was allocated to the overall test. For reference, we also considered a traditional all-comers design without the marker that collected the maximum number of events of *E*_{2}^{(+)} + *E*_{2}^{(–)} (= 558) but involved an interim superiority analysis with significance level 0.001 when half (279) of the events were observed. We defined the power as the probability of obtaining a statistically significant result for any hypotheses on treatment efficacy in any analysis stage. This corresponds to the probability of asserting treatment efficacy at least for the marker-positive subpopulation.

In simulating clinical trial data, we assumed an accrual rate of 200 patients per year. We considered the marker prevalence in the general population of 0.4. The lengths of patient accrual and follow-up periods were determined to collect the targeted maximum numbers of events (see Supplementary Material S2). We set the length of the accrual period as 4 years for the enrichment and marker-stratified designs, 3.5 years for the second trial in marker negatives in the sequential enrichment design, and 3 years for the traditional all-comers design. We set a follow-up period of 6 months for all the designs. We assumed an exponential distribution with a median survival time of 6 months under the control treatment, supposing an evaluation of progression-free survival (PFS) for advanced diseases (31). We assumed various profiles of treatment effects across subgroups, including qualitative and quantitative treatment-by-marker interactions under the marker hypothesis (see Table 3). All analyses on treatment efficacy (at the interim or final stages) were performed when specified numbers of events were observed, so that the study duration varied across simulations. We assumed the duration of interim analysis, including decision on early stopping, was negligibly short. We simulated 100,000 clinical trials for each design approach and for each configuration. We summarize empirical power estimates in Table 3 and results on the study duration and the numbers of patients randomized and screened for the marker in Table 4. Simulation results for other values of the marker prevalence and the early stopping boundaries within marker subgroups are given in Supplementary Material S2 and S3.
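As a rough illustration of the data-generating assumptions just described, the sketch below draws one cohort of patients with uniform entry times over the accrual period, marker status at the stated prevalence, and exponential event times whose hazard is scaled by a subgroup-specific HR in the treated arm. It is not the authors' simulation code: the function name `simulate_patients`, the coin-flip 1:1 randomization, the default HRs, and the seed are all assumptions made for the example, and censoring, event-driven analysis timing, and testing are left out.

```python
import math
import random


def simulate_patients(n=800, accrual_years=4.0, prevalence=0.4,
                      median_control_years=0.5, hr_pos=0.65, hr_neg=1.0, seed=1):
    """Draw one hypothetical all-comers cohort (times in years).

    n = 800 corresponds to 200 patients per year over a 4-year accrual period.
    """
    rng = random.Random(seed)
    # Exponential hazard implied by the control-arm median (6 months = 0.5 years).
    control_rate = math.log(2) / median_control_years
    patients = []
    for _ in range(n):
        positive = rng.random() < prevalence
        treated = rng.random() < 0.5  # simple 1:1 randomization (assumed)
        hr = (hr_pos if positive else hr_neg) if treated else 1.0
        patients.append({
            "entry": rng.uniform(0.0, accrual_years),
            "marker_positive": positive,
            "treated": treated,
            "time_to_event": rng.expovariate(hr * control_rate),
        })
    return patients


if __name__ == "__main__":
    cohort = simulate_patients()
    n_pos = sum(p["marker_positive"] for p in cohort)
    print(len(cohort), n_pos)
```

A full replication would add calendar-time ordering of events so that the interim and final analyses trigger at the specified event counts within each subgroup.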

### Power

We first confirmed that all the designs controlled the study-wise alpha level 0.025 under the global null effects, HR^{(+)} = HR^{(–)} = 1.0. The sequential enrichment design provided the same power as the enrichment design, because the test in the marker-positive cohort determines the power. These designs suffered from low power when the true effect size in marker positives was smaller than that specified in sizing the marker-positive cohort (HR^{(+)} = 0.65 in our setting). In contrast, the proposed marker-stratified design with the overall or split-alpha tests for case IV generally provided high power values >80% under all the scenarios. Compared with the fixed design counterpart to the marker-stratified design, we observed almost no difference when using the fixed-sequence procedure for possible treatment-by-marker interactions under the marker hypothesis. For the split-alpha procedure, we observed a 5% decrease under quantitative interaction and a 3% increase under qualitative interaction.

### Study duration

The sequential enrichment approach, as expected, could provide substantially long study durations under non-null treatment effects in marker positives because of the high probability of conducting two tandem trials. Meanwhile, the other designs, including the enrichment design, generally provided comparable study durations. In the marker-stratified designs, the impact of introducing the interim analysis by the proposed approach was generally moderate because of the small probability of early stopping in both subgroups, leading to early trial termination (case I).

### Number randomized

The enrichment design needed many fewer randomized patients because it evaluated marker-positive patients only. The sequential enrichment design also provided relatively smaller numbers (even compared with the traditional design), but it was partly due to the lack of power in the marker positives. The proposed marker-stratified design provided smaller numbers randomized than its fixed-design counterpart, especially under no or moderate treatment effects in marker-negative patients (i.e., HR^{(–)} = 1.0 or 0.8). We observed substantial reductions, 130 to 140 on average, of marker-negative patients with no treatment effects.

### Number screened

The sequential enrichment design could require nearly twice as many patients screened for the marker as the enrichment design because it conducts two tandem trials. In the marker-stratified design, the effect of introducing an interim analysis was generally moderate, again because of the small probability of early stopping in both marker subgroups.

### Application to SWOG S0819

We also compared the design approaches in a phase III trial, SWOG S0819, to evaluate the effect of cetuximab added to carboplatin and paclitaxel, with variable inclusion of bevacizumab, in advanced non–small cell lung cancer (12). The molecular target of cetuximab, *EGFR*, as measured by FISH, is considered a potential predictive marker. The effect of cetuximab was considered larger in the *EGFR* FISH^{+} (marker-positive) subgroup, but it was not clear whether there was no effect in the negative population. Thus, the design employed was a subgroup-focused all-comers design with a coprimary analysis, with 80% allocation of type I error rate to the subgroup and the remaining 20% to the overall population (12). In the sample size calculation, the target level of improvement in the *EGFR* FISH^{+} population was set as a 33% improvement in median PFS (equivalent to HR^{(+)} = 0.75), and the target level of improvement in the overall population was a 20% improvement (equivalent to HR^{(O)} = 0.83), such that HR^{(–)} = 0.89 in the *EGFR* FISH^{−} subgroup (12). The prevalence of the *EGFR* FISH^{+} in screened patients was assumed to be 0.4. With 4-year accrual and 1-year follow-up under an accrual rate of 366 patients per year, the total number of patients screened for the marker (and randomized) was calculated as 1,462 to achieve 90% power for testing treatment efficacy in the *EGFR* FISH^{+} population, with a one-sided study-wise alpha of 2.5%. Of the 1,462 patients, 588 were expected to be *EGFR* FISH^{+} (see Table 1 in ref. 12).
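The target effect sizes quoted above hang together arithmetically under two simple assumptions that the text does not state explicitly: exponential PFS, so that a proportional gain *g* in median corresponds to HR = 1/(1 + *g*), and an overall log HR equal to the prevalence-weighted average of the subgroup log HRs. The sketch below checks this; the function names are hypothetical, and the original protocol (ref. 12) may have derived the values differently.

```python
import math


def hr_from_median_gain(gain):
    """HR implied by a proportional improvement in median, assuming exponential times."""
    return 1.0 / (1.0 + gain)


def overall_hr(hr_pos, hr_neg, prevalence):
    """Prevalence-weighted average on the log HR scale (an approximation)."""
    log_hr = prevalence * math.log(hr_pos) + (1.0 - prevalence) * math.log(hr_neg)
    return math.exp(log_hr)


if __name__ == "__main__":
    print(round(hr_from_median_gain(0.33), 2))     # target in the marker-positive subgroup
    print(round(hr_from_median_gain(0.20), 2))     # target in the overall population
    print(round(overall_hr(0.75, 0.89, 0.4), 2))   # overall HR implied by the subgroup HRs
```

Weighting by expected events rather than by prevalence would be a closer approximation, but the simpler weighting is enough to recover the rounded values.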

In comparing the design approaches, we supposed the same trial specification with 4-year accrual and 1-year follow-up under an accrual rate of 366 patients per year for all the approaches. We specified the number of PFS events, *E*_{2}^{(+)} = 509, for the *EGFR* FISH^{+} subgroup to detect an effect of HR^{(+)} = 0.75 with a power of 90% in a log-rank test with alpha of 2.5% (supposing an enrichment trial with *EGFR* FISH^{+} patients). We planned an interim analysis for superiority at *E*_{1}^{(+)} = 160 using a significance level *α*_{1}^{(+)} = 0.001 to detect a large effect HR^{(+)} = 0.5 with a power of 0.9, as considered in the Methods section. Meanwhile, for the *EGFR* FISH^{−} subgroup, we specified the maximum number of PFS events as *E*_{2}^{(–)} = 703 through considering a maximum number of events *E*_{2}^{(O)} = 1,211 for the overall population to detect HR^{(O)} = 0.83 with a power of 90%, that is, *E*_{2}^{(–)} = *E*_{2}^{(O)} – *E*_{2}^{(+)} = 703. Again, we planned an interim analysis at *E*_{1}^{(–)} = 150 using the same futility stopping rule (Eq. A) with *c*_{1}^{(–)} = –0.113 (γ = *ψ* = 0.25), as given in the Methods section. In the proposed and fixed marker-stratified designs, we applied the same hypothesis testing procedures as in the previous simulation study (see the captions in Table 3 and Supplementary Table S4-1 in Supplementary Material S4 for details). In the sequential enrichment approach, a second trial for the *EGFR* FISH^{−} patients, with a maximum number of events *E*_{2}^{(–)} = 703 and an interim futility analysis at *E*_{1}^{(–)} = 150 using the aforementioned futility stopping rule, commenced immediately when the first enrichment trial demonstrated superiority of the treatment. We also considered a traditional all-comers design without the marker that collected the maximum number of PFS events of *E*_{2}^{(O)} = 1,211 but involved an interim superiority analysis with significance level 0.001 when half (605) of the events were observed. We simulated 100,000 clinical trials for each design approach and for each configuration.

The empirical power estimates, shown in Supplementary Table S4-1 (Supplementary Material S4), showed the same tendencies regarding relative performance across the design approaches as those observed in the previous simulation study in Table 3. The proposed marker-stratified designs with the split-alpha or overall tests for case IV generally provided high power values across various profiles of subgroup treatment effects. Regarding study duration and the numbers of patients randomized and screened for the marker, shown in Supplementary Table S4-2 (Supplementary Material S4), we also noticed tendencies similar to those observed in the previous simulation in Table 4. In SWOG S0819 with a larger sample size, we observed substantial reductions (400 or more) in the number of patients randomized in the marker-negative (*EGFR* FISH^{−}) subgroup by implementing the interim futility analysis in this subgroup, in particular, when there is no treatment effect in that subgroup (i.e., HR^{(–)} = 1.0). The average number of patients randomized in the proposed marker-stratified design (with an interim futility analysis for marker negatives) was generally smaller than that in the traditional all-comers design with an interim futility analysis for the overall population, including marker positives with some treatment effects.

## Discussion

The numerical evaluations indicated that the marker-stratified designs (with or without interim analysis) have a nice robustness property in preserving power for detecting various profiles of treatment effects across the marker-defined subgroups. The introduction of within-subgroup interim analyses using the proposed approach can contribute to reducing the number of randomized patients in the marker-negative subgroup with limited treatment effects without a substantial loss in statistical power. Although its impact in reducing the study duration was minimal, the resultant durations were generally comparable with those of the traditional all-comers and enrichment trials when the marker prevalence in the general population is not large. The traditional all-comers and sequential enrichment designs are not generally recommended because of the lack of robustness in statistical power for possible profiles of treatment effects under the marker hypothesis. In addition, the latter can require long study durations and a large number of patients screened. The enrichment design is attractive for its smaller number of randomized patients and also for shorter study periods when the marker prevalence in the general population is large (>0.5). However, it needs compelling biological evidence that treatment efficacy is limited to the marker positives and that evaluation in the marker negatives is unethical.

Through extensive investigations with various values of the design parameters, we observed tendencies in the relative performance across the design approaches similar to those given in the Results section (see Supplementary Material S2 and S3). In the proposed marker-stratified design, the specification of superiority and futility stopping boundaries (*α*_{1}^{(+)} and *c*_{1}^{(–)}) affects the operating characteristics. Generally, conservative stopping rules will lead to smaller alpha spent for early detection of heterogeneous treatment effects across subgroups and larger alpha spent for the final analysis using all the patient data (case IV), yielding more differences among the test procedures used at the final analysis (see Supplementary Material S3). Determination of the design parameters, as well as the test procedure at the final analysis for case IV, should be made on a case-by-case basis, accommodating, at least, possible treatment effects across the marker subgroups.

Finally, the magnitude of the effect of introducing the within-subgroup interim monitoring on the study duration, the number of randomized patients, and the number screened will largely depend on the endpoint. Generally, we can expect greater reductions for short-term endpoints (relative to the patient accrual period), such as tumor shrinkage and PFS, rather than long-term endpoints, such as overall survival. However, even for long-term endpoints, the subgroup-specific design can explicitly protect randomized patients from unnecessary follow-up within subgroups with specified levels of error probabilities at an interim analysis.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

## Authors' Contributions

**Conception and design:** S. Matsui

**Development of methodology:** S. Matsui

**Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.):** J. Crowley

**Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis):** S. Matsui, J. Crowley

**Writing, review, and/or revision of the manuscript:** S. Matsui, J. Crowley

**Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases):** S. Matsui

**Study supervision:** S. Matsui

## Acknowledgments

This research was supported by a Grant-in-Aid for Scientific Research (16H06299) and JST-CREST (JPMJCR1412) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to S. Matsui).

## Footnotes

**Note:** Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

- Received May 31, 2017.
- Revision received July 24, 2017.
- Accepted September 5, 2017.

- ©2017 American Association for Cancer Research.