## Abstract

In recent years, investigators have recognized that the rigidity of traditional single-agent, safety-only designs renders them ineffective for contemporary early-phase clinical trials, such as those involving combinations and/or biological agents. Novel approaches are required to address the research questions posed by such trials, including those involving targeted therapies. We describe the implementation of a model-based design for identifying an optimal treatment combination, defined by low toxicity and high efficacy, in an early-phase trial evaluating a combination of two oral targeted inhibitors in relapsed/refractory mantle cell lymphoma. Operating characteristics demonstrate the ability of the method to effectively recommend optimal combinations in a high percentage of trials with reasonable sample sizes. The proposed design is a practical, early-phase, adaptive method for use with combined targeted therapies. This design can be applied more broadly to early-phase combination studies, as it was used in an ongoing study of a melanoma helper peptide vaccine plus novel adjuvant combinations. *Clin Cancer Res; 23(23); 7158–64. ©2017 AACR*.

## Introduction

Historically, in single-agent dose-finding oncology studies, the main objective has been to identify the maximum tolerated dose (MTD) among a range of predefined dose levels. Each increasing dose level is associated with an assumed increasing probability of dose-limiting toxicity (DLT). The underlying assumption is that the MTD is the highest dose that satisfies some safety requirement and, therefore, provides the most promising outlook for efficacy; thus, decisions about which dose to recommend for further research have been based upon safety outcomes. In this framework, dose-finding trials do not incorporate an efficacy endpoint in the decision process. Two recent articles in *Clinical Cancer Research* (1, 2), among others, have highlighted the fact that the current landscape of oncology drug development is challenging the traditional “MTD-based” approach to early-phase clinical trials. Sachs and colleagues (1) provide evidence that determination of phase II recommended doses using MTD-based approaches has resulted in inappropriate dosing for some therapies, such as targeted agents. The authors contend that, in addition to a DLT endpoint, dose-finding strategies should incorporate an “effect marker,” with the goal of locating an effective dose. Examples of effect markers include an early measure of efficacy (e.g., clinical response), pharmacokinetics/pharmacodynamics, biological targets (e.g., immune response or binding/inhibition of therapeutic targets), and more. Figure 1 in the work by Sachs and colleagues (1) illustrates that doses below the MTD are being approved, indicating that targeting the MTD may not be the appropriate primary trial objective. Nie and colleagues (2) argue that the traditional 3 + 3 design is inadequate for meeting the objectives of studies involving targeted therapies and call for wider use of more innovative methods with the goal of answering more complex research questions.
The statistical and medical literature is full of reviews, justification, and recommendations on the use of novel designs (2–4). Contemporary dose-finding problems have created the need to adapt early-phase trial design to include additional endpoints in the decision process, thereby conducting phase I/II studies. For a more comprehensive study of the phase I/II paradigm, we refer the reader to a recent textbook by Yuan and colleagues (5).

Another challenge presented in contemporary early-phase trials is in the design and conduct of combination studies. It is reasonable to assume that toxicity increases with dose in a single-agent setting, but it may be difficult to characterize the toxicity relationship between some of the combinations being tested. One approach to this problem is to reduce the two dimensions to one dimension by prespecifying an escalation path in which the toxicity ordering is completely known between combinations and applying the single-agent traditional 3 + 3 method along this path. However, such an approach can potentially miss more promising dose combinations located outside the path. Increasing the complexity, several combinations may have an acceptable toxicity profile that meet the definition of the MTD combination, so an efficacy or activity endpoint would distinguish one combination as the optimal dose for that combination. Several recent phase I/II methods allow for the assessment of both toxicity and efficacy in drug combination studies (6–10).

In this article, we present a model-based, early-phase design for combining two targeted agents that accounts for both safety and efficacy to identify an optimal dose for the combination. The statistical modeling structure is outlined by Wages and Conaway (11). We describe the implementation of this method in an ongoing, multi-institution trial (NCT02419560) designed at the University of Virginia (UVA; Charlottesville, VA) Cancer Center, studying venetoclax (ABT-199) in combination with ibrutinib for the treatment of relapsed or refractory mantle cell lymphoma. Recently published recommendations (12, 13) for conducting novel early-phase methods were adhered to in implementing the described design.

## Materials and Methods

The study described is a multi-institution, phase Ib evaluation of the safety and efficacy of two dose levels of venetoclax in combination with three dose levels of ibrutinib, as shown in Table 1. As venetoclax has been associated with significant tumor lysis when initiating therapy at maximum dose, careful dose escalation of venetoclax is required prior to reaching the assigned dose level (14). Thus, all participants start treatment with 1 week of venetoclax monotherapy before starting the allocated ibrutinib dose in combination. Venetoclax is then dose escalated to the assigned dose level. Treatment combinations are grouped into toxicity “zones” (1, 2, 3, or 4), based on the dose level of each agent in the treatment combination. The trial was designed to find the optimal dose combination, defined as a combination estimated to have acceptable toxicity and a good response profile. An adaptive design is being used to guide accrual decisions with toxicity and efficacy assessments characterizing the main decision measures. The decision endpoints are DLTs and efficacy, as measured by the attainment of a partial or complete response 2 months after the start of treatment.

In monitoring safety, adverse events (AE) are being assessed and acute toxicity graded using the NCI Common Terminology Criteria for Adverse Events (CTCAE) Version 4.03. A participant is classified as experiencing a DLT (yes/no) based on protocol-specified criteria. In this study, a DLT is defined as any unexpected AE that is possibly, probably, or definitely related to treatment and meets the following criteria: (i) any nonhematologic toxicity grade ≥3 except for alopecia, fatigue, and nausea uncontrolled by medical management; (ii) grade 4 neutropenia lasting more than 5 days; (iii) febrile neutropenia of any duration; (iv) grade 4 thrombocytopenia, grade 3 thrombocytopenia with bleeding, or any requirement for platelet transfusion; or (v) grade 4 anemia unexplained by underlying disease. Efficacy is defined by response to treatment, which is determined using criteria modified from Cheson and colleagues (15). An important consideration is that the result of the efficacy response measure must be available in a reasonable time frame if it is to be useful in guiding trial enrollment. Thus, it is important to select an efficacy endpoint that occurs early enough to be meaningful and to design processes for collecting data rapidly, so that it may guide participant enrollment in accord with the study design. If the efficacy endpoint occurs much later than the toxicity endpoint and/or accrual is rapid, the trial could be exposed to higher amounts of missing data. In this case, methods that account for delayed outcomes in the modeling procedure could be explored (16, 17). As data for each endpoint accumulate, each participant is classified as experiencing a DLT (yes/no) and experiencing a response (yes/no). On the basis of the expectedness of AEs, the DLT tolerance level was chosen to be 25% (i.e., a combination must have an estimated DLT probability ≤25% to be considered “acceptable” in terms of safety).

### Model-based estimation

Model-based allocation is being based upon a continual reassessment method (CRM; ref. 18) that accounts for two binary endpoints (DLT and efficacy) in combinations of agents (11). Safety assessments are based on the assumption that, as the dose of one agent remains fixed and the dose of the other agent increases, the probability of a DLT is increasing. In other words, toxicity increases across rows and up columns of Table 1. It is reasonable to assume that combinations in higher zones have higher probabilities of DLTs than combinations in lower zones. It is unknown whether combinations have higher or lower DLT probabilities than other combinations within the same zone. It could be that B < C or C < B in terms of their respective DLT probabilities. This uncertainty is expressed through specification of the multiple one-parameter models in Table 2 that reflect different orderings of the DLT probabilities. Model selection techniques are used to choose the model most consistent with the observed data. A common model choice in the CRM is to raise a set of initial DLT probability estimates $p_{mi}$, also referred to as the “skeleton” of the model, to a power $\theta_m$ that is a parameter to be estimated by the data, where $m$ indexes the skeleton. For each possible DLT ordering, $m$, in Table 2, the DLT probabilities are modeled via a one-parameter power model $p_{mi}^{\theta_m}$, where the $p_{mi}$ are the skeleton values for ordering $m$ also given in Table 2. The skeleton values for each model were generated using the algorithm of Lee and Cheung (19). Using the available toxicity data, the CRM is fit for each DLT probability working model, and the parameter $\theta_m$ is estimated for each ordering by maximum likelihood estimation, where the likelihood is given by

$$L_m(\theta_m) = \prod_{i} \left(p_{mi}^{\theta_m}\right)^{y_i} \left(1 - p_{mi}^{\theta_m}\right)^{n_i - y_i},$$

where $y_i$ = the number of DLTs and $n_i$ = the number of participants evaluated for DLT on combination $i$. The working model with the largest likelihood is chosen and, using the selected model, DLT probability estimates are updated for each combination. If there is a tie between the highest likelihood values of two or more models, then the selected model is randomly chosen from among those with tied likelihood values.
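The fitting and model-selection step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R implementation: the two skeletons in `SKELETONS` are hypothetical placeholders (the values in Table 2 are not reproduced here), the power parameter is taken to be a positive number `theta`, and the function names are invented for this sketch. The DLT data are the counts reported later in the article for arms A to F at the time the eighth participant was accrued.

```python
import math
import random

def log_likelihood(theta, skeleton, dlts, n_eval):
    """Binomial log-likelihood when each DLT probability is skeleton ** theta."""
    total = 0.0
    for p, y, n in zip(skeleton, dlts, n_eval):
        prob = min(max(p ** theta, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(prob) + (n - y) * math.log(1.0 - prob)
    return total

def fit_power_model(skeleton, dlts, n_eval, lo=-3.0, hi=3.0, tol=1e-9):
    """Maximise the likelihood over log(theta) with a golden-section search."""
    def objective(log_theta):
        return log_likelihood(math.exp(log_theta), skeleton, dlts, n_eval)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if objective(c) > objective(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    theta_hat = math.exp((a + b) / 2.0)
    return theta_hat, log_likelihood(theta_hat, skeleton, dlts, n_eval)

def select_working_model(skeletons, dlts, n_eval, rng=random):
    """Fit every working model, keep the most likely one, break ties at random."""
    fits = [fit_power_model(s, dlts, n_eval) for s in skeletons]
    best = max(ll for _, ll in fits)
    tied = [m for m, (_, ll) in enumerate(fits) if ll == best]
    m = rng.choice(tied)
    theta_hat = fits[m][0]
    return m, theta_hat, [p ** theta_hat for p in skeletons[m]]

# HYPOTHETICAL skeletons for arms A..F (not the Table 2 values).
SKELETONS = [
    [0.05, 0.10, 0.18, 0.25, 0.33, 0.42],  # ordering A < B < C < D < E < F
    [0.05, 0.18, 0.10, 0.33, 0.25, 0.42],  # ordering A < C < B < E < D < F
]
dlts = [0, 0, 0, 0, 1, 0]    # DLTs on arms A..F
n_eval = [1, 1, 1, 1, 1, 0]  # participants evaluated for DLT on arms A..F

model, theta_hat, dlt_estimates = select_working_model(SKELETONS, dlts, n_eval)
```

With a single DLT observed on arm E, the first hypothetical ordering (which places E above D) is selected. Because the skeletons are placeholders, the resulting estimates are not expected to match the values reported in the article.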

The working models for efficacy probabilities are formulated under three different assumptions: (i) The probabilities are increasing with increasing zone; (ii) the probabilities increase initially and then plateau after a certain dose of ibrutinib; or (iii) the probabilities increase initially and then plateau after a certain zone, as displayed in Tables 3 and 4. Like toxicity, these possible shapes for the combination–efficacy curve are expressed through multiple skeletons of CRM models. We again rely on a class of one-parameter power models, indexed by $k$, to formulate working models for the efficacy probabilities. For each efficacy working model, $k$, in Tables 3 and 4, the efficacy probabilities are modeled via a one-parameter power model $q_{ki}^{\beta_k}$, where the $q_{ki}$ are the skeleton values for order $k$. Using the accumulated efficacy data, the CRM is fit for each efficacy probability working model, and the parameter $\beta_k$ is estimated for each model by maximum likelihood estimation, where the likelihood is given by

$$L_k(\beta_k) = \prod_{i} \left(q_{ki}^{\beta_k}\right)^{z_i} \left(1 - q_{ki}^{\beta_k}\right)^{n_i^{E} - z_i},$$

where $z_i$ = the number of responses and $n_i^{E}$ = the number of participants evaluated for efficacy on combination $i$. Again, the working model with the largest likelihood is chosen and, using the selected model, efficacy probability estimates are updated for each combination. We make allocation decisions based on the probability estimates for both DLT and efficacy.

For each of combinations B–F, a two-sided 80% confidence interval (CI) is calculated for the estimated DLT probability of that combination, based on confidence interval estimation for CRM models (20). If the lower bound of this CI exceeds the maximum toxicity tolerance of 25%, then the combination is deemed too toxic and is excluded from the acceptable set of combinations. If combination A is excluded from the acceptable set, then no combination is considered acceptable, and the trial is stopped for safety. Because excluding combination A stops the trial, the level of confidence for combination A is set at 90% instead of 80%.

In sequentially obtaining model-based dose recommendations, an estimated probability of efficacy will be calculated for each combination in the acceptable set. The recommended combination will be based upon how many participants have been entered into the study to that point. For the first third of the trial (i.e., 1/3 the maximum sample size), the combination recommendation is based on randomization using a weighted allocation scheme. Randomization prevents the design from getting “stuck” at a suboptimal combination based on limited data (21). The recommended combination for the next entered participant is chosen at random from the “acceptable” combinations, with each acceptable combination weighted by its estimated efficacy probability, that is, acceptable combinations with higher estimated efficacy probabilities have a higher chance of being randomly chosen as the next recommended combination. For the latter two thirds of the trial (i.e., final 2/3 of maximum sample size), the recommended combination for the next entered participant is defined as the “acceptable” combination with the highest estimated efficacy probability.
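The two-phase allocation rule above can be illustrated with a short Python sketch. The function name, its arguments, and the exact cutoff convention are assumptions made for illustration: the sketch randomizes while the incoming participant's number is at most one third of the maximum sample size of 28, which is consistent with the article's statement that weighted randomization was used for the first 2 participants in stage II (the eighth and ninth overall). Taking the first listed combination when efficacy estimates are tied in the second phase is also an arbitrary choice of this sketch.

```python
import random

def recommend_combination(efficacy_est, acceptable, n_enrolled, max_n=28, rng=random):
    """Pick the combination for the next participant.

    efficacy_est: dict mapping combination label -> estimated efficacy probability
    acceptable:   list of labels currently in the acceptable (safe) set
    n_enrolled:   number of participants already entered into the study
    """
    if not acceptable:
        return None  # no safe combination; the stopping rules apply instead
    if n_enrolled + 1 <= max_n / 3:
        # First third of the trial: randomize, weighting each acceptable
        # combination by its estimated efficacy probability.
        weights = [efficacy_est[c] for c in acceptable]
        if sum(weights) == 0:
            return rng.choice(acceptable)
        return rng.choices(acceptable, weights=weights, k=1)[0]
    # Remainder of the trial: acceptable combination with the highest estimate.
    return max(acceptable, key=lambda c: efficacy_est[c])

# Estimates reported in the article when the eighth participant was accrued.
efficacy_est = {"A": 0.8, "B": 0.8, "C": 0.8, "D": 0.8, "E": 0.8, "F": 0.8}
acceptable = ["A", "B", "C", "D", "E"]
next_arm = recommend_combination(efficacy_est, acceptable, n_enrolled=7)
```

Because all estimated efficacy probabilities were equal at that point, the weights are equal and the draw is uniform over arms A to E.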

### Stopping the trial

Accrual to the study will be halted or ended according to the following rules; a halt for safety triggers a review by the study investigators and the data and safety monitoring committee to determine whether the study should be modified or permanently closed to further accrual: (i) accrual will be halted for safety if the first three entered participants in zone 1 experience a DLT; (ii) if at any point in stage II the set of acceptable combinations is empty, so that no combination is considered safe, the trial will stop for safety; and (iii) otherwise, accrual to the study will end if the recommendation is to assign the next participant to a combination that already has 10 participants treated at that combination or if the prespecified maximum sample size of 28 eligible patients has been reached.
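As a reading aid, the three rules can be written as a single decision function. This is a minimal sketch: the function name, argument names, and returned strings are hypothetical, and rule (i) is interpreted here as all three of the first three zone 1 participants experiencing a DLT.

```python
def accrual_decision(first_three_zone1_dlts, stage, acceptable, recommended,
                     n_treated, n_enrolled, max_n=28, per_combination_cap=10):
    """Return the action implied by the stopping rules.

    first_three_zone1_dlts: DLT outcomes (True/False) for the first participants in zone 1
    stage:                  1 or 2
    acceptable:             labels in the current acceptable set
    recommended:            label recommended for the next participant, or None
    n_treated:              dict mapping label -> participants treated so far
    n_enrolled:             total eligible participants accrued so far
    """
    # Rule (i): all of the first three zone 1 participants had a DLT (assumed reading).
    if len(first_three_zone1_dlts) == 3 and all(first_three_zone1_dlts):
        return "halt for safety review"
    # Rule (ii): nothing is considered safe during stage II.
    if stage == 2 and not acceptable:
        return "stop for safety"
    # Rule (iii): maximum sample size reached, or the recommended arm is full.
    if n_enrolled >= max_n:
        return "end accrual: maximum sample size reached"
    if recommended is not None and n_treated.get(recommended, 0) >= per_combination_cap:
        return "end accrual: recommended combination is full"
    return "continue"
```

For example, `accrual_decision([False], 2, ["A", "B"], "B", {"B": 10}, 19)` ends accrual because combination B already has 10 participants.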

### Sample size and accrual

Maximum target sample size for the optimal combination is based upon acquiring sufficient information to assess the objective of estimating efficacy rates while satisfying safety conditions, assuming at least one optimal combination has been found. Based upon simulation results, 10 eligible participants treated at the optimal combination will provide adequate data to assess efficacy. The target of 10 participants at the optimal combination was chosen based on having sufficient information to determine whether the optimal combination shows an improved 2-month partial response (PR) rate compared with the single-agent ibrutinib rate, which was estimated at 34% (95% CI, 25–44; ref. 22). For this study, the optimal combination would be considered promising with an observed 2-month PR rate of at least 70% (7/10), a doubling of the single-agent rate, which results in the lower limit of a one-sided 90% CI exceeding 44%, the upper limit of the single-agent 95% CI. Total study sample size is estimated from the simulations and is determined by the stopping rules in the “Stopping the trial” section. The maximum total sample size was set to 28 eligible participants; however, as indicated in the simulation results, the maximum average trial size over all scenarios is closer to 20 participants.
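The claim about 7 responses in 10 participants can be checked numerically. The article does not state which interval method was used; the sketch below assumes an exact (Clopper–Pearson) one-sided lower bound, found by bisection on the binomial upper-tail probability. The function names are introduced here for illustration only.

```python
import math

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1))

def exact_lower_bound(k, n, conf=0.90, tol=1e-10):
    """One-sided exact lower confidence bound for a binomial proportion."""
    if k == 0:
        return 0.0
    alpha = 1.0 - conf
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if upper_tail(k, n, mid) < alpha:
            lo = mid  # p is too small to make k or more responses plausible
        else:
            hi = mid
    return (lo + hi) / 2.0

bound_7_of_10 = exact_lower_bound(7, 10)
bound_6_of_10 = exact_lower_bound(6, 10)
```

Under this assumption, the bound for 7/10 falls just above 0.44, while the bound for 6/10 falls below it, which agrees with the choice of 7 responses as the threshold for a promising combination.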

## Results

Accrual to combinations occurs in two stages. Within the modeling framework described in this article, an initial stage of escalation is needed to get the trial underway. Model-based estimates for outcome probabilities do not exist until some heterogeneity in the data for each endpoint has been observed (23), that is, we need at least one DLT and one non-DLT to estimate the DLT probabilities, and we need at least one response and one nonresponse to estimate efficacy probabilities at each combination. The initial stage accrued eligible participants in cohorts of one on each combination until a participant experienced a DLT. The role of stage I is to have a “dose-escalation” beginning to the study to test the safety of “lower” combinations that are assumed to be less toxic. Also, the safety data are available slightly faster than are the efficacy data, so stage I allows allocation to combinations based only on a toxicity endpoint while waiting for the first few efficacy response observations to be observed. The second stage is allocating eligible participants in cohorts of one according to the procedure described in the “Model-based estimation” section.

### Allocation in completed stage I

The escalation plan for the first stage was based on the zones. With this design, participants could be accrued and assigned to other open combinations within a zone, but escalation would not occur outside the zone until the minimum required timeframe has elapsed for the first participant accrued to combination A. The minimum follow-up period for determination of escalation between zones was 6 weeks from the start of cycle 1. If the minimum follow-up period is not satisfied at the time a new participant is ready to be put on study, then the participant may be accrued to any arm within the highest zone being assessed by random allocation, with the intent of minimizing halts to accrual and trial duration. Initial allocation within a zone was based upon random allocation (1:1) between the possible combinations. Escalation to a higher zone occurred only when all combinations in the lower zone had been tried and no DLT had been observed. Participant allocation to subsequent combinations within the new zone followed the same accrual strategy. This allocation strategy was followed for accrual to increasing zones until a participant experienced a DLT or a stopping rule was triggered. Accrual began at UVA in September 2015 and was slow to initiate. The first participant was given combination A, and he/she did not experience a DLT. The second participant was randomized in January 2016 to receive a combination in zone 2 (B or C). The randomization resulted in this participant receiving combination B, and this patient did not have a DLT. The third participant filled the remaining combination in zone 2 beginning in February 2016 and did not have a DLT on combination C. The study was then opened at other sites, and the next participant was accrued to the study in May 2016, which allowed for the 6-week DLT and 2-month efficacy window to be assessed for the first 3 participants.
At this time, the randomization strategy continued into zone 3, with the fourth participant receiving combination D, and no DLT being observed. The fifth participant was accrued to the study in early June 2016, and he/she experienced the first DLT on combination E, at which point, the second-stage allocation strategy using multidimensional CRM modeling began (24). While awaiting outcomes for the minimum required follow-up period for zone 3, 2 additional participants (sixth and seventh) were enrolled and randomly assigned to combinations D and E, respectively, in July 2016.

### Allocation in ongoing stage II

Stage II is allocating eligible participants based upon the multidimensional CRM modeling approach described in the “Model-based estimation” section. Model-based estimation of DLT probabilities began for the accrual of the eighth participant to the study. After each new accrual in stage II, the estimated DLT probabilities are being updated and used to define a set of “acceptable” combinations in terms of safety. If the minimum follow-up period for participants already on study is not satisfied at the time a new participant is ready to be put on study, then the participant may be accrued to any combination by random allocation, which has accrued at least one participant and is in the acceptable set.

Model-based estimation of efficacy probabilities began at the beginning of stage II, as at least one efficacy response and one nonefficacy response were observed in stage I. All participants on arms A, B, and C achieved at least a PR, and one nonresponse occurred on arm E (24). If no responses had been observed in stage I, patients would have been randomized to acceptable arms until a response was observed. At the time of combination allocation for the next participant, model-based estimates are calculated for both DLT and efficacy probabilities using the available observed data from all participants accrued to the study at that time. For instance, for the first participant accrued in stage II (eighth participant), estimates for the DLT and efficacy probabilities were based on available data from the first 5 patients accrued to the study, as participants 6 and 7 were allocated to arms E and D, respectively, while waiting for the minimum follow-up period in zone 3. Accrual of the eighth participant occurred at the end of July 2016, at which time, complete DLT and response data were available for each of the first 5 participants. For arms {A, B, C, D, E, F}, the DLT data at this point were {0/1, 0/1, 0/1, 0/1, 1/1, 0/0}, and the efficacy data were {1/1, 1/1, 1/1, 1/1, 0/1, 0/0}. Using the procedure described in the “Model-based estimation” section, the estimated DLT probabilities for arms {A, B, C, D, E, F} were {0.05, 0.15, 0.09, 0.22, 0.31, 0.39}, from which arms A–E were deemed to be acceptable based on CI estimation. The estimated efficacy probabilities were {0.8, 0.8, 0.8, 0.8, 0.8, 0.8}, and, based on these estimates, the eighth participant was randomly assigned arm C. The recommendation for the ninth participant used updated DLT and response data from the sixth and seventh participants to calculate estimates and make a decision.
This adaptive decision process will continue until sufficient information about the optimal dose combination has been obtained, according to the stopping rules described in the “Stopping the trial” section. It is important to note that in this design approach, some model-based decisions may be made using slightly less efficacy data than DLT data due to the longer minimum observation window for efficacy. For example, the recommendation for the 11th participant was based on DLT data from the first 9 participants and efficacy data from the first 8 participants, as the 2-month response data for participant 9 had yet to be fully observed when participant 11 was accrued to the study. Currently, the trial has accrued 15 participants from September 2015 through February 2017, with model-based allocation (stage II) having been utilized after the first 7 participants. Allocation based on weighted randomization occurred for the first 2 participants in stage II, which is approximately one third the maximum sample size when combined with the data from the stage I participants. To date, in stage II, the number of participants accrued to arms {A, B, C, D, E, F} is {1, 2, 4, 1, 1, 0}. The only DLT has occurred on arm E, and the only nonresponses have occurred on arms E and D.

### Statistical properties

Simulations were run to display the performance of the study design (see Table 5). To evaluate operating characteristics, six scenarios of assumed DLT and efficacy probabilities were chosen to reflect the following four “themes”: (i) All assumed DLT probabilities are acceptable in terms of safety (i.e., ≤25%), and the highest zone has the combination with the highest assumed efficacy rate (scenarios 1 and 3); (ii) three combinations have assumed DLT probabilities more toxic than 25%, and the assumed efficacy probabilities begin to plateau at dose level two of ibrutinib (420 mg/day; scenarios 2 and 4); (iii) five combinations (B–F) have assumed DLT probabilities more toxic than 25%, making combination A the only acceptable combination in terms of safety (scenario 5); and (iv) all combinations are too toxic (i.e., much more toxic than 25%; scenario 6). For each scenario, 1,000 simulated trials were run, with the optimal combination(s) indicated in bold type in Table 5. Displayed in the table within each scenario for each arm are the assumed true DLT probability, the assumed true efficacy response rate, the percentage of trials in which the combination was recommended as the optimal dose combination, and the number of participants treated on each combination. Displayed in the last four columns are the average and selected percentiles for the trial size at study closure, the percentage of times in the simulations that the trial closed due to safety concerns, the percentage of simulated participants that had a DLT, and the percentage of simulated participants that had an efficacy response. The results displayed in Table 5 were based upon a maximum target accrual of 28 participants where accrual stopped when 10 participants had been treated on the recommended “optimal” dose combination or the maximum accrual had been reached.
With this type of design and stopping rules, the results indicated that on average, the trial would achieve this goal with accrual of approximately 20 participants.

It is clear from examining the results in Table 5 that the proposed design is performing well in terms of recommending optimal dose combinations, as well as allocating participants to these combinations. In scenario 1, the design selects as the optimal dose combination the target combination in 67% of simulated trials while assigning 7.55 of 19.1 participants (40%) on average to this combination. Similar findings are obtained from scenario 3. In scenario 2, recommendation of target combinations as the optimal treatment combination occurs in approximately 38% of simulated trials based on an average trial size of 18.1 participants while allocating 5.47 participants on average to the optimal dose combination. It is important to note that when the target combination is not selected as the optimal dose combination, treatments with assumed DLT probabilities marginally outside the range of acceptable safety are selected in another 23% of simulated trials. Similar findings are obtained from scenario 4. In scenario 5, the design identifies the target combination as the optimal dose combination in approximately 26% of simulated trials while allocating 4.41 of 18.4 participants on average to this combination. When combination A is not selected, the method tends to either choose combination B with an assumed DLT rate just outside the window of acceptable safety (25% of the time) or stop the trial for safety (11.8% of the time). Finally, in scenario 6, where all combinations are overly toxic, the method correctly terminates the study in 100% of simulated trials based on an average trial size of 4.3 participants and treats 1.98 accrued participants on average to zone 1. Overall, the simulation results indicate that the design outlined in this article is a practical early-phase adaptive method for use with combined targeted therapies.

## Conclusions

The development of new methods in early-phase dose finding has been rapid in the past decade, yet the use of innovative designs remains infrequent. In this article, we have outlined a novel early-phase adaptive design, implemented in an ongoing trial of six treatment dose combinations of two targeted agents for participants with relapsed or refractory mantle cell lymphoma. The method presented in this article describes an innovative and appropriate approach for investigating combinations of targeted therapies, which are being called for by the FDA and by others (2–4). Simulation studies were performed to evaluate the performance of the design characteristics and are reported in Table 5. The results demonstrate the method's ability to effectively recommend the optimal dose combination, defined by acceptable toxicity and high efficacy rates, in a high percentage of trials with manageable sample sizes. Software in the form of R (25) code for both simulation and implementation of the method is available upon request of the first author. The method we outline in this work can be viewed as an extension of the CRM, utilizing multiple skeletons for DLT and efficacy probabilities and increasing the ability of CRM designs to handle more complex dose-finding problems. The numerical results presented include the types of simulation information that aid review entities in understanding design performance, such as average sample size, frequency of early trial termination, and so on, which we hope will augment early-phase trial design for targeted therapy combinations in cancer. This design can be applied more broadly in early-phase combination studies that need to consider an “effect marker” in addition to toxicity (26), as it was used in a recently completed study of a melanoma helper peptide plus novel adjuvant combinations (NCT02425306; ref. 27). The design would work well with any well-defined, binary “activity” endpoint.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

## Authors' Contributions

**Conception and design:** N.A. Wages, C.A. Portell, M.R. Conaway, G.R. Petroni

**Development of methodology:** N.A. Wages, C.A. Portell, M.R. Conaway, G.R. Petroni

**Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.):** C.A. Portell, M.E. Williams

**Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis):** N.A. Wages, C.A. Portell, M.E. Williams, G.R. Petroni

**Writing, review, and/or revision of the manuscript:** N.A. Wages, C.A. Portell, M.E. Williams, G.R. Petroni

**Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases):** M.E. Williams

**Study supervision:** N.A. Wages, M.E. Williams

## Grant Support

This work is supported by the NCI (K25CA181638 to N.A. Wages and R01CA142859 to G.R. Petroni and M.R. Conaway), the Biostatistics Shared Resource, University of Virginia Cancer Center, University of Virginia (P30 CA044579), University of Virginia Lymphoma Research Fund (to M.E. Williams), and Lymphoma Research Foundation Lymphoma Clinical Research Mentoring Program Scholar (to C.A. Portell). The clinical study was funded by a grant from Abbvie Inc. to the University of Virginia.

- Received April 12, 2017.
- Revision received May 15, 2017.
- Accepted July 13, 2017.

- ©2017 American Association for Cancer Research.