Abstract
Phase I trials use a small number of patients to define a maximum tolerated dose (MTD) and the safety of new agents. We compared data from phase I and registration trials to determine whether early trials predicted later safety and final dose. We searched the U.S. Food and Drug Administration (FDA) website for drugs approved in nonpediatric cancers (January 1990–October 2012). The recommended phase II dose (RP2D) and toxicities from phase I were compared with doses and safety in later trials. In 62 of 85 (73%) matched trials, the dose from the later trial was within 20% of the RP2D. In a multivariable analysis, phase I trials of targeted agents were less predictive of the final approved dose (OR, 0.2 for adopting ± 20% of the RP2D for targeted vs. other classes; P = 0.025). Of the 530 clinically relevant toxicities in later trials, 70% (n = 374) were described in phase I. A significant relationship (P = 0.0032) between increasing the number of patients in phase I (up to 60) and the ability to describe future clinically relevant toxicities was observed. Among 28,505 patients in later trials, the death rate that was related to drug was 1.41%. In conclusion, dosing based on phase I trials was associated with a low toxicity-related death rate in later trials. The ability to predict relevant toxicities correlates with the number of patients on the initial phase I trial. The final dose approved was within 20% of the RP2D in 73% of assessed trials. Clin Cancer Res; 20(2); 281–8. ©2013 AACR.
Introduction
Phase I clinical trials are a critical first step in cancer drug development. A primary objective is defining a maximum tolerated dose (MTD) and recommended phase II doses (RP2D) of new drugs and determining their safety profiles (1). Later clinical trials enroll a greater number of patients with cancer and define new agent efficacy and safety for drug approval by regulatory agencies.
The traditional model of drug development in oncology depends heavily on the dose and schedule recommended in phase I trials. The small cohorts in phase I are generally underpowered to definitively detect toxicities associated with investigational agents. They also enroll a heterogeneous patient population refractory to multiple treatments, substantially different than the population in later trials.
Such trials are usually designed to detect acute toxicities. Some critics are concerned that late toxicities might not be identified in early trials, but later substantially interfere with drug tolerability (2, 3). Moreover, dose limiting toxicities (DLT) are variably defined among phase I trials (4–6). MTDs in phase I are usually determined by DLTs assessed during the first cycle (3–4 weeks) and many targeted agents are given continuously for long periods of time. Hence, future recommended doses of these agents might be different and based on long-term tolerance.
Few data in the literature evaluate how well phase I cancer-related trials prognosticate future approved doses of agents and their safety profiles. Also, the clinical relevance of DLTs in early trials for future safety profiles of cancer drugs is unknown. Elucidating this correlation could optimize strategies used in early cancer drug development.
Here, we explore correlations among toxicities, especially phase I DLTs and RP2D, with toxicity profiles and dose schedules in later trials leading to regulatory approval, focusing on U.S. Food and Drug Administration (FDA)–approved agents for treating oncologic and hematologic malignancies from January 1990 to October 2012.
Materials and Methods
Search strategy
Anticancer agents newly approved between January 1990 and October 2012 were identified on the FDA website (7). Agents approved for the treatment of solid and hematologic malignancies were selected for further analysis. Agents approved for pediatric cancer, supportive care, locoregional treatment, and agents whose basic compound had already been approved before 1990 were excluded. Updated package inserts for each agent were reviewed to identify indications, dose schemas, and clinical trials leading to approval.
An extensive search was concomitantly done through MEDLINE to identify phase I trials for each of the agents selected from the FDA database analysis. We searched for phase I trials of the single agent and for different approved combinations and schedules described in package inserts.
Matching clinical trials
Phase I trials of FDA-approved anticancer agents were matched to correlate doses and toxicities with those leading to approval of the respective agents. The criteria for matching trials were: the phase I trial enrolled patients with nonpediatric cancer and explored either monotherapy (as FDA-approved) or the same combination and schedule as described in the FDA package insert and later clinical trials of the respective agent; the phase I trial started before the later clinical trial; and the matched trials included similar patient populations of solid versus hematologic malignancies. A single phase I trial could have been matched with more than one later clinical trial as long as different tumor types and indications were involved and the studies were referenced in the FDA package insert. If more than one phase I trial met our criteria, the trial conducted in the United States was used as our analysis was restricted to the FDA database. Similarly, when more than one later clinical trial evaluated the same tumor type and dosing schedule, the trial selected had the larger number of patients and, if secondary criteria were needed, the one conducted in the United States.
FDA-approved drugs or combinations with no published trials meeting the above criteria were excluded from analysis.
Data extraction
The phase I trial data extracted included information about trial characteristics, toxicities and dosing endpoints. Information about doses of experimental agents and toxicities from matched later trials were also obtained. From these data, we enumerated the number and type of clinically relevant toxicities in later trials defined as: treatment-related toxicities leading to death, treatment delays and discontinuations, and toxicities among the three most frequent grade 3/4 laboratory and nonlaboratory toxicities with an overall incidence of at least 1%. Targeted agents were defined in our study as any agent with extra- or intracellular targets different than those of cytotoxic chemotherapy (8), excluding immunotherapy, antihormones, or conjugates. When the agent was approved in combination, the experimental agent was used for classification.
Toxicities were graded according to the criteria used in each trial. Similar toxicities were categorized under the same group as long as they were not exclusionary. All deaths reported by investigators as “possibly,” “probably,” or “definitely” related to treatment were considered toxicity-related deaths.
Statistical analysis
Toxicity correlations were summarized using descriptive statistics. We determined for each matched comparison the percentage of phase I DLTs present among the four most frequent grade 3 and 4 laboratory and nonlaboratory toxicities on matched later trials. This analysis was based on the following question: was the DLT represented in the four most frequent grade 3/4 adverse events in the matched later trial? We dichotomized the answer for this question as “yes,” when at least 50% of DLTs were represented, or “no.” We also described the absolute percentage of DLTs captured on later trials as the top four high-grade toxicities.
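The dichotomization described above can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in S+); the function name `dlt_representation`, the exact-name matching of toxicities, and the example toxicity lists are assumptions made purely for illustration.

```python
def dlt_representation(dlts, top_toxicities, threshold=0.5):
    """Return (fraction of DLTs captured, "yes"/"no").

    dlts: DLT names reported in the phase I trial.
    top_toxicities: most frequent grade 3/4 toxicities in the matched
        later trial (the list is passed in as already selected).
    Matching by normalized name is a simplifying assumption; the study
    grouped similar toxicities by hand.
    """
    dlt_set = {d.strip().lower() for d in dlts}
    if not dlt_set:
        # Trials without DLTs were excluded from this analysis.
        raise ValueError("phase I trial reported no DLTs")
    top_set = {t.strip().lower() for t in top_toxicities}
    fraction = len(dlt_set & top_set) / len(dlt_set)
    answer = "yes" if fraction >= threshold else "no"
    return fraction, answer


# Hypothetical matched comparison (invented data).
phase1_dlts = ["Neutropenia", "Diarrhea", "Fatigue"]
later_top = ["neutropenia", "anemia", "thrombocytopenia", "leukopenia",
             "diarrhea", "nausea", "vomiting", "rash"]
fraction, answer = dlt_representation(phase1_dlts, later_top)
print(f"{fraction:.0%} of DLTs captured -> {answer}")
```

In this invented example two of three DLTs appear among the later high-grade toxicities, so the comparison is scored "yes" under the 50% threshold.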
In addition, we determined how often clinically relevant toxicities from later trials occurred in respective phase I trials. When a clinically relevant toxicity was a composite adverse event, such as infection, at least one component should have been present in phase I for the toxicity to be considered. This analysis was also based on a question: Were clinically relevant toxicities in later trials described in the respective phase I trial? Again, the threshold for a “yes” answer was that at least 50% of toxicities were described in the phase I trial.
Rates were calculated by dividing the total number of toxicity-related deaths by the total number of patients assessed for toxicity. Treatment-related toxicity death rates were compared using a truncated Poisson model for the positive counts and a binomial logistic regression model for zero versus nonzero counts. The doses of experimental agents in trials leading to drug approval were compared with the RP2Ds and expressed as a percentage of the RP2D.
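As a rough illustration of the two descriptive quantities in this paragraph, the sketch below computes a toxicity-related death rate and assigns a later-trial dose to one of the three dose bands reported in the Results (less than 80%, 80% to 120%, or more than 120% of the RP2D). The function names and the example doses are hypothetical; the death counts are those reported in this article. It does not reproduce the truncated Poisson or logistic models.

```python
def death_rate_percent(deaths, patients):
    """Toxicity-related deaths as a percentage of patients assessed."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return 100 * deaths / patients


def dose_category(later_dose, rp2d):
    """Classify a later-trial dose relative to the phase I RP2D.

    Both doses must be in the same units and schedule (an assumption
    the caller has to guarantee).
    """
    if rp2d <= 0:
        raise ValueError("rp2d must be positive")
    percent = 100 * later_dose / rp2d
    if percent < 80:
        return "<80%"
    if percent <= 120:
        return "80-120%"
    return ">120%"


# Counts reported in this article.
print(round(death_rate_percent(402, 28505), 2))  # later trials
print(round(death_rate_percent(21, 3499), 1))    # phase I trials

# Hypothetical doses: 150 mg adopted later versus an RP2D of 200 mg.
print(dose_category(150, 200))
```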
Proportions from independent groups were compared using the χ2 test. Multiple logistic regression was used to adjust these comparisons for other possible confounding study factors. A running lines smoother was used to graphically depict how the probability of a binary variable changed according to values of an interval-scaled covariate. Concordance between binary variables measured for the same units was assessed using the κ statistic. Analyses were performed using S+ 8.2 for Windows (TIBCO Software Inc).
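For readers unfamiliar with the κ statistic, the following self-contained Python sketch computes Cohen's κ for two binary ratings of the same units and labels the result with the agreement bands used in the Results (κ < 0.4 poor, 0.4–0.75 fair to good, > 0.75 excellent). It is an illustrative assumption about how such a concordance could be computed, not the S+ procedure actually used, and the example vectors are invented.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of 0/1 ratings."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # proportion rated 1 in the first series
    p_b = sum(b) / n  # proportion rated 1 in the second series
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_band(kappa):
    """Label kappa with the bands quoted in this article."""
    if kappa < 0.4:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"


# Invented data: 1 = toxicity reported among the top grade 3/4 events.
phase1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
later = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
k = cohen_kappa(phase1, later)
print(f"kappa = {k:.2f} ({agreement_band(k)})")
```

Here each position stands for one matched drug comparison; κ corrects the raw agreement for the agreement expected by chance from the two marginal frequencies.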
Results
Search results
Between January 1990 and October 2012, we identified 91 FDA-approved drugs for nonpediatric cancer treatment. After applying previously described criteria, we included 61 approved drugs, 78 phase I trials with different approved schedules and/or combinations, and 88 later clinical trials, each referenced in respective FDA package inserts (Supplementary Fig. S1A). A total of 33,715 patients with cancer were included, 3,499 from phase I trials and 30,216 from later trials leading to regulatory drug approval. Trials included in the analysis are given in Supplementary Tables S1 and S2.
Overall, we performed 88 matched comparisons of phase I trials and their respective later clinical trial(s) to correlate drug-related toxicities and drug doses. The number of matched comparisons included in each subanalysis varied according to the availability of information required for each analysis. The characteristics of the agents and their supporting phase I trials are depicted in Supplementary Table S3.
Comparison of RP2D and doses used in later trials
For analysis, three matched trials were excluded because the optimal dose on phase I was not clearly defined. Overall, in 51 (60%) of 85 matched comparisons the dose of the agent adopted in the later trial was identical to the RP2D, but in 62 (73%) of them the dose was within 20% of the RP2D (Fig. 1A). In 73% of phase I trials, the RP2D was determined on the basis of a MTD defined by the protocol; when type of agent was examined, 97% of cytotoxics used MTD for RP2D versus 58% of targeted agents (P = 0.0001). We found no difference in predicting future dose of the agent if the RP2D was based on MTD versus not (58% vs. 64%, respectively; P = 0.59).
Figure 1. Correlation of dosing and safety endpoints between phase I trials and later trials. A, distribution of doses of oncologic agents on later trials according to the percentage of RP2D adopted. B, number of patients included on a phase I trial, and the probability of describing at least 50% of clinically relevant toxicities in later trials. Clinically relevant toxicities were defined as treatment-related toxicities leading to death, treatment delays and discontinuations, and toxicities among the three most frequent grade 3/4 laboratory and nonlaboratory toxicities with an overall incidence of at least 1%.
In the exploratory analysis, neither patient numbers nor number of dose levels of phase I trials or year of agent approval predicted the future dose of the agent (Supplementary Table S4). These results were reproduced when variables were analyzed as continuous covariates in logistic regression models (Supplementary Fig. S2). In the multivariate analysis, phase I trials with targeted agents performed poorly in relation to nontargeted agents in the odds of predicting a future dose within 20% of RP2D [OR, 0.2; 95% confidence interval (CI), 0.03–0.8; P = 0.025; Table 1].
Table 1. Multiple logistic regression analysis correlating phase I trial characteristics and chances of establishing a later dose within 20% of RP2D (N = 82)
Relative frequency of DLTs in trials leading to agent approval
Of the 78 phase I trials involved in matched comparisons, 13 reported no DLTs (5 hormonal, 4 targeted, 3 radio/immune conjugates, and 1 cytotoxic); hence, 75 matched comparisons were included in this analysis. The median number of DLTs per phase I trial was 2 (0–11). Considering only trials presenting DLTs, the median number of DLTs per trial was 2.5 (1–8) for cytotoxic agents and 3 (1–11) for targeted agents. Overall, 53% (109 of 204) of DLTs described in all phase I trials were among the four most common grade 3/4 laboratory and nonlaboratory toxicities in later trials leading to regulatory agent approval. Twenty-one (28%) of the 75 matched comparisons included did not meet the prespecified criterion of having at least 50% of DLTs in the respective phase I trial described as “top 4” grade 3 and 4 toxicities in later trials. Phase I trials with cytotoxic agents showed a trend to have DLTs more often represented in the four most frequent grade 3/4 adverse events of later trials than phase I trials with targeted agents (P = 0.012). Similar trends were observed for nonsolid versus solid tumors and single center versus multicenter phase I trials (P = 0.067 and 0.16, respectively; Table 2).
Table 2. Correlation of relevant safety information from phase I trials and later trials
Ability of phase I trials to describe later clinically relevant toxicities
A total of 4 of 88 matched comparisons were excluded from analysis because the registration trial did not describe toxicities that met our criteria for clinical relevance. We identified 530 clinically relevant toxicities in the later trials, of which 374 (70.6%) were at least cited in respective phase I trials and 126 (24%) were described as DLTs. In 16 of 84 matched comparisons the phase I trial described fewer than 50% of clinically relevant toxicities. The factors that were significantly related to a better performance in describing these toxicities were a higher number of patients included in a phase I trial (P = 0.026) and the inclusion of diverse tumor types (P = 0.024; Table 2).
In addition, the relationship between number of patients included in a phase I trial and probability of describing at least 50% of clinically relevant toxicities in later trials was significant (P = 0.0032, logistic regression with number of patients as a continuous variable; Fig. 1B). Beyond 60 patients on the phase I trial, there was no significant improvement in the ability to describe clinically relevant toxicities in later trials.
As mentioned above, of the 78 phase I trials that were involved in matched comparisons, 13 described no DLTs, and in only 3 of the later trials of these agents were clinically relevant toxicities not described.
Concordance between high-grade toxicities and mortality in early and later trials
We analyzed the concordance of reporting grade 3 and 4 toxicities as one of the top 3 toxicities in phase I and later trials matched for drugs. Of 28 toxicities occurring frequently enough for analysis, 20 had poor agreement (κ < 0.4), 7 had fair to good agreement (κ 0.4–0.75), and only the occurrence of “fever/chills” had excellent agreement (κ > 0.75) in the context of being reported as the top grade 3 and 4 toxicities among both early and later trials (Table 3).
Table 3. Concordance in top 3 grade 3/4 toxicities between earlier and later trials
Among 3,499 participants in all phase I studies, 21 deaths (0.6%) were considered at least possibly related to treatment. In later trials with information about treatment-related deaths, 402 toxicity-related deaths were registered for 28,505 participants (1.41%). The treatment-related death rates were 1.01%, 1.44%, and 1.74% depending on whether the trial adopted a dose less than 80%, 80% to 120%, or more than 120% of RP2D, respectively (P = 0.0015).
Discussion
Dosing based on phase I trials was associated with a low toxicity-related mortality rate in later trials that led to FDA agent approval. Most later trials (73%) adopted a dose within 20% of the RP2D found in the phase I trial. In addition, a substantial proportion (70%) of clinically relevant toxicities found in the registration trials were described in earlier trials.
Previous studies showed that identifying the RP2D is a heterogeneous process (9). For traditional cytotoxic agents, higher doses and increased exposure to treatment correlate with better results (10), with dose determination driven by toxicities. In contrast, targeted agents produce broader activity at different dose levels (11) and parameters used to delineate the RP2D can include pharmacodynamic and pharmacokinetic information (12), although most targeted drugs in early trials are subject to traditional endpoints to determine a MTD (13). Importantly, as MTD is determined within a short window of time (3–4 weeks) in a small number of patients, chronically used targeted agents will likely result in toxicities that are not detected during the MTD window (2). Indeed, after adjusting for confounding effects in the multivariate analysis, our data suggested that phase I trials with targeted agents are less predictive of the final dose used in trials leading to FDA approval of cancer drugs compared with other classes of agents.
Importantly, though including a limited number of patients and multiple dose levels of drugs, early trials seem to achieve their key objective of assuring safety during later steps of drug development. Indeed, among approximately 29,000 patients enrolled in later trials, only 1.41% succumbed to toxicities at least possibly related to treatment. The treatment-related mortality presented a significant relationship with the percentage of RP2D adopted in later trials, with higher doses associated with higher mortality. These data should be interpreted cautiously as safety and survival endpoints might not be correlated (14). Previous reviews reported a global toxicity mortality rate range between 0.4% and 0.59% for oncologic agents in early trials (15–17). In agreement, we report a rate of 0.6% for deaths possibly due to toxicity of agents in phase I trials of drugs subsequently approved by the FDA.
There are also selection differences in phase I versus later trials; the former usually recruit patients having more advanced cancers and varied tumor types. Drug toxicities might differ amongst these populations. Nevertheless, there was high predictive ability for phase I trials to identify clinically relevant toxicities. Attribution of toxicities to experimental agents is also not straightforward in phase I trials and errors are frequent in early trials (18). Interestingly, toxicities easily attributable to experimental drugs, such as hand–foot syndrome, neurotoxicity, peripheral neuropathy, neutropenia, and mucositis, were most concordant when comparing top 3 grade 3 and 4 toxicities between early and later trials.
Our DLT analysis showed that approximately 50% of DLTs were not among the most frequent high-grade events in later clinical trials. Previous studies comparing DLTs in phase I trials of cytotoxic versus targeted agents showed that DLTs are more frequently identified and are more prone to be hematologic toxicities in phase I trials of cytotoxic agents (4, 19). Our findings also suggest that DLTs in phase I trials of cytotoxic agents are more likely to be among frequently occurring high-grade toxicities during later stages of drug development compared with targeted agents.
In addition, we described a trend for single institutional phase I trials to perform better in describing DLTs that are relevant to the future profile of drugs. Previous authors identified difficulties in recognizing relevant drug-induced side effects when multiple investigators are involved, due to unfamiliarity with drugs and number of patients diluted among multiple institutions (20). These data suggest that multiinstitutional phase I trials do not perform better in describing the safety of phase I agents.
Not surprisingly, the ability to predict clinically relevant toxicities in later trials correlated strongly with the number of patients in the phase I trial (up to ∼60 patients). Therefore, limited expansion of the number of patients on phase I trials is likely warranted if the objective of predicting toxicities is deemed important. Our data suggest, however, that no gain is acquired by including more than 60 patients when the objective is to delineate important toxicities that will occur in future clinical trials. There may, however, be other reasons for expanding these trials, such as better assessing responses in subsets of disease before proceeding with drug development (21).
There are several limitations to our study. For instance, our findings were generated using cancer drugs subsequently approved by the FDA. Obtaining similar correlations for drugs that were submitted to but not approved by the FDA might delineate how the lack of correlation between early and later trials could impact the ultimate success of drug approval. However, many of these failed studies are never published, so analysis and comparisons are problematic. Second, our data are based on reported toxicities and so it is not possible to know to what extent our findings are influenced by deficiencies in data reporting. Third, correlating phase I trials with later trials was only possible after applying specific criteria, as outlined in the Materials and Methods. Given these restrictive criteria, a few important drugs had to be excluded and the impact of their absence from our analysis is not known. Finally, as most phase I trials used the traditional 3 + 3 design, we cannot extend our findings to model-based designs.
In conclusion, early trials performed well in terms of dose prediction and description of safety profile of cancer drugs. Indeed, dosing based on phase I trials was associated with a low toxicity-related death rate on later trials. The ability to predict clinically relevant toxicities correlated with the number of patients on the initial phase I study.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: D.L. Jardim, D.S. Hong
Development of methodology: D.L. Jardim, D.S. Hong
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D.L. Jardim, P. LoRusso
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D.L. Jardim, K.R. Hess, P. LoRusso, R. Kurzrock, D.S. Hong
Writing, review, and/or revision of the manuscript: D.L. Jardim, K.R. Hess, P. LoRusso, R. Kurzrock, D.S. Hong
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): D.L. Jardim
Study supervision: D.L. Jardim, P. LoRusso, D.S. Hong
Acknowledgments
The authors thank Joann Aaron, Department of Investigational Cancer Therapeutics, for editorial support.
Footnotes
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
- Received August 10, 2013.
- Revision received October 18, 2013.
- Accepted October 28, 2013.
- ©2013 American Association for Cancer Research.