Purpose: We assessed three preclinical cancer models, the in vitro human cell line, the human xenograft, and the murine allograft, to examine whether they reliably predict clinical utility.
Experimental Design: Thirty-one cytotoxic cancer drugs were selected. Literature was searched for drug activity in Phase II trials, human xenografts, and mouse allografts in breast, non-small cell lung, ovary, and colon cancers. Data from the National Cancer Institute Human Tumor Cell Line Screen were used to calculate drug in vitro preclinical activity for each cancer type. Scatter plot and correlation analyses of Phase II activity versus preclinical activity were conducted for each model: by tumor type (disease-oriented approach), using one tumor type as a predictor of overall activity in the other three tumor types combined (compound-oriented approach), and for all four tumor types together.
Results: The in vitro cell line model was predictive for non-small cell lung cancer under the disease-oriented approach, for breast and ovarian cancers under the compound-oriented approach, and for all four tumor types together. The mouse allograft model was not predictive. The human xenograft model was not predictive for breast or colon cancers, but was predictive for non-small cell lung and ovarian cancers when panels of xenografts were used.
Conclusions: These results suggest that under the right framework and when panels are used, the in vitro cell line and human xenograft models may be useful in predicting the Phase II clinical trial performance of cancer drugs. Murine allograft models, as used in this analysis, appear of limited utility.
Both basic science studies and clinical trials are essential components of the cancer drug discovery process. Potential therapeutics found to be significantly better than no treatment or standard therapies (i.e., active) in preclinical laboratory cancer models, or compounds with novel chemotypes and equivalent effectiveness to standard treatments, are advanced to confirmatory testing in early (Phase I and II) clinical trials. Considering that the response rate (RR) is a reasonable surrogate end point for survival (required but not sufficient), a favorable RR in Phase II trials advances a drug into additional clinical testing and is considered a prerequisite of drug success in the clinic.
Advancing of a candidate drug from preclinical testing in the laboratory to testing in Phase II clinical trials is based on the assumption that drug activity in cancer models translates into at least some efficacy in human patients, i.e., that cancer laboratory models are clinically predictive. In addition, the relevance of tumor type-specific preclinical results for the corresponding human cancers in the clinic can be viewed through two different approaches: compound-oriented, where a drug is assumed to have potential activity against all human tumor types if it is effective against a single test tumor type, and disease-oriented, where a drug with preclinical activity in a single tumor type would only be expected to be effective in the same tumor type in patients.
Although widely adopted, the above-mentioned assumption and approaches have not been confirmed by studies to date. In addition, all studies aimed to examine the clinical predictive value of laboratory cancer models inevitably suffer from inherent bias because compounds with no activity in preclinical models are generally not advanced to clinical trials.
This work was undertaken to examine the clinical predictive value of three preclinical cancer models that have found wide use: the human in vitro cell line; the mouse allograft; and the human xenograft. In these models, tumor volume or life span (in vivo mouse models) or cell growth (in vitro cell lines) is compared between the treatment group receiving the new drug and a control group (active or inactive control).
The use of preclinical cancer models for selection of potential cancer therapeutics was pioneered by the National Cancer Institute (NCI) in the United States in the mid-1950s. The screening strategies used until 1990 were essentially compound oriented and involved a small number of predominantly murine allograft tumors, with emphasis on leukemia (1, 2, 3, 4, 5, 6, 7). Several studies from the NCI and others demonstrated that this approach had low clinical predictive value for activity in Phase II trials (5, 6, 7, 8, 9) and yielded compounds with selective activity toward human leukemias and lymphomas (10, 11, 12). Thus, in 1990, the NCI introduced a disease-oriented in vitro Human Tumor Cell Line Screen composed of 60 cell lines from the most common adult tumors (13, 14, 15, 16, 17). The screen was designed so that each tumor type was represented by a panel of cell lines selected on the basis of different subhistological features and common drug resistance profiles. It was hoped that this screen would help identify drug leads with high potency and/or selective activity against particular tumor types.
Recently, the NCI examined the correlation between drug activity in Phase II clinical trials and preclinical activity in cancer models (18). Important findings were: (a) with the exception of non-small cell lung cancer (NSCLC), preclinical activity in human xenografts of a particular tumor type did not correlate significantly with Phase II activity in the same type of tumor; (b) with the exception of breast and colon histologies, human xenografts did not significantly predict Phase II clinical activity in other cancer types; and (c) compounds that were active in at least one-third of all tested human xenografts were likely to have at least some activity in Phase II clinical trials.
Studies examining the clinical predictive value of preclinical cancer models outside the scope of the NCI screening programs have focused on the human xenograft model and have looked predominantly into same-tumor correlations (disease-oriented approach). These studies have produced both positive (the model was found clinically predictive) and negative (the model was found to have no clinical predictive value) results in various tumor types (19, 20, 21, 22, 23, 24, 25, 26, 27).
Two major criticisms can be made of the overall body of literature concerning the clinical predictive value of preclinical cancer models. First, the vast majority of studies to date, both within and outside the NCI, have based their conclusions on the observation of trends rather than the use of statistical methods. Second, all studies conducted previously have used dichotomous definitions of preclinical and/or clinical activity based on largely unvalidated cutoff values of measures of activity: an RR of 20% in Phase II clinical trials and (most commonly) a T/C% of 42% in human xenografts and mouse allografts.
In addition, two important questions have not been addressed at all by previous studies: the clinical predictive value of the in vitro cell line model and the relative clinical usefulness of the different preclinical cancer models in use today (i.e., how different models compare with each other in terms of their ability to identify clinically effective drugs).
Thus, we conducted a study comparing the clinical (Phase II) predictive value of three widely used preclinical laboratory cancer models, the in vitro human cell line, the mouse allograft, and the human xenograft. We used quantitative measures of both clinical and preclinical activity and statistical methods. We considered three relevant questions: (a) the clinical predictive value of the three models within the same tumor type (disease-oriented approach); (b) the clinical predictive value of the three models when one preclinical tumor type is used as a predictor of overall clinical activity in all other tumor types (compound-oriented approach); and (c) the clinical predictive value of the three models when overall preclinical and clinical activity in all tumor types combined is considered.
MATERIALS AND METHODS
A retrospective, literature-based study was conducted. Data were retrieved from studies published between 1985 and 2000. This period was chosen as one when all three preclinical cancer models of interest to this study were in use and because it was long enough and close enough to the present as to afford data on a relatively large number of recently developed drugs.
The data search was restricted to four of the most common and commonly studied solid tumor types, breast, colorectal, ovarian, and non-small cell lung cancers, to ensure that sufficient data would be available.
The Medline and CancerLit databases were used for the collection of published data. In an attempt to minimize publication bias, both paper publications (peer reviewed) and meeting abstracts (nonpeer reviewed) were used as sources of information. If published data were not available for identified drugs, manufacturers were contacted for unpublished data.
Selection of Drugs
Drugs were identified by searching the Medline and CancerLit databases for compounds that had undergone single agent Phase I clinical trial testing either in 1991 or 1992. Agents with novel targets such as signal transduction or angiogenesis modulators were not included.
This Phase I-based approach to agent identification was used to ensure selection of agents developed within the study time frame of 1985–2000: agents with a published Phase I clinical trial in 1991 or 1992 were expected to have been through preclinical testing between 1985 and 1990 and to have undergone Phase II clinical evaluation by the year 2000. In addition, this approach was adopted to minimize publication bias: publication of Phase I trials is generally less dependent on the observation of favorable tumor responses than publication of Phase II trials or of preclinical cancer model experiments.
Data Collection and Drug Activity
Phase II Clinical Trials.
Phase II clinical trials for each drug were identified by searching the Medline and CancerLit databases for scientific papers, reviews, or meeting abstracts. Duplicate publications were discarded. For trials with only abstract information, an additional search by author and/or institution name was conducted in Medline or CancerLit. Scientific papers were used in preference to abstracts, where possible.
Two restrictions were applied. The first was a geographic restriction: to ensure uniform methodology in trial conduct and RR assessment, only Phase II trials conducted in the Americas, Western Europe and Australia were included in the analysis. The second restriction referred to the treatment population and aimed to ensure that uniformly responsive populations of patients would be considered. For breast and ovarian cancer, only Phase II trials that included patients who had received prior chemotherapy for metastatic disease were used, whereas for NSCLC and colon cancers, the Phase II trials selected included patients who had received no prior chemotherapy.
For each individual Phase II trial, the following information was collected: disease site; previous chemotherapy; disease stage; numbers of patients entered, eligible, evaluable, and evaluable for response; number of complete and partial responses; and criteria used for response (standard WHO versus other). Trials had to have enrolled a minimum of 14 patients, at least 12 of whom must have been evaluable for response. Completed Phase II trials for which >20% of entered patients were listed as inevaluable for response were considered methodologically unacceptable and were not used. For trials in progress at the time of reporting (meeting abstract format only), the available data were used even if they represented <80% of the enrolled patients, provided that they met the 14-patient criterion. If a trial publication did not specify the previous chemotherapy treatment status of patients, it was not used. Information from Phase I-II trials was used only when the Phase I and II components of the trial were separately conducted and reported. Phase II information was collected regardless of drug dose and route of administration.
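The inclusion rules above can be sketched as a simple filter. This is an illustrative reading only, not the procedure actually used: the `Trial` record and its field names are hypothetical, and applying both the 14-patient and the 12-evaluable minimums to trials still in progress is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """Hypothetical record of one Phase II trial publication."""
    entered: int             # patients entered in the trial
    evaluable: int           # patients evaluable for response
    completed: bool          # False for trials in progress (abstract only)
    prior_chemo_known: bool  # publication states prior chemotherapy status


def is_usable(trial: Trial) -> bool:
    """Apply the size and evaluability criteria described in the text."""
    if not trial.prior_chemo_known:
        return False
    # Assumption: both minimums apply to every trial, completed or not.
    if trial.entered < 14 or trial.evaluable < 12:
        return False
    if trial.completed:
        inevaluable = trial.entered - trial.evaluable
        # More than 20% inevaluable; integer arithmetic avoids rounding issues.
        if 5 * inevaluable > trial.entered:
            return False
    return True


# Example: a completed trial with 3 of 20 patients inevaluable (15%) is kept.
example_ok = is_usable(Trial(entered=20, evaluable=17, completed=True,
                             prior_chemo_known=True))
```

A completed trial with 5 of 20 patients inevaluable (25%) would be rejected, whereas the same counts reported for a trial in progress would be retained under this reading.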
For a given drug, in a given cancer type, the activity in a single Phase II clinical trial was recorded as the RR: the number of partial and complete tumor responses over the total number of patients evaluable for response. The number of evaluable rather than eligible patients was used to accommodate information from trials for which final results were not available. In the very few cases where the number of patients evaluable for response was not provided, the number of evaluable patients, the number of eligible patients, or the number of patients entered in the trial (whichever was provided by the investigators) in that priority order was used.
To obtain a drug’s overall clinical activity in multiple Phase II trials of patients with the same tumor type, all responses and the collective number of patients evaluable for response were pooled from individual trials to calculate an overall RR. Finally, to get the Phase II activity for any three or four cancer types combined, the individual tumor RRs were averaged.
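As a minimal sketch of the two-step calculation just described (pooling within a tumor type, then averaging across tumor types), the following may be helpful. The function names and the trial counts in the example are made up for illustration and do not come from the study data.

```python
def pooled_rr(trials):
    """Overall RR for one drug in one tumor type.

    trials: list of (responses, evaluable_patients) tuples, one per trial.
    Responses and evaluable patients are summed before dividing, so larger
    trials carry more weight than smaller ones.
    """
    total_responses = sum(responses for responses, _ in trials)
    total_evaluable = sum(evaluable for _, evaluable in trials)
    return total_responses / total_evaluable


def combined_rr(rr_by_tumor):
    """Phase II activity for several tumor types combined.

    rr_by_tumor: dict mapping tumor type to its pooled RR. The per-tumor RRs
    are averaged without weighting, as described in the text.
    """
    return sum(rr_by_tumor.values()) / len(rr_by_tumor)


# Hypothetical example: two breast trials and one colon trial.
breast = pooled_rr([(3, 20), (5, 30)])   # 8 responses in 50 patients
colon = pooled_rr([(1, 25)])             # 1 response in 25 patients
overall = combined_rr({"breast": breast, "colon": colon})
```

Note the asymmetry in the method: patients are pooled within a tumor type, but tumor types are weighted equally when combined.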
Human Xenografts and Mouse Allografts.
The search strategy for mouse cancer model data was similar to the Phase II process. The only exclusion in this case was results obtained with mouse tumors that were engineered to have special characteristics such as, for example, overexpression of proteins conferring drug resistance.
For each murine allograft or human xenograft, numerical value(s) of activity for drugs of interest were retrieved only if expressed in the literature sources as the treated over control tumor volume ratio (T/C%) or the tumor volume growth inhibition ratio (GI%). In addition, only T/C% values calculated by the formula T/C% = [(RVtreated)/(RVcontrol)] × 100% were collected (where RV = relative volume), whereas T/C% values defined for regressions {T/C% = [(RVtreated(0) − RVtreated(t))/RVtreated(0)] × 100%} were excluded to ensure uniform calculation methods. If the T/C% was not provided but a relative tumor growth curve was given as a figure in a publication, the numerical values for the treatment and control groups provided in this graph were used to calculate the T/C%. Activity reported as all mice cured or 100% complete responses was considered equivalent to and recorded as a T/C% = 0. If no exact T/C% value was given but an interval of values was provided instead (e.g., T/C% >42), a T/C% equal to the interval midpoint value (e.g., a T/C% = 71) was assigned. Finally, where preclinical activity was reported as GI%, it was converted to T/C% by the formula T/C% = 100% − GI%. The activity value for the most effective, nontoxic dose in each schedule was recorded.
Single tumor type preclinical activity of each drug in the murine allograft or human xenograft models was defined as the mean T/C% value from all tested allografts/xenografts of that tumor type. Where the same laboratory had tested a single xenograft/allograft with multiple schedules of the same drug and/or where the same xenograft/allograft had been tested with the same drug by more than one laboratory, T/C% values for a single tumor were obtained by first averaging the same laboratory T/C% values and then the same xenograft T/C% values.
Overall preclinical activity in xenografts/allografts for all four tumor types together was expressed as the average of single tumor mean T/C% values.
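The T/C% conventions and the stepwise averaging described in this section can be illustrated with a short sketch. The helper names, the nested-dictionary layout, and the example values are hypothetical. The default upper bound of 100 for an open interval is inferred from the example in the text, where T/C% >42 is assigned a value of 71.

```python
from statistics import mean


def tc_from_relative_volumes(rv_treated, rv_control):
    """T/C% from relative tumor volumes of treated and control groups."""
    return rv_treated / rv_control * 100.0


def tc_from_gi(gi_percent):
    """Convert growth inhibition (GI%) to T/C%."""
    return 100.0 - gi_percent


def tc_from_interval(lower, upper=100.0):
    """Midpoint of a reported interval, e.g. 'T/C% > 42' gives 71."""
    return (lower + upper) / 2.0


def tumor_type_tc(xenografts):
    """Mean T/C% for one drug in one tumor type.

    xenografts: {xenograft_name: {laboratory: [T/C% per schedule]}}
    Schedules are averaged within a laboratory, laboratories within a
    xenograft, and xenografts within the tumor type.
    """
    per_xenograft = []
    for labs in xenografts.values():
        per_lab = [mean(schedules) for schedules in labs.values()]
        per_xenograft.append(mean(per_lab))
    return mean(per_xenograft)


# Hypothetical example: xenograft X1 tested by two laboratories, X2 by one.
example = tumor_type_tc({
    "X1": {"lab A": [30.0, 50.0], "lab B": [60.0]},
    "X2": {"lab A": [20.0]},
})  # X1 -> (40 + 60) / 2 = 50; X2 -> 20; tumor-type mean = 35
```

Averaging in this order keeps a heavily tested xenograft, or a laboratory that reported many schedules, from dominating the tumor-type mean.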
In Vitro Human Tumor Cell Lines.
The publicly available data from the NCI’s Human Tumor Cell Line Screen were used as the information source for the in vitro tumor cell line model. Information from the NCI in vitro Human Tumor Cell Line Screen was favored because it was a readily available, well-defined, comprehensive, validated, and extensive single source of data. Another important reason was that, as an exploratory literature search showed, there was such a wide variation between different investigators in the types of assays used and the nature of cell lines tested that it would have been impossible to comprehensively combine published data from various laboratories.
Acquisition of NCI Human Tumor Cell Line Screen data was done through the internet. Information for each drug was obtained through its NCI code number or NSC number. Such numbers, where available, were identified either from the literature or from a cross-reference of compound names and NSC numbers in the NCI database (also available on the NCI web site).
Testing of compounds in the NCI in vitro Human Tumor Cell Line Screen has been described previously (17). Briefly, growth inhibition in cell lines is measured by the GI50, defined as the drug concentration that causes a 50% reduction in cell number in test plates relative to control plates. For every drug entering the screen, a concentration range composed of five 10-fold dilutions is tested in each of a group of 60–80 cell lines. The optical densities of treated and control plates, obtained from the sulforhodamine B assay, are used to construct a dose-response curve for each cell line in the screen, leading to the calculation of a GI50 in every case by interpolation. In the case of compounds with low (i.e., the highest concentration tested causes <50% growth inhibition) or high (i.e., the lowest concentration tested causes >50% growth inhibition) potency, where interpolation is not possible, the highest and lowest concentrations, respectively, in the tested drug concentration range are recorded as the approximated GI50s. GI50s are then converted to their Log10 values, and the overall mean Log10GI50 across all cell lines in the screen is calculated. Finally, the results are displayed as a bar graph called the mean graph (28). This graph lists all of the cell lines and their corresponding Log10GI50s and relates the magnitude of every individual cell line Log10GI50 to the mean Log10GI50 across all of the cell lines by a bar to the right (more sensitive than average) or to the left (less sensitive than average) of a vertical line. The experiment is repeated several times for each concentration range. In cases where mean graphs are based on mostly approximated GI50s, other higher or lower concentration ranges of the drug (again composed of five 10-fold dilutions) are also tested.
Thus, for each compound tested in the NCI in vitro Human Tumor Cell Line Screen, multiple GI50 mean graphs (one for each concentration range tested) based on multiple experiments each and with a different content of approximated versus calculated (by interpolation) GI50s may exist in the NCI database.
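To make the distinction between calculated and approximated GI50s concrete, the following is a simplified sketch, not the screen's actual curve-fitting procedure (for which see Ref. 17). It assumes that growth is expressed as a percentage of control and that interpolation is linear on the Log10 concentration axis; the function name and the example numbers are hypothetical.

```python
def log10_gi50(log_conc, pct_growth):
    """Return (Log10 GI50, approximated) for one cell line.

    log_conc:   Log10 drug concentrations in ascending order (five 10-fold
                dilutions in the screen described above).
    pct_growth: growth of treated cells as a percentage of control at each
                concentration (assumed input format).
    """
    # High potency: even the lowest concentration inhibits growth by >50%,
    # so the lowest concentration is recorded as an approximated GI50.
    if pct_growth[0] < 50.0:
        return log_conc[0], True
    # Look for the first pair of adjacent concentrations bracketing 50%.
    for i in range(len(log_conc) - 1):
        g_low, g_high = pct_growth[i], pct_growth[i + 1]
        if g_low >= 50.0 >= g_high and g_low != g_high:
            fraction = (g_low - 50.0) / (g_low - g_high)
            step = log_conc[i + 1] - log_conc[i]
            return log_conc[i] + fraction * step, False
    # Low potency: 50% inhibition is never reached, so the highest
    # concentration is recorded as an approximated GI50.
    return log_conc[-1], True


concentrations = [-8.0, -7.0, -6.0, -5.0, -4.0]
interpolated = log10_gi50(concentrations, [100.0, 90.0, 70.0, 30.0, 10.0])
low_potency = log10_gi50(concentrations, [100.0, 95.0, 90.0, 80.0, 60.0])
high_potency = log10_gi50(concentrations, [40.0, 20.0, 10.0, 5.0, 0.0])
```

In the first example, growth falls from 70% to 30% between Log10 concentrations of −6 and −5, so the interpolated Log10GI50 is −5.5; the other two examples return the range limits flagged as approximated.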
We obtained all of the available GI50 mean graph information from the NCI web site for all drugs in our list of compounds with known NSC numbers. For every drug, we recorded the number of concentration ranges tested in the NCI in vitro Human Tumor Cell Line Screen, the number of experimental repetitions conducted for each concentration range, and, finally, the number of approximated Log10GI50s in each mean graph.
The drug concentration range that produced the mean graph with the smallest number of approximated Log10GI50s was used for scoring a drug’s activity in the NCI in vitro Human Tumor Cell Line Screen, unless a different concentration range existed, with a number of approximated Log10GI50s varying <10% from the first but for which more experiments were done.
Preclinical activity in the NCI in vitro Human Tumor Cell Line Screen was scored in two different ways: by the mean Log10GI50 and by what was termed the activity fraction. For a given drug, in a given tumor type, the mean Log10GI50 was computed by averaging the Log10GI50s from all of the cell lines of that tumor type in the mean graph corresponding to the most appropriate concentration range. The activity fraction was arbitrarily defined as the number of cell lines of a given tumor type that were more sensitive to the drug than average (i.e., whose individual Log10GI50 was lower than the mean Log10GI50 for all cell lines of all cell types in the mean graph) divided by the total number of cell lines tested from that tumor type. The activity fraction was also calculated from the mean graph corresponding to the most appropriate concentration range. Overall mean Log10GI50s or activity fractions for all four cancer types combined were calculated by averaging the single tumor values.
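The two in vitro scores can be sketched as follows. The function name and the Log10GI50 values are made up for illustration. "More sensitive than average" is taken to mean a Log10GI50 strictly below the mean across all cell lines in the mean graph.

```python
from statistics import mean


def score_tumor_type(log_gi50_by_line, tumor_lines):
    """Return (mean Log10GI50, activity fraction) for one tumor type.

    log_gi50_by_line: {cell_line: Log10GI50} for every line in the mean graph.
    tumor_lines:      names of the cell lines belonging to the tumor type.
    """
    overall_mean = mean(log_gi50_by_line.values())  # all lines, all types
    values = [log_gi50_by_line[name] for name in tumor_lines]
    mean_log_gi50 = mean(values)
    # Count lines whose Log10GI50 is below (more sensitive than) the average.
    n_sensitive = sum(1 for value in values if value < overall_mean)
    return mean_log_gi50, n_sensitive / len(values)


# Hypothetical mean graph with four cell lines (values are illustrative).
mean_graph = {"A549": -6.0, "NCI-H460": -5.0, "MCF7": -7.0, "HT29": -4.0}
nsclc_score = score_tumor_type(mean_graph, ["A549", "NCI-H460"])
# Overall mean is -5.5; one of the two NSCLC lines lies below it.
```

Because the activity fraction only counts lines on one side of the overall mean, it discards the magnitude of each difference, which the mean Log10GI50 retains.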
For each preclinical cancer model, 9 Phase II versus preclinical activity relationships were examined for a total of 27: relationships by tumor type (disease-oriented approach, 4 relationships/model), predictive ability of one tumor type for the other three tumor types combined (compound-oriented approach, 4 relationships/model), and general predictive ability for all four tumor types combined (1 relationship/model).
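The count of 27 relationships follows directly from the design and can be enumerated as a check. The labels and tuple layout below are illustrative only.

```python
MODELS = ["in vitro cell line", "human xenograft", "murine allograft"]
TUMORS = ["breast", "NSCLC", "ovary", "colon"]


def relationships():
    """List (model, approach, preclinical tumors, clinical tumors) tuples."""
    result = []
    for model in MODELS:
        # Disease-oriented: same tumor type on both axes (4 per model).
        for tumor in TUMORS:
            result.append((model, "disease-oriented", [tumor], [tumor]))
        # Compound-oriented: one tumor type predicts the other three (4).
        for tumor in TUMORS:
            others = [t for t in TUMORS if t != tumor]
            result.append((model, "compound-oriented", [tumor], others))
        # All four tumor types combined on both axes (1 per model).
        result.append((model, "all tumor types", list(TUMORS), list(TUMORS)))
    return result


all_relationships = relationships()
n_relationships = len(all_relationships)  # 3 models x 9 = 27
```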
Relationships were first examined descriptively with the construction of various Phase II overall activity versus preclinical activity scatter plots (Microsoft Excel software). Each point on these scatter plots represented data from one drug for which both Phase II and preclinical activity values had been calculated from literature sources, as described above.
After descriptive evaluation of the data, Spearman rank correlation coefficients were obtained using the SAS software, UNIX version 6.12. A significance test of every correlation coefficient was performed, and the corresponding Ps were calculated. Spearman rank (nonparametric) correlation coefficients were used because the distributions of the x (preclinical activity) and y (clinical activity) variables were not normal (29).
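For readers unfamiliar with the statistic, a Spearman coefficient is the Pearson correlation of the ranks of the two variables. The sketch below is a plain-Python stand-in for illustration, not the SAS procedure used in this work. It computes the coefficient only, without a P value, and the data are invented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: mean Log10GI50 (x) versus Phase II RR (y) for five drugs.
log_gi50 = [-7.2, -6.5, -6.0, -5.1, -4.8]
phase2_rr = [0.30, 0.15, 0.22, 0.10, 0.02]
rho = spearman_rho(log_gi50, phase2_rr)  # negative: more potent, higher RR
```

Where SciPy is available, `scipy.stats.spearmanr` returns both the coefficient and a P value and would be the more practical choice.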
When multiple comparisons are made within a group of data such as in this work, there is increased possibility that some correlations will come up as statistically significant solely because of chance (false positives). To avoid this, multiple comparison correction methods (e.g., Bonferroni approach) are often used to adjust the significance level to a lower P than conventionally used. However, relying on corrected probabilities increases the possibility that meaningful correlations will be missed (false negatives), making the nature of the scientific work key to the decision to use multiple comparison adjustment methods or not. Because this was an exploratory study, we were willing to accept a higher probability of false positives to ensure that potentially meaningful associations would not be discarded. We therefore did not correct for multiple comparisons and chose a level of significance of 0.05.
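The trade-off can be put in numbers with a short calculation. It assumes, for illustration only, that all 27 relationships were tested and that the tests were independent; neither assumption is claimed to hold for the present data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected number of chance findings if no true correlation exists."""
    return alpha * n_tests


alpha = 0.05
n_tests = 27
corrected = bonferroni_threshold(alpha, n_tests)        # about 0.0019
chance_hits = expected_false_positives(alpha, n_tests)  # about 1.35
```

Under these assumptions, roughly one significant correlation would be expected by chance alone at the uncorrected 0.05 level, whereas the corrected threshold of about 0.002 would be difficult to reach with 17 or fewer data points per relationship.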
RESULTS
The Medline and CancerLit databases were searched for cancer drugs (excluding agents with novel targets such as signal transduction or angiogenesis modulators) that had undergone single agent Phase I clinical trial testing either in 1991 or 1992. This search led to 97 drug names. After excluding drugs that were eliminated from additional clinical testing for practical reasons (for example, difficulties with the drug formulation), drugs that were specifically developed for a certain type of cancer (for example, hormone-regulating compounds for breast cancer), and drugs that were still the subject of published Phase I studies in 1991 and 1992 despite already being licensed for human use before 1985, a list of 31 agents was obtained (Table 1). After applying the restrictions and criteria mentioned under “Materials and Methods,” we extracted from the literature preclinical and Phase II activity information for those agents on four common cancer types: breast, NSCLC, ovary, and colon. Overall, 100 preclinical and 307 Phase II clinical literature references were used, spanning the period between 1985 and 2000.
No preclinical data were found for 5 of the 31 drugs researched. Of the 26 drugs remaining, availability of preclinical and Phase II data varied, depending on which preclinical and clinical tumor(s) had been tested and published in each case. Thus, each of the relationships examined had a different number of data points as different subsets of drugs were included. The most data points for any relationship were 17. For six relationships, five or fewer data points were available (relationships with fewer than five data points were not included in the results presented below).
In Vitro Cell Line Model.
Fig. 1 shows the Phase II activity versus preclinical activity scatter plots and correlation analysis for the in vitro cell line model when the mean Log10GI50 was used as the measure of preclinical activity. Because the lower the mean Log10GI50, the higher the potency of a drug, a negative correlation between mean Log10GI50 and Phase II overall RR was expected if the model had a good clinical predictive value. Significant negative correlations were found for NSCLC (Fig. 1A), for breast or ovarian cell lines versus overall Phase II activity in the other three tumor types (Fig. 1B), and for preclinical activity versus Phase II activity in all four tumor types (Fig. 1C).
Although the trends observed with the activity fraction were similar to ones seen for the mean Log10GI50 measure, no correlations were statistically significant in this case (data not shown).
Human Xenograft Model.
A negative correlation between Phase II RRs and mean T/C% values was expected to be indicative of a good clinical predictive value for the human xenograft model. As shown in Fig. 2, no significant correlations between preclinical and clinical activity were observed for this model in our analysis.
For some of the drugs, preclinical activity calculations were based on multiple human xenografts of the same tumor type (i.e., panels), whereas for others they were based on only a single xenograft. The relationships in Fig. 2 were reanalyzed, including only the drugs for which preclinical information on more than one human xenograft was available (Fig. 3). The results did not change for breast or colon tumors (compare Fig. 3A with Fig. 2A). However, the relationship for NSCLC became statistically significant, and a highly significant correlation was seen for ovarian cancer (Fig. 3A). A near significant correlation was obtained when ovarian human xenograft panels were used to predict clinical activity in the other three tumor types combined (Fig. 3B).
Murine Allograft Model.
No significant correlations between preclinical and clinical activity were observed for any of the relationships examined in this study for the murine allograft model (data not shown).
The scatter plots in Fig. 1 revealed an interesting observation: in every relationship except for colon cancer under the disease-oriented approach, an obvious trend toward a negative correlation was evident except for one to three outlier data points (Fig. 1, arrows). Interestingly, in all cases, these outlier data points corresponded to the same three drugs, namely elsamitrucin, didemnin B, and rhizoxin.
In an attempt to provide a possible explanation for this observation, we considered the mechanism of action of all drugs that were included in the correlations in Fig. 1. From a total of 18 drugs (Table 2), 5, namely, elsamitrucin, didemnin B, rhizoxin, flavone acetic acid, and fosquidone, were distinct in that they seemed to act through mostly unknown pathways that were not the typical DNA-based mechanisms of action of cytotoxic cancer agents. Thus, although flavone acetic acid and fosquidone fitted the rest of the data, there seemed to be a plausible mechanistic basis for the outlier behavior of the data points for elsamitrucin, didemnin B, and rhizoxin. In fact, exclusion of these three drugs led to highly significant correlations in all cases except for the same tumor relationship in colon cancer (Fig. 1, correlation coefficients and Ps for “w/o arrows”). It should be noted that none of the relationships examined for the human xenograft models (Figs. 2 and 3) included elsamitrucin, didemnin B, or rhizoxin as data points.
Because of the intriguing results obtained with the human NSCLC and ovarian xenograft panels in Fig. 3A, a more detailed examination of these panels was undertaken. As seen in Figs. 4A and 5A, the 6 ovarian and 7 NSCLC xenograft panels differed both in the numbers (minimum of 6 and maximum of 13 for ovary and minimum of 2 and maximum of 8 for NSCLC) and the identity of the xenografts that they contained. Analysis by grade/histology was hindered by lack of complete information on all xenografts. However, some patterns appeared distinguishable. All ovarian panels contained 10–20% undifferentiated tumors and also included both poorly differentiated and moderately differentiated subtypes (Fig. 4B). For NSCLC, all panels included adenocarcinoma xenografts with a frequency of >30% (Fig. 5B). These observations suggested that the frequency of histological/grade subtypes within a xenograft panel, rather than the number or the nature of the xenografts, may be an important determinant of clinical predictivity.
In an attempt to explore this hypothesis and to further examine the validity of the results obtained for ovarian cancer and NSCLC in Fig. 3A, the literature was reviewed for additional data. Six more agents with known overall Phase II RRs in previously treated patients with ovarian cancer were found. Five of these compounds had been tested in a panel of 15 human ovarian xenografts and one in a panel of 6 (26, 30), which fitted the histology/grade patterns identified in Fig. 4B. Fig. 6A lists the names and Phase II RRs (31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56) of these additional drugs together with the six compounds that were included in the analysis in Fig. 3A. Fig. 6, A and B, also shows mean T/C% values, scatter plots, and statistical analyses for two cases: first, for when all of the available xenograft information was used, and second, for when mean T/C% calculations were based, where possible, on the arithmetically smallest panel, namely the one used for gemcitabine in Fig. 4. Highly significant correlations were obtained in both cases (Fig. 6B).
For NSCLC, information on two additional agents was found: amsacrine [mean T/C% of 62 (26) and Phase II RR equal to 0.06 (31)] and doxorubicin [mean T/C% of 47 (26) and Phase II RR equal to 0.12 (32)]. Both had been tested in NSCLC human xenograft panels that included all three histological subtypes and had adenocarcinoma contents of 29 and 33%, respectively. As for ovarian cancer, those two additional data points (Fig. 5C, arrows) enhanced the statistical significance of the relationship observed in Fig. 3A.
DISCUSSION
A literature-based, retrospective study was conducted to examine the clinical predictive value of three widely used preclinical cancer models, namely, the in vitro human tumor cell line, the human xenograft, and the murine allograft models. Four solid tumor types were selected (breast, NSCLC, ovary, and colon), and data on a set of 31 anticancer agents (excluding agents with novel targets such as signal transduction or angiogenesis modulators) were collected. Preclinical activity in each model was correlated with RRs in Phase II clinical trials by tumor type (disease-oriented approach), in the case when one preclinical tumor type was used as a predictor of overall clinical activity in the other three tumor types combined (compound-oriented approach), and for all four tumor types together.
Colon cancer was the only site for which disproportionate numbers of clinically active versus inactive agents were identified: only 3 drugs with Phase II RRs > 0.15 and 8 with ≤0.10 (Figs. 1, 2, and 3). However, this was likely a reflection of the lack of clinically effective drugs for this tumor type rather than the result of selection and publication bias.
When the mean Log10GI50 measure of preclinical activity was used, the in vitro cell line model was found to be predictive of Phase II clinical performance for NSCLC under the disease-oriented approach, for breast and ovarian cancers under the compound-oriented approach, and for all four tumor types together. Highly significant correlations were observed in all cases, except colon cancer, when three consistent outlier data points corresponding to the mechanistically nontypical cytotoxic agents didemnin B, elsamitrucin, and rhizoxin were excluded in exploratory analysis. Thus, the in vitro cell line model might be predictive in the case of typical cytotoxic cancer agents but might fail to provide reliable information for at least some of the noncytotoxic cancer drugs. Additional studies are needed to explore this observation.
The fact that drug potency (mean Log10GI50), a pharmacological measure, was found to be predictive of Phase II performance was somewhat surprising but has been noted previously: a recent study by Johnson et al. (18) demonstrated a highly significant correlation between potency in the NCI human tumor cell line screen and activity in the hollow fiber assay. Pharmacological differences between the species might explain why some anticancer agents appear effective in in vivo mouse models but fail to show efficacy in Phase II trials. Experience with some agents (57) has shown that the maximum-tolerated dose in the mouse can be higher than in humans, presumably because of an intrinsic ability of mouse cells to tolerate higher drug doses and/or more efficient elimination in the mouse.
In contrast to the in vitro cell line, our results suggest that the murine allograft model, as used in this analysis, is not predictive of clinical Phase II performance. This is in agreement with the conclusions from a large body of information originating from the NCI screening programs in use from 1975 to 1990 (5, 6, 7, 8, 10, 11, 12).
The human xenograft model showed good tumor-specific predictive value for NSCLC and ovarian cancers when panels of xenografts were used. However, it failed to adequately predict clinical performance in both the disease-oriented and compound-oriented settings for breast and colon tumors. The results with breast cancer were in agreement with a recent study (18) but contradicted the work reported by Bailey et al. (20), Inoue et al. (21), and Mattern et al. (24). However, given that the latter studies did not use formal statistical methods, our conclusions may be more robust. The results for ovarian cancer were in agreement with studies by Taetle et al. (23) and Mattern et al. (24) but contradicted the conclusions of the recent NCI United States study by Johnson et al. (18). Our results for NSCLC were consistent with the observations from all previous studies that examined same-tumor correlations in this cancer type (18, 24).
For NSCLC and ovarian cancer patients, a panel of xenografts was more predictive than single xenografts, confirming preliminary observations by Bellet et al. (19).
In an effort to identify the properties that may render an ovarian or NSCLC human xenograft panel predictive of Phase II drug performance, common characteristics were sought. Panels of the same tumor type showed no similarity in the number of xenografts and only limited overlap in their identity. However, certain patterns in histology/grade content were found. These observations suggest that the relative histology/grade content, rather than the number or identity of xenografts within a panel, may be the important determinant of clinical predictivity. To our knowledge, no other study has attempted to identify ovarian or NSCLC human xenograft panel features that might lead to accurate predictions of a drug’s Phase II performance.
This is the only study that has examined the clinical predictive value of three preclinical cancer models together and thus allows for direct comparisons between them. The results suggest that the human xenograft model is more predictive than its murine allograft counterpart and that the in vitro cell line model is at least as useful as the human xenograft model.
The NCI work with cancer drug screening programs from 1955 to 1990 (Refs. 5, 6, 7, 8, 10, 11, 12; leukemia-based preclinical, compound-oriented screens preferentially yielding compounds active against hematological malignancies), in combination with our work and recent conclusions by Johnson et al. (Ref. 18; statistically significant results under the compound-oriented approach for some solid tumors), suggests that the compound-oriented strategy may be successful when used only within solid tumors or only within hematological malignancies but not when the two disease groups are considered together.
In general, our results suggest that the in vitro human tumor cell line and the human xenograft models might have good clinical predictive value in some solid tumors (such as ovary and NSCLC) under both the disease-oriented and compound-oriented strategies, as long as an appropriate panel of tumors is used in preclinical testing.
In conclusion, given the results in this study and those of others (6, 7, 10, 11, 12), continued use of the murine allograft model in drug development may not be justified. The work presented here argues for emphasis to be placed on in vitro cell lines (in the context of the NCI Human Tumor Cell Line Screen) and appropriate panels of the human xenograft model.
Recent years have seen an explosion in the molecular understanding of cancer, which has led to the development not only of more effective cytotoxic cancer drugs but also of potentially cytostatic or antimetastatic agents. The future preclinical and clinical development of traditional cytotoxic compounds will likely follow procedures similar to those practiced today, and in that sense, the present findings could contribute to the more efficient discovery of such agents. However, the existing cancer models and parameters of activity in both the preclinical and clinical settings may have to be redesigned to fit the mode of action of the novel cytostatic, antimetastatic, antiangiogenesis, or immune response-modulating agents (58). On the preclinical cancer model front, the case is being made for the use of the orthotopic mouse xenograft and transgenic models (59, 60, 61) because those are thought to more accurately simulate human disease, especially in terms of growth characteristics and metastatic behavior. New end points of preclinical activity are being contemplated, such as the demonstration that a new molecule truly hits the intended molecular target (58). In Phase II clinical trials, there is a growing effort toward validating new surrogate end points of drug efficacy (58). The next decade will probably answer many of the questions regarding the effectiveness of these novel agents and will likely define a new role for traditional cytotoxic therapies, but it will also bring new challenges in terms of preclinical predictors of activity.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 Supported in part by the National Cancer Institute of Canada, Clinical Trials Group. Presented in part at the 2002 Annual Meeting of the American Society of Clinical Oncology (Abstract 360).
2 To whom requests for reprints should be addressed, at National Cancer Institute of Canada Clinical Trials Group, Cancer Clinical Trials Division, Cancer Research Institute, Queen’s University, 10 Stuart Street, Kingston, Ontario, K7L 3N6 Canada. Phone: (613) 533-6430; Fax: (613) 533-2941; E-mail:
3 The abbreviations used are: RR, response rate; NCI, National Cancer Institute; NSCLC, non-small cell lung cancer; NSC, National Service Center; T/C%, treated over control tumor volume ratio.
4 Internet address: http://www.dtp.nci.nih.gov/docs/cancer/searches/cancer_open_compounds.html.
- Received March 26, 2003.
- Revision received June 3, 2003.
- Accepted June 4, 2003.