## Abstract

Advances in statistical modeling and analysis technology have improved our ability to derive valid inferences from tumor xenograft experiments. Further challenges in this area include the modeling of intertumor heterogeneity and the development of robust statistical models that describe key parameters in the underlying tumor biology. *Clin Cancer Res; 17(5); 949–52. ©2011 AACR*.

Commentary on Zhao et al., p. 1057

The tumor xenograft experiment is an essential tool of translational cancer science. The typical experiment involves implanting preparations of human tumor cell lines in the flanks of athymic mice. When the tumors have reached a designated target volume, the mice are divided into treatment groups, often a control group plus groups representing different doses or combinations. The growing tumors are then measured at intervals until the animals die, become moribund, or reach a planned time of sacrifice. The resulting data set consists of the vectors of measurements linked to the design information.

The report by Zhao and colleagues (1) in this issue of *Clinical Cancer Research* applies state-of-the-art statistical methods to the problem of analyzing tumor growth curves. Their Bayesian Hierarchical Changepoint (BHC) model assumes that treated tumors will shrink for a time, reach a nadir, and then regrow, with both the decline and the regrowth curves being linear on the log scale. The model allows for heterogeneity in the rates of decline and regrowth, the time until the nadir, and the tumor volume at the nadir. They estimate model parameters using the Bayesian paradigm, as implemented in the WinBUGS package. Happily, they have provided detailed code so that others who wish to apply such an analysis may do so with minimal effort.
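As a rough illustration of the mean structure just described, a two-phase log-linear changepoint curve for a single treated tumor might be written as below. The symbols are chosen here for illustration and are not necessarily the parameterization used by Zhao and colleagues (1): $\tau$ is the time of the nadir, $\nu$ the log volume at the nadir, $\delta > 0$ the rate of decline, and $\gamma > 0$ the rate of regrowth.

$$
\log V(t) =
\begin{cases}
\nu + \delta\,(\tau - t), & t \le \tau \\
\nu + \gamma\,(t - \tau), & t > \tau
\end{cases}
$$

The two pieces meet at $t = \tau$, so the curve is continuous. In a hierarchical version, each tumor would have its own $(\tau, \nu, \delta, \gamma)$ drawn from a group-level distribution, which is how heterogeneity in all four quantities can be accommodated.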

So where does BHC fit in among other existing methods? If the idea is simply to establish statistical significance of between-group differences in growth profiles, numerous methods are available. The simplest involve analyzing tumor volumes at a prespecified evaluation time by ANOVA or its rank-based analog, the Kruskal-Wallis test. ANOVA is generally preferable for its greater efficiency, and it requires only that the data be roughly normally distributed (which can often be achieved by taking logs of the tumor volumes). A second common approach is to calculate for each tumor the time at which its volume has doubled (or perhaps tripled) from its pretreatment value; these data are then analyzed using methods from survival analysis such as the log-rank test. This method gives valid results but can be inefficient, particularly when a large fraction of tumors have not reached the endpoint (i.e., doubled or tripled) by the end of observation (2). Moreover, both this and the ANOVA method require some adjustment when there are multiple tumors per animal, as is common in xenograft studies (3). Fortunately, several methods are available for robustly incorporating within-animal correlation (4, 5).
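The time-to-doubling endpoint described above can be sketched in a few lines. This is a minimal illustration, not a validated analysis routine: the function name `time_to_multiple` is hypothetical, volumes are assumed to be strictly positive, and the crossing time is assumed to be found by interpolating linearly on the log scale between the two bracketing measurements. Tumors that never reach the target are returned as right-censored at their last observation time, which is the form a log-rank test expects.

```python
import math


def time_to_multiple(times, volumes, multiple=2.0):
    """Return (time, event) for one tumor.

    event is True if the volume reached `multiple` times its first
    (pretreatment) value; the time is then interpolated on the log scale.
    Otherwise event is False and time is the last observation time
    (a right-censored observation).
    """
    logs = [math.log(v) for v in volumes]
    target = math.log(multiple * volumes[0])
    for i in range(1, len(times)):
        if logs[i] >= target:
            # First crossing: logs[i - 1] < target <= logs[i]
            frac = (target - logs[i - 1]) / (logs[i] - logs[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1]), True
    return times[-1], False


# Illustrative (made-up) measurements for two tumors
print(time_to_multiple([0, 3, 6, 9], [100, 140, 190, 260]))  # doubles between days 6 and 9
print(time_to_multiple([0, 3, 6], [100, 120, 150]))          # censored at day 6
```

The censored case shows where the inefficiency noted in the text comes from: a tumor that has not doubled contributes only the fact that its doubling time exceeds the follow-up time.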

A third approach is to fit lines, or more generally curves, to the data values in each group and to compare groups by testing equality of their curves’ coefficients (2). Methods from multivariate analysis, regression modeling, and mixed modeling are useful here. These methods use the entire data set and, therefore, are more powerful (i.e., more likely to generate statistical significance when there really is a difference) than simpler univariate methods. Moreover, such models generate information on slopes and other characteristics of growth curves that the analyst can mine for clues to the underlying biology. With the creation over the past 2 decades of flexible, reliable software for modeling correlated observations (e.g., SAS Proc Mixed), this type of analysis has become routine. The BHC model falls broadly within this category.
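To make the curve-fitting idea concrete, here is a deliberately simplified two-stage sketch: fit a least-squares line to log volume for each tumor, then compare the per-tumor slopes between two groups with a Welch t statistic. This is a stand-in for, not an implementation of, the mixed-model analyses mentioned above. It ignores within-animal correlation and unequal follow-up, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def log_slope(times, volumes):
    """Least-squares slope of log(volume) against time for one tumor."""
    y = [math.log(v) for v in volumes]
    t_bar, y_bar = mean(times), mean(y)
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in zip(times, y))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def welch_t(slopes_a, slopes_b):
    """Welch t statistic for the difference in mean slope between groups."""
    se = math.sqrt(variance(slopes_a) / len(slopes_a)
                   + variance(slopes_b) / len(slopes_b))
    return (mean(slopes_a) - mean(slopes_b)) / se


# Illustrative (made-up) data: each tumor is a (times, volumes) pair
control = [([0, 3, 6], [100, 180, 330]), ([0, 3, 6], [110, 190, 350]),
           ([0, 3, 6], [90, 170, 290])]
treated = [([0, 3, 6], [100, 120, 150]), ([0, 3, 6], [105, 115, 140]),
           ([0, 3, 6], [95, 125, 160])]

control_slopes = [log_slope(t, v) for t, v in control]
treated_slopes = [log_slope(t, v) for t, v in treated]
print(welch_t(control_slopes, treated_slopes))
```

A mixed model does the two stages at once, weighting each tumor by how much information it carries, which is one reason it is generally preferred when tumors have different numbers of measurements.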

Farther along the analysis spectrum lie biology-based models that seek to represent the underlying biological processes mathematically, typically as systems of differential equations (6). The resulting models are nonlinear and may not possess closed-form solutions. Parameter estimates can be difficult to compute and sensitive to assumptions that cannot be robustly evaluated with the data at hand. Thus, although these models have the greatest potential to reveal biological information, they are often difficult to apply in practice even with the best available modeling tools (7).

However the data are analyzed, the very nature of tumor xenograft experimentation presents several challenging problems. First, most such experiments are small, including perhaps 8 to 10 animals per group with a similar number of measurements per animal. Even when the animals are genetically identical, there can be substantial between-animal variability, perhaps arising from “nurture” effects or heterogeneity in the implanted tumor material or in the application of treatments. Thus, although the data sets may seem large, the amount of information available for evaluating model adequacy (e.g., linearity of growth trajectories and specification of the error variance distribution) is modest. Second, data losses from animal morbidity and mortality are common, reducing efficiency and creating the potential for bias. Third, tumor volumes can dip below detection levels, which further complicates analysis because most methods assume that all volumes are known exactly. Finally, some experimental designs do not require the identification of individual animals, in which case all we know is the distribution of volumes at each time in each group rather than the individual sequences of volumes (8).

The BHC model represents a significant advance in its simultaneous description of differences between groups (fixed effects), heterogeneity between animals and tumors within animals (random effects), and subjective opinion (prior distributions on parameters). It also properly and effortlessly handles the problem of detection limits. Sampling-based Bayesian analysis, as implemented in WinBUGS, permits the investigator to evaluate a range of questions about the underlying model, such as quality of the fit, comparisons between curves, the amount of heterogeneity present, and so forth. The model of log-linear decline followed by log-linear regrowth is applicable to many practical experiments and is readily extensible to more complex situations, such as studies with multiple periods of decline and regrowth. An important further adaptation would include the situation in which some of the tumors shrink but do not regrow.

A possible criticism of the BHC and other “statistical” models is that model coefficients, however obviously valid as descriptors of the curves and as a basis for significance tests, may have limited biological relevance. For example, consider a compartmental model in which the untreated tumor grows exponentially with rate θ. Upon treatment, a susceptible fraction, constituting 100 × (1 − ρ)% of the cells in the tumor, is killed and removed, also exponentially, with rate −β. The surviving cells, 100 × ρ% of those alive at the time of treatment, continue to grow at rate θ. The volume curve from such a process would have the appearance of Fig. 1, and although one can easily write the curve in closed form, even so simple a model is difficult to fit. Although the BHC model would likely describe such data well, its parameters – the changepoint, the volume nadir, and the slopes on either side – are functions of the compartmental model parameters that have no compelling interpretation in terms of the biology.
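The compartmental example can be sketched numerically. Under one reading of the description, with β > 0 taken as the clearance rate of the susceptible cells, the post-treatment volume is V(t) = V0 [(1 − ρ) exp(−βt) + ρ exp(θt)]. The code below assumes that form, along with 0 < ρ ≤ 1 and θ > 0; the function names are hypothetical and the parameter values are made up for illustration.

```python
import math


def volume(t, v0, rho, theta, beta):
    """Tumor volume at time t after treatment.

    The resistant fraction rho grows at rate theta; the susceptible
    fraction (1 - rho) is cleared at rate beta (assumed beta > 0).
    """
    return v0 * ((1 - rho) * math.exp(-beta * t) + rho * math.exp(theta * t))


def nadir_time(rho, theta, beta):
    """Time of minimum volume, or 0.0 if the volume never declines.

    Obtained by setting dV/dt = 0, which gives
    exp((theta + beta) * t) = beta * (1 - rho) / (theta * rho).
    """
    ratio = beta * (1 - rho) / (theta * rho)
    if ratio <= 1:
        return 0.0
    return math.log(ratio) / (theta + beta)


# Illustrative (made-up) parameter values
v0, rho, theta, beta = 100.0, 0.1, 0.1, 0.5
t_star = nadir_time(rho, theta, beta)
print(t_star, volume(t_star, v0, rho, theta, beta))
```

Under this form, the nadir time depends on ρ, θ, and β jointly, and the initial slope of log volume is θρ − β(1 − ρ), again a mixture of all three. Only the late regrowth slope approaches a single biological parameter, θ, which illustrates why the changepoint-model coefficients lack a direct biological interpretation.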

Thus, the continuing challenge for scientists working in this area is to devise models that both have interpretable parameters and can be estimated reliably and robustly. Zhao and colleagues (1) have made an important advance, but there is more to do.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

- Received December 30, 2010.
- Accepted December 30, 2010.

- ©2011 American Association for Cancer Research.