Abstract
The ability to parse tumors into subsets based on biomarker expression has many clinical applications; however, there is no global way to visualize the best cutpoints for creating such divisions. We have developed a graphical method, the Xtile plot that illustrates the presence of substantial tumor subpopulations and shows the robustness of the relationship between a biomarker and outcome by construction of a two dimensional projection of every possible subpopulation. We validate Xtile plots by examining the expression of several established prognostic markers (human epidermal growth factor receptor2, estrogen receptor, p53 expression, patient age, tumor size, and node number) in cohorts of breast cancer patients and show how Xtile plots of each marker predict population subsets rooted in the known biology of their expression.
INTRODUCTION
In theory, associations between tumor biomarker expression and patient outcome should reveal the existence of biologically meaningful tumor classifications; however in practice, there is no universal method for discovering, assessing or displaying such associations. Consequently, studies generally group tumors into set divisions (e.g., quartiles, deciles), which fails to reflect the underlying biology of most markers. Microarray technology has magnified this problem by accelerating the discovery of tumor markers for which there is no known biological basis. In such cases, there is no foreknowledge about how (or whether) a marker parses a population into subsets.
Although there is a range of statistical methods for cutpoint selection (1 , 2) , we sought to produce a method that is comprehensive, based on traditional statistical tests, and yet intuitive for the oncologist. Here we describe our solution of cutpoint selection using a new method that we have named “Xtile.” Xtile plots provide a single, global assessment of every possible way of dividing a population into low, medium, and highlevel marker expression. Xtile data are presented in a right triangular grid where each point represents a different cutpoint. The intensity of the color of each cutoff point represents the strength of the association. The Xtile software allows the user to move a cursor across the grid and provides an “onthefly” histogram of the resulting population subsets along with an associated KaplanMeier curve.3 This type of graphical representation can provide insight into the biological nature of a marker (e.g., does it show a linear distribution relative to survival, does it reveal distinct subpopulations, or does it manifest a Ushaped relationship to outcome). Because it is statistically invalid to test multiple divisions and accept the best P value (3 , 4) , rigorous statistical evaluation is achieved by defining divisions in a “training set” and then validating them in a separate patient cohort (“validation set”). The Xtile software provides a method of dividing a single cohort into training and validation subsets for P value estimation when separate training and validation cohorts are not available. In addition, the software can perform standard Monte Carlo simulations (e.g., crossvalidation) to produce corrected P values to assess statistical significance of data assessed by multiple cutpoints.
MATERIALS AND METHODS
Cohort Design and Tissue Microarray Construction.
Tissue microarrays were constructed as described previously (5 , 6) . Paraffinembedded formalinfixed specimens of breast carcinoma were identified from the archives of the Yale University, Department of Pathology as available from 1962–1980. Two principal cohorts were used. The first consisted of 350 cases nodepositive breast carcinomas, which was randomly divided into equallysized training and validation sets. The second cohort consisted of 236 tumors (50% nodenegative, 50% nodepositive) similarly divided into training and validation halves. Complete treatment information was unavailable for either cohort; however, most of the nodepositive patients were treated with local radiation, and approximately 15% were given chemotherapy consisting primarily of Adriamycin, Cytoxan, and 5fluorouricil. The nodenegative patients were routinely treated with surgery and/or local radiation alone. Approximately 27% of the patients subsequently received tamoxifen (post1978). Representative regions of invasive carcinoma were selected for coring by a pathologist (R. L. C). These regions were taken from tumor areas that were not contaminated with normal stroma. Because prior studies have shown that a single core adequately represents the staining pattern of an entire slide, all studies were done with a single sample of each tumor (7 , 8) . All patients were followed until death or for a minimum of 30 years. Patients were deemed “uncensored” if they died of breast cancer within 30 years of their initial date of diagnosis. Patient information was collected under an approved protocol from the Yale Human Investigation Committee.
Immunohistochemistry and Analysis.
Quantitative analysis of expression was done on the tissue microarrays with the automated quantitative analysis (AQUA) method of analysis described previously (9) Briefly, antibodies for human epidermal growth factor receptor2 (HER2; 1:8,000, polyclonal), estrogen receptor (ER, 1:500, monoclonal), p53 (1:50, monoclonal), and cytokeratin (1:200, AE1/AE3 monoclonal and polyclonal) were obtained from DAKO Corp. (Carpinteria, CA). Primary antibodies were applied for 1 hour followed by an horseradish peroxidaseconjugated secondary (Envision, DAKO Corp.). Tumor cells were identified with an anticytokeratin antibody, followed by an Alexa488conjugated antimouse or antirabbit secondary (1:200, Molecular Probes, Eugene OR). 4′,6Diamidino2phenylindole was added to visualize nuclei. HER2, ER, and p53 were visualized with a fluorescent Chromagen (Cy5tyramide, NEN Life Science Products, Boston, MA). Cy5 (red) was used because its emission peak is well outside the greenorange spectrum of tissue autofluorescence. Monochromatic, highresolution (1024 × 1024 pixel, 0.5 μm resolution) images were obtained of each histospot. Areas of tumor were distinguished from stromal elements by creating a mask from the cytokeratin signal. Coalescence of cytokeratin at the cell surface helped localize the cell membranes, and 4′,6diamidino2phenylindole was used to identify nuclei. HER2 signal from the membrane area of tumor cells was scored on a scale of 0 to 255 and expressed as signal intensity divided by the membrane area. ER and p53 signal from nuclei were similarly scored and expressed as signal intensity divided by nuclear area; thus, all scores are expressed on a scale with three significant figures. For arrays stained for visual (manual) analysis, 3,3′diaminobenzidine was used in place of Cy5tyramide.
Construction of XTile Plots.
Xtile plots are created by dividing marker data into three populations: low, middle, and high (i.e., two divisions). All possible divisions of the marker data are assessed. Associations can be calculated at each division by a variety of standard statistical tests, including the logrank test for survival and means tests for associations between other marker data. The data are represented graphically in a righttriangular grid where each point (pixel) represents the data from a given set of divisions. The vertical axis represents all possible “high” populations, with the size of the high population increasing from top to bottom. Similarly, the horizontal axis represents all possible “low” populations, with the size of the low population increasing from left to right. Data along the hypotenuse represent results from a single cutpoint that divides the data into high or low subsets. Data points away from the hypotenuse up or to the left represent results from two cutpoints that define an additional “middle” population in addition to the high and low subsets. The size of the middle subset increases with greater distances from the hypotenuse. Specifically, a χ^{2} value is calculated for every possible division of the population shown on the grid using a color code. Coloration of the plot represents the strength of the association at each division, ranging from low (dark, black) to high (bright, green, or red). Inverse associations between marker expression and survival (e.g., high expression connotes poorer survival) are colored red, whereas direct associations are colored green. The cursor can be manually moved over any cutpoint to reveal survival curves. Alternatively, the program can select the optimal division of the data by selecting the highest χ^{2} value. Statistical significance is assessed by using the cutpoint derived from a training set to parse a separate validation set, using a standard logrank test, with P values obtained from a lookup table. The calculations done by Xtile have been validated with Statview 5.0.1 (SAS Institute, Cary NC). The concept is best understood by using the interactive version of the software available for download.4
Generation of “Training” and “Validation” Cohorts.
Xtile creates separate training and validation cohorts by first making separate lists of “censored” and “uncensored” observations, ordered by followup time. Patients are alternately assigned to training and validation sets by reading down the list and selecting every other patient. This technique normalizes the base survival curve for both sets. It also ensures that the same training and validation sets are constructed each time an analysis of the same marker is done, thus preventing the possibility of obtaining different P values after reanalysis of the same data.
RESULTS AND DISCUSSION
To illustrate the utility of Xtile, we used a large, independent cohort, abstracted from the national SEER registry dataset (10) . This dataset includes information on over 160,000 breast cancer patients with a median followup time of 30 months, including diseaserelated survival, patient age, tumor size, nodal status, and node number. We analyzed three wellcharacterized parameters: tumor size, patient age, and node number, in Xtile plots (Fig. 1)⇓ . For tumor size, this analysis revealed a robust (P < 0.0001), linear association with patient survival, with no evidence of distinct subpopulations (Fig. 1A)⇓ . Thus, no matter where a cutoff is made, larger tumor size is associated with poorer survival. To assess statistical significance and avoid the problems of multiple cutpoint selection, we used the Xtile program to randomly divide the total cohort into two equal training and validation sets. The optimal cutpoints were determined by locating the brightest pixel on the Xtile plot of the training set. Statistical significance was determined by applying this cutpoint set to the validation set. Analysis of the survival data with Xtile reveals optimal cutpoints of (<1.9, 1.9 to 4.9, and >4.9 cm), which are almost identical to those empirically established for the staging of breast cancer (≤2, 2 to 5, and >5 cm; ref. 11 ). Histogram analysis of patient age shows a bimodal distribution, with two incidence peaks above and below 59 years of age (Fig. 1B)⇓ . Using the histogram as a guide, we divided the cohort into “old” and “young” subsets based on a cutpoint of 59 years; however, this division fails to result in a statistically significant difference in survival (Fig. 1B)⇓ . In contrast, Xtilebased optimal cutpoint selection analysis of patient age reveals two distinct populations, (<40 years) and (>82 years), which exhibit poor diseasespecific outcome relative to patients in the middle (40–82 years; Fig. 1C⇓ ). This relationship has been described previously (12) and is an example of a bimodal population that is robust but is neither readily demonstrable on a histogram nor discoverable with a single cutpoint analysis. It also provides an example of how Xtile can be used to discover optimal population cutpoints. Finally, we use Xtile to define the optimal cutpoint for the number of tumorinvolved lymph nodes in patients with nodepositive breast carcinoma (Fig. 1D)⇓ . As with tumor size, Xtile shows a diffuse continuous distribution with no discernable subpopulations. Xtile identifies the optimal division of the cohort into three populations (<4, 4 to 8, and ≥9). Again these divisions are remarkably similar to those established for breast cancer staging (<4, 4 to 9, and ≥10; ref. 11 ).
We then did Xtile analysis on tumor biomarkers using tissue microarray cohorts available in our laboratory. HER2, a member of the epidermal growth factor family is frequently overexpressed in breast cancers and is associated with poor prognosis. Xtile analysis of the productive ability of HER2 reveals a distinct subset of HER2 high tumors with poor outcome (Fig. 2AC)⇓ . In contrast, a similar plot of ER shows a linear, diffuse distribution of risk across the entire plot, with no evident subpopulations (Fig. 2D)⇓ . At virtually every division plotted, patients with ERhigh tumors survive longer than patients with ERnegative tumors.
These Xtile plots represent two distinct patterns, which provide insight into the underlying biology of HER2 and ER. In the case of HER2, gene amplification is common and often results in marked overexpression of HER2 at the protein level (13) . This biology is evident in the HER2 Xtile plot, which shows localized risk in a distinct subset of patients with HER2high tumors (Fig. 2AC)⇓ . The discreteness of this subset suggests that a 2way cut (high versus low expression) of the data is more appropriate than a 3way split. Consequently, we determined the best single cutpoint along the hypotenuse of the Xtile plot, which occurs as a HER2 AQUA score of >156.8 (test set/validation set P = 0.0069, Fig. 2A⇓ ).
In contrast, the expression of ER is highly complex and governed by a number of regulatory elements (14) . There is no evidence of ERgenetic mutations that would subset tumors based on either gene amplification or deletion. This biology fits with ER’s observed Xtile plot showing a diffuse distribution of risk (Fig. 2D)⇓ . Analysis of this cohort with a continuous univariate analysis of ER expression reveals a significant correlation with survival (P = 0.0017), demonstrating that ER, in this nodepositive cohort, is positively associated with patient outcome, when assessed as a continuous variable. Despite the continuous nature of the Xtile plot, we sought to determine the optimal division as defined by Xtile, which was found at AQUA scores of 37.7 and 117.6 (test set/validation set P = 0.0369). Interestingly, there is historical disagreement about how to divide tumors into ERpositive and ERnegative subsets. The continuous nature of ER would suggest that although one can determine a statistically valid cutpoint for ER, the marker should really be thought of as a truly continuous (linear) marker when assessing survival.
Finally, we use an Xtile plot of p53 to elucidate tumor novel tumor subpopulations. p53 is a tumor suppressor with complex biology and a varied record of predictive and prognostic significance in breast cancer. Mutation of both p53 alleles frequently leads to overexpression of nonfunctional forms of p53 in an attempt to compensate for the loss of p53 function. Such overexpression is often attributable to missense mutations in exons 5–8 and is associated with poor prognosis (15 , 16) . When assessed with standard immunohistochemistry, these tumors frequently exhibit intense p53 nuclear staining. In contrast, mutations outside of these exons generally result in undetectable levels of p53 protein (p53 null mutations; ref. 17 , 18 ). These tumors are therefore not detected by standard immunohistochemistry (19 , 20) . Theoretically, one should be able to distinguish among p53negative (null mutant), p53low (nonmutant), and p53overexpressing (nonnull mutant) tumors; however, such distinctions are beyond the sensitivity of traditional manually read “brownstain” immunohistochemistry.
We analyzed p53 expression on a training cohort of 118 breast cancers, 50% of which were nodepositive using AQUA. Correlations between p53 expression and patient survival were graphed on an Xtile plot, which clearly shows two distinct subpopulations of tumors, one of high expressers and one of low expressers, both of which do poorly relative to tumors expressing intermediate levels of p53 (Fig. 2E)⇓ . The Xtile plot reveals significant, optimal cutpoints of 7.5 and 46.4 (P = 0.0287). We speculate that the p53negative tumors are true doublenegative mutants for p53 and therefore comprise a poorprognosis subset. In contrast, the p53high tumors most likely represent nonnull mutants, which also have a poor outcome. In this case, Xtile has elucidated an intriguing subpopulation of tumors that calls for further biochemical investigation.
Although Xtile plots were designed for use with continuous data, they can also be used to find appropriate cutpoints in categorical (nominal) data such as that obtained by manual scoring of brownstained immunohistochemical slides. To show this, we manually scored our HER2 and ERstained tissue microarrays on a scale of 0 to 3+ and expressed this data in Xtile plots (Fig. 3A and B)⇓ . Note that each possible cutpoint is represented by a colored square. As with continuous data, the optimal cutpoint is determined by finding the square of the highest color intensity on a training set and then using that cutpoint on a separate validation set to test for statistical significance.
In addition to visualizing survival data, Xtile plots can be used display other correlative data, such as the association between two markers. For instance, Xtile can divide a cohort into subsets based on the expression of one marker and analyze the variance of the mean expression of another marker in each of these subsets. Figure 3C and D⇓ show the association between HER2 and ER expression in our cohort of breast cancer patients. In 3C, we used HER2 to divide the cohort and compared the mean ER expression of each subset using an ANOVA. As with the survival analysis shown in Fig. 2⇓ , the association between HER2 and ER expression shows a subset of high HER2expressing tumors that have low mean ER expression (Fig. 3C)⇓ . Interestingly, HER2 versus ER Xtile plot of training set data defines the optimal cutpoint between high and low HER2 expressers at an AQUA score of 156.8, identical to that obtained when survival analysis is used (Fig. 2)⇓ . This cutpoint is statistically significant (P = 0.0056) when assessed on a validation set. In contrast, using ER to divide the population and analyzing HER2 mean expression provides a more diffuse Xtile plot (panel D) where increasingly lower ER expression correlates with increasingly higher HER2 expression. The optimal cutpoint for this Xtile plot occurs at 45.7 (P = 0.0005). These examples illustrate that as with survival Xtile plots, correlative marker plots can also find biologically meaningful tumor subsets.
Recently, numerous algorithms have been developed in efforts to analyze results from DNA based microarrays. These include algorithms based on clustering (21) , genetic permutation (22) , or χ^{2} Automatic Interaction Detector (23) . Although these systems can analyze tumor subsets post hoc to assess differences in survival, none use survival time and censor information for the initial classification of tumor subsets. Furthermore, these algorithms generally treat marker data as continuous or nominalize data into nonbiological categories (e.g., quartiles or manual scoring on a 0–3+ scale). Our study would suggest that whereas this approach might be appropriate for ER, tumor size, or node number, it clearly is not appropriate for HER2, p53, or patient age, which exhibit nonlinear expression. Indeed, the actual HER2 expression level on a linear scale is far less important than the fact that a tumor has a HER2 expression level above a certain cutpoint. Although the calculations presented in an Xtile plot are not novel in and of themselves, performing a similar analysis with conventional techniques would require the performance of thousands of individual survival analyses and the exhaustive comparison of each to determine where an optimal cutpoint might occur.
In summary, Xtile plots present a new tool for (a) the assessment of biological relationships between a biomarker and outcome; and (b) the discovery of population cutpoints based on marker expression. Clinically, the ability to effectively define such subpopulations is important for the development of novel therapeutics that may be effective for only a limited subset of tumors. Furthermore, this method may have value in other fields (materials testing, actuarial questions, etc.) where cutpoint selection is complicated by timedependent assessment of outcome.
Acknowledgments
The authors thank Jay Cowan, Bonnie King, and Daniel Zelterman for help with this report.
Footnotes

Grant support: R. Camp is supported by NIH Grant K08 ES11571 and by a grant from the Breast Cancer Alliance. M. DolledFilhart is supported by the United States Army Breast Cancer Research Grant DAMD170310349. D. Rimm is supported by a grant from the Patrick and Catherine Weldon Donaghue Foundation for Medical Research, NIH Grant NCI R21 CA100825, and United States Army Grant DAMD17020463.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Requests for reprints: Robert L. Camp, Department of Pathology, Yale University School of Medicine. 310 Cedar St., New Haven, CT 06520. Phone: 2037856340; Fax: 2037857303; Email: robert.camp{at}yale.edu

↵3 A sample program is available for download at http://www.tissuearray.org/rimmlab/.

↵4 http://www.tissuearray.org/rimmlab/.
 Received April 13, 2004.
 Revision received July 13, 2004.
 Accepted July 28, 2004.