Hierarchical Clustering Analysis of Tissue Microarray Immunostaining Data Identifies Prognostically Significant Groups of Breast Carcinoma

Prognostically relevant cluster groups, based on gene expression profiles, have been recently identified for breast cancers, lung cancers, and lymphoma. Our aim was to determine whether hierarchical clustering analysis of multiple immunomarkers (protein expression profiles) improves prognostication in patients with invasive breast cancer. A cohort of 438 sequential cases of invasive breast cancer with median follow-up of 15.4 years was selected for tissue microarray construction. A total of 31 biomarkers were tested by immunohistochemistry on these tissue arrays. The prognostic significance of individual markers was assessed by using Kaplan-Meier survival estimates and log-rank tests. Seventeen of 31 markers showed prognostic significance in univariate analysis (P ≤ 0.05) and 4 markers showed a trend toward significance (P ≤ 0.2). Unsupervised hierarchical clustering analysis was done by using these 21 immunomarkers, and this resulted in identification of three cluster groups with significant differences in clinical outcome. χ2 analysis showed that expression of 11 markers significantly correlated with membership in one of the three cluster groups. Unsupervised hierarchical clustering analysis with this set of 11 markers reproduced the same three prognostically significant cluster groups identified by using the larger set of markers. These cluster groups were of prognostic significance independent of lymph node metastasis, tumor size, and tumor grade in multivariate analysis (P = 0.0001). The cluster groups were as powerful a prognostic indicator as lymph node status. This work demonstrates that hierarchical clustering of immunostaining data by using multiple markers can group breast cancers into classes with clinical relevance and is superior to the use of individual prognostic markers.


INTRODUCTION
Breast cancer encompasses a highly heterogeneous group of tumors. Tumor size and grade and status of axillary lymph nodes have been established as prognostic indicators for breast cancer patients, with lymph node metastasis being the most powerful of these. Many other factors have been suggested as prognostic indicators, but none of them, individually, has consistently reached the prognostic significance of lymph node status in multivariate analysis (1).
Gene expression profiling has allowed stratification of breast tumors into clinically relevant groups through the use of expression levels of thousands of genes rather than single prognostic indicators. Novel prognostically relevant subsets of breast cancer that are not identifiable on routine light microscopy have been identified by this approach (2)(3)(4)(5)(6)(7). The objective of this study was to apply analytical techniques, developed for gene expression profiling, to determine whether multiple immunohistochemical prognostic markers could be used to identify prognostically relevant groups of breast cancer patients, and to determine the optimal panel of immunomarkers necessary to define those groups.

MATERIALS AND METHODS
Case Series. This study group comprised 438 women with primary invasive breast cancer who underwent surgery for breast cancer between 1974 and 1995 at Vancouver General Hospital. These were consecutive cases, and the presence of invasive breast carcinoma was the only selection criterion in this study. Outcome data were available for all of the patients, with median follow-up of 15.4 years (range, 6.3-26.6 years). The patients' ages ranged from 28 to 87 years with a mean age of 61 years. The patients were treated by mastectomy or lumpectomy with (392 patients) or without (46) axillary lymph node dissection. One hundred twenty-six patients had histologically confirmed lymph node metastasis, and 266 had no lymph node metastasis. Tumor size information, measured in millimeters, was available for 366 patients; 217 patients had tumors Յ20 mm, and 149 had tumors Ͼ20 mm. Adjuvant therapy varied substantially during the period 1974 through 1995, and no information on individual adjuvant treatment was available. Ethical approval was obtained from the institutional ethical review board to perform this study.
Tissue Microarrays and Immunostaining. Tissue microarrays (TMAs) were designed as described previously (8,9) by using two 0.6-mm tissue cores per case, taken from formalin-fixed, paraffin-embedded archival tumor blocks. All of the immunostains were done with standardized protocols. 6 The panel of antibodies consisted of those available in the Genetic Pathology Evaluation Centre that showed case-to-case variation within the series of breast carcinomas (i.e., none of the tumors showed identical or nearidentical staining patterns). The antibody choice was empirical, based on availability, suitability for paraffin-embedded archival material, and biological and clinical relevance to breast cancer, and was also partially driven by an attempt to reproduce the classification based on recent investigations in gene expression profiles in breast cancer (Table 1; Refs. 2-7). Once new antibodies were selected, each antibody was titrated with four to five different dilutions (with at least a 2-fold difference between each dilution) on the whole-mount tissue sections, according to the manufacturer's recommendation (Table 1). If there were no well-established positive-control tissues, we used our in-house multitumor array to find model positive-control tissue. If signal-to-background ratio was not acceptable for the dilution tested, the pretreatment/incubation time/ concentration were readjusted. Immunostains were scored semiquantitatively by two pathologists considering either cytoplasmic or membranous staining intensity, or percentage of positive nuclei (Table 1). Any discrepancies were resolved with a multihead microscope. The higher score was considered as a final score in case of a difference between duplicate tissue cores. Scoring was done without knowledge of patient outcomes.
Data Analysis. For each patient, the date of breast cancer diagnosis, date of last follow-up, and vital status at last follow-up (i.e., living or deceased) were recorded. For patients who died, the death due to breast cancer or death due to other causes was recorded, and the date of death was recorded as the last follow-up. For patients who were still alive, the date of last follow-up was the one recorded in their medical chart. We considered two outcomes: overall survival and disease-specific survival. Both outcomes were measured from a patient's date of the breast cancer diagnosis until her date of death. If a patient was alive at the date of last follow-up, the outcomes were censored. For patients who Univariate analyses were performed by using Kaplan-Meier estimates and log-rank tests, with raw score data obtained for each individual biomarker (according to the empirical score system described in Table 1). The binarization of score data for the purpose of correlational and multivariate analysis (i.e., determining "positive" versus "negative" staining results) was done with historically established cutoffs for 27 of 31 markers. For four biomarkers (IGFBP2, IGFBP5, TIP1, and HSP27) binary cutoff points were found by testing multiple combinations of ordinal score groups (based on a separate test of equality of all factor levels for each stratum), e.g., 0 versus 1, 2, 3 or 0, 1, 2 versus 3, but not 0, 2 versus 1, 3, and so forth) to measure the clinical effect of expression levels of these proteins to disease-specific survival. Clustering analysis was based on the complete dataset and not on the binary results.
Multivariate analyses were performed with a proportional hazard model (i.e., Cox regression) and a backward stepwise method to remove variables from the model. A significant difference was declared if the P value from a two-sided test was less than 0.05 and a near-significant result was declared if the P value was less than 0.2. We used unsupervised hierarchical clustering analysis to organize TMA score data (i.e., the results of immunostaining) into meaningful structures, applying the same approach that has previously been adopted for cDNA microarrays (10,11) and has also been applied to TMA data (9). Clustering analysis organizes cases according to the similarity or dissimilarity of immunostaining profiles, placing the cases with similar immunoprofiles together as neighboring rows in the clustergram. The relationship between cases and immunomarkers is depicted graphically as a dendrogram in which branch length is determined by the correlation between immunostaining results. All raw score data for each biomarker (as shown in Table 1) were used for clustering analysis. Only cases with immunostaining data for 80% or more of the markers under consideration were entered into clustering analysis. 2 tests were used to determine which markers contributed to the formation of cluster groups. The agreement in classification of cases based on different sets of immunomarkers was assessed with the kappa statistic. A kappa value of 0.41 to 0.6 indicates moderate agreement, 0.61 to 0.8 substantial agreement, and more than 0.8 near-perfect agreement (12). Software for TMA data management was used [Deconvoluter version 1.04, EXCEL for Windows; Liu et al. ( 11)]. Unsupervised hierarchical clustering analysis with average and complete linkage algorithms was done with GeneCluster, and graphical representation of the results of clustering was done with TreeView. 7 SPSS for Windows version 11.0 (Chicago, IL) was used for statistical analysis of the data.

Univariate Analysis of Basic Clinicopathologic Variables (Tumor Size, Nodal Status, and Tumor Grade) and
Immunostaining Data. Positive nodal status and larger tumor size (Ͼ20 mm) showed a highly significant correlation with poor outcome; Nottingham grade did not show a statistically significant correlation with outcome in our series, although tumors with higher grade showed a trend toward a worse outcome (Table 2).
A set of 31 biomarkers was tested by immunohistochemistry on the TMAs. A total of 10,041 (74.5%) of a maximum possible number of 13,454 paired tissue core sections produced interpretable immunostains and were scored semiquantitatively, as shown in Table 1. Missing immunostain data were usually caused by loss of a core from the section, less commonly by exhaustion of tumor material in the core. The list of 21 markers that were prognostically significant (P Յ 0.05) or approached significance (P Յ 0.2) in univariate analyses is shown in Table 2. The remaining 10 markers were not prognostically significant (P Ͼ 0.2). Examples of representative immunostaining results for two cases, with four immunomarkers per case, are shown as Fig. 1. Results of immunostaining with 11 immunomarkers per case, for all cases, can be viewed online at http://gpec.bliss.ubc.ca (under "view/Clustering Breast Cancer"). 8 Unsupervised Hierarchical Clustering Analysis of Immunostaining Data. Unsupervised hierarchical clustering analysis (average linkage method) was applied to the dataset of 31 immunomarkers. This did not produce a dendrogram with well-defined cluster groups of cases ( Fig. 2A). The ten markers, which did not show significance or trend toward significance in univariate analysis were then excluded from the clustering model. Clear separation of cases into three distinct groups with large linkage distances was apparent when clustering was done with the 21 markers listed in Table 2 (Fig. 2B). To determine the stability of the cluster groups and to possibly reduce the number of markers needed to define cluster groups, we clustered the data by using, in turn, 20, 19 (Fig. 2C), 18, and 17 markers by stepwise exclusion of CD68, HSP27, p63, and CK5/6 [markers that showed a trend toward significance (P Յ 0.2) in univariate analysis]. Prognostically significant cluster groups were identified by clustering that was based on 20, 19, and 18 markers. The most significant survival differences between cluster groups were seen with the set of 19 markers (Table 2).
It is noteworthy that the removal of the basal-like breast cancer marker CK5/6 (13, 14) from the clustering analysis when the analysis went from 18 to 17 markers resulted in the identification of cluster groups with no significant differences in survival (P ϭ 0.18). CK5/6 did not quite reach prognostic significance in univariate analysis (P ϭ 0.06) but contributed significantly to accurate classification of cases into "good" and "bad" prognosis cluster groups. CK5/6-positive cases accumulated in cluster group 3, associated with the worst prognosis, as shown in Table 3.
Determination of a Minimal Immunomarker Set to Define Cluster Groups. As has been shown, the cluster groups produced by 19 markers were of highest prognostic significance, when compared with the cluster groups formed by 21, 20, 18, and 17 markers. To define the minimal set of markers necessary to identify prognostically significant cluster groups, we performed 2 tests, where each marker was measured across the defined three cluster groups formed by the set of 19 markers. We found that Her3, Cox2, NSE, Relaxin, CD10, YB1, IG-FBP2, and IGFBP5 showed no significant differences in staining results among the three cluster groups, whereas the remaining 11 markers (ER, PR, Her2, p53, Ki67, CA IX, TIP1, stromal CD117, PTEN, p63, and CK5/6) showed statistically significant difference in distribution between the three cluster groups (Table 3). Hierarchical clustering was performed with the set of these 11 markers (Fig. 2D). The overall concordance between assignment of individual cases to one of the three cluster groups formed when 19 versus 11 markers were used, respectively, was near perfect (kappa ϭ 0.89). There were highly significant differences in disease-specific survival and overall survival among the 3 cluster groups defined with 11 markers (Fig. 3A and B). Cluster groups 2 and 3 were characterized by predominance of ER-and PR-negative breast cancers, whereas cluster group 1 consisted of tumors that were uniformly ER positive and PR positive in a majority of cases (Table 3). Accordingly, comparison was made between survival differences based on cluster group designation and ER/PR status (Fig. 3C and D). The differences in survival of patients in different cluster groups was greater than was seen with ER and/or PR alone. Also, there was a highly significant positive correlation between cluster groups and tumor grade (P Ͻ 0.001) but only a trend to correlation with tumor size (P ϭ 0.07). No correlation was observed with lymph node status or patient age.

Assessment of Reproducibility of Cluster Groups by Application of Complete Linkage Clustering Algorithm.
To assess variation in cluster group membership designation when using different hierarchical clustering methods, we applied complete linkage clustering algorithm to the data set of 11 markers. The concordance between designation of individual cases to one of three cluster groups using average linkage versus complete linkage clustering algorithms was near perfect (kappa ϭ 0.92), with only 9 mismatches of 269 paired cases.

Multivariate Analysis of Cluster Groups and Clinicopathologic Variables.
Multivariate analysis showed independence of cluster groups defined by 19 (as well as by a reduced set of 11) markers from nodal status and tumor size for both disease-specific survival and overall survival. The relative risk of death from breast cancer for patients in cluster group 1 versus groups 2 and 3, combined, was comparable with the relative risk associated with positive lymph node status (Table  4). Another multivariate model included cluster groups of 11 markers, ER, and Her2, and the clinicopathologic variables of lymph node status, tumor size, tumor grade, and patient age. After six cycles of iteration, when the least significant variables were eliminated, only cluster groups and lymph node status (P ϭ 0.001) remained of independent prognostic significance with respect to disease-specific survival. When overall survival was considered, cluster groups (P ϭ 0.002), lymph node status (P ϭ 0.039), and tumor size (P ϭ 0.008) remained of independent significance. None of the immunomarkers, used in isolation, achieved the prognostic significance of cluster groups in terms of disease-specific or overall survival.

DISCUSSION
Invasive breast cancer is the most common carcinoma in North American women (1), and the prognosis is favorable if it is detected at an early stage. Significant improvements in survival have been realized through the combined effects of mammographic screening and adjuvant treatment (15). Continuing improvement will require development of more refined and effective treatment strategies, and these will probably be based Each row, a single case; each column, a single immunomarker. Green, negative immunostaining (score 0); black, weak immunostaining (score 1); brown, moderately intense immunostaining (score 2); red, strong immunoreactivity (score 3); gray, missing data. The dendrorams on the left of each clustergram, the relatedness of the immunoprofiles of individual cases; the longer the horizontal dendrogram arm, the greater the difference in immunoprofiles between individual cases inside a cluster group. The small dendrorams on the top of each clustergram, the relatedness of the immunoprofiles of individual biomarkers; the longer the horizontal dendrogram arm, the greater the difference in immunoprofiles between individual biomarkers. Numbers and different colors, cluster groups 1 (purple), 2 (brown), and 3 (blue). Black long horizontal arms, unclassifiable cases. on advances in molecular biology and pathology (16,17). Numerous biological markers have been proposed to be prognostic markers in invasive breast carcinoma; however, stratification of tumors into prognostic groups to guide therapeutic decisions is based mainly on tumor stage and grade and on assessment of ER, PR, and Her2 (18). Lymph node status remains the most powerful prognostic indicator, and prognostication is especially problematic for lymph node-negative invasive breast cancer, in which a decision about the advisability of adjuvant chemotherapy requires accurate risk assessment (18).
The potential for combinations of prognostic markers to be superior to any single marker has been observed previously (19,20). Clinical oncologists have been using Tumor-Node-Metastasis (TNM) classification for decades (21), which is a form of supervised clustering analysis combining tumor size, lymph node status, and distant metastasis. Unsupervised hierarchical clustering analysis has been used to classify tumors based on mRNA expression levels of thousands of genes. Prognostically relevant cluster groups have been identified for breast cancer, lymphoma, and lung cancer with this approach (2, 3, 6, 16). There have been attempts to classify breast cancers based on hierarchical clustering analysis of immunomarker data, but the prognostic significance of the cluster groups identified is unclear because of limited available outcome data or none (22,23).
We applied unsupervised (i.e., without consideration of other histologic and clinical variables) hierarchical clustering analysis to immunostaining data and identified prognostically significant cluster groups. We were able to identify a minimal set of 11 biomarkers necessary to define these cluster groups. ER and PR status of tumors were the major determinants of cluster group designation in hierarchical clustering analysis; however, the prognostic value of these two markers in either univariate or multivariate analyses was much less significant than for the cluster groups formed by multiple immunomarkers. Although this establishes that consideration of multiple markers is superior to single markers, the basis for this superiority is important to consider. A small proportion of ER-and/or PRpositive tumors clustered along with ER-/PR-negative cases, which suggests that expression of multiple negative prognostic markers associated with tumor aggressiveness in these tumors overrides the effect of positive prognostic factors such as ER. All of the patients were treated at the British Columbia Cancer Agency according to provincial treatment guidelines; however, management varied, based on the year of diagnosis, and detailed treatment information is not available. This is a limitation of this study.
Sorlie et al. (2,24), van 't Veer et al. (3), and van de Vijver et al. (25) independently identified subtypes of invasive breast cancer based on independent sets of gene expression data. Sorlie et al. (2,24) were able to consistently identify basal-like, Her2overexpressing, luminal type and normal-breast-like tumor subgroups with, respectively, worst, poor, intermediate, and good prognosis with hierarchical clustering analysis. Nevertheless, it was not possible to classify a significant proportion (6 -36%) of breast cancers into these categories (2), which raised questions about further validation of this newly proposed classification system and its application to practice. Although cDNA microarray technology is widely used in cancer research, it is still far from clinical implementation because of the cost of the assay, the necessity of validation of initial findings, and the lack of standardization of protocols. Immunohistochemistry is routinely available in clinical laboratories and has proved to be a reliable and reproducible ancillary method in anatomic pathology. Application of TMAs has dramatically reduced the cost and time required for the testing of multiple biomarkers on large series of cases (26,27). TMAs have been validated as useful tools for the study of prognostic markers in breast cancer (28); the concordance between TMAs and whole-section immunostains had been consistently high (22,28,29), and the prognostic value of markers when assessed with TMAs has been as good as, or superior to, that when whole sections were used (30). Clinical application of clustering analysis of immunomarker data in breast cancer could allow combinations of prognostic markers to be used simultaneously, rather than the traditional approach of assessment of prognosis based on single clinical facts. This could be done, for less cost than is currently incurred in the assessment of ER, PR, and Her-2, by batching clinical cases onto small TMAs, run once weekly. Each clinical center could have its own "training set" of cases with monitored  Abbreviations: CI, confidence interval. * Group 1 versus 2 and 3 (as shown on Fig. 2C and D).
long-term outcome. New cases, then, could be added one at a time to this training set and clustered, allowing assignment to one of the previously defined prognostic groups. Another advantage of this approach is that, if better immunohistochemical markers appear, they can be easily added to this model [for instance, more specific basal markers than are currently available (13)] to potentially improve the training set. We conclude that it is possible to identify prognostic cluster groups of invasive breast cancer patients with maximal differences in survival, by the clustering of immunostaining data, based on a panel of 11 prognostic markers. Our data show that the application of a small panel of markers identifies prognostic profiles for individual cancers and is independent of the major clinical indicators of prognosis, i.e., tumor size and lymph node status. From a clinical perspective, it is of critical importance to determine whether there are specific patient populations for whom it is reasonable to avoid the administration of cytotoxic chemotherapy. Limited information is available to answer this important question (18). Classification of breast cancer cases into prognostic cluster groups may aid in the individualization of adjuvant therapy, especially for a group of patients with node-negative status. Validation of this approach will require testing a sufficiently large (population-based) series, to allow analysis of subsets of patients with unclear prognosis. Quantification and further standardization of immunohistochemical analysis will also be an asset for future studies.