
Imaging, Diagnosis, Prognosis |
Authors' Affiliations: 1 Unit of Medical Statistics and Biometry, National Cancer Institute of Milano, 2 Institute of Medical Statistics and Biometry, University of Milano, Milan, 3 Institute of Pathology, University of Ferrara, Ferrara, and 4 Unit of Cancer Pathology, Department of Oncology and Neurosciences and "G. d'Annunzio" Foundation of the University of Chieti, Chieti, Italy
Requests for reprints: Federico Ambrogi, Unità di Statistica Medica e Biometria, Istituto Nazionale per lo Studio e la Cura dei Tumori, via Venezian 1, 20133 Milan, Italy. Phone: 39-2-2390-3282; Fax: 39-2-5032-0866; E-mail: federico.ambrogi{at}istitutotumori.mi.it.
| Abstract |
|---|
Experimental Design: Tumor biological profiles were explored on 633 archival tissue samples analyzed by immunohistochemistry. Five validated markers were considered, i.e., estrogen receptors (ER), progesterone receptors (PR), Ki-67/MIB1 as a proliferation marker, HER2/NEU, and p53 in their original scale of measurement. The results obtained were analyzed by three different clustering algorithms. Four different indices were then used to select the different profiles (number of clusters).
Results: The best classification was obtained by creating four clusters. Notably, three clusters were identified according to low, intermediate, and high ER/PR levels. A further subdivision into two biologically distinct subtypes was determined by the presence/absence of HER2/NEU and of p53. As expected, the cluster with high ER/PR levels was characterized by a much better prognosis and response to hormone therapy than the cluster with the lowest ER/PR values. Notably, the cluster characterized by high HER2/NEU levels showed an intermediate prognosis but a rather poor response to hormone therapy.
Conclusions: Our results show the possibility of profiling breast cancers by means of traditional markers, and have novel clinical implications on the definition of the prognosis of cancer patients. These findings support the existence of a tumor subtype that responds poorly to hormone therapy, characterized by HER2/NEU overexpression.
It should also be noted that very few breast cancer profiling studies have been based on large case series (5, 6). This problem is particularly critical in genetic/transcriptomic studies, which are typically characterized by small sample sizes (7–9). The very large number of variables analyzed adds critically to these difficulties.
The aim of this work was the profiling of breast cancers using a small number of molecular markers of biological and clinical importance. Different clustering techniques were used to improve the reliability of the conclusions reached and to assess the overall value of this strategy. To this end, we analyzed data on a series of primary infiltrating breast cancers collected from 1983 to 1992 by the Pathology Department of the University of Ferrara.
| Materials and Methods |
|---|
Data on patient age, pathologic tumor size, histologic type, pathologic stage, and number of metastatic axillary lymph nodes were collected, as well as immunohistologic determinations of estrogen receptor (ER) status, progesterone receptors (PR) status, Ki-67/MIB-1 proliferation index (Ki-67), HER2/NEU (NEU), and p53 levels. The analysis was done on 633 cases for which complete information on all pathobiological variables was present (Table 1). The percentage of expression values of ER, PR, and NEU tended to distribute around the following values: 0%, 10%, 25%, 50%, 75%, and 100%, and were consequently discretized on these values. Percentages of Ki-67- and p53-expressing cells were analyzed without discretization, although they are reported in categories for convenience.
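The discretization step can be illustrated with a minimal sketch. The text states only that ER, PR, and NEU percentages were discretized on the six listed values; snapping each reading to the nearest of those values, with ties resolved toward the lower one, is an assumption made here for illustration, and the names `ANCHORS` and `discretize` are hypothetical.

```python
# Anchor values taken from the text; the nearest-value rule is an assumption.
ANCHORS = (0, 10, 25, 50, 75, 100)


def discretize(percent, anchors=ANCHORS):
    """Snap a percentage of expressing cells to the nearest anchor value.

    Ties are resolved toward the lower anchor (an arbitrary, assumed choice).
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    # Sort key: distance to the anchor first, then the anchor itself for ties.
    return min(anchors, key=lambda a: (abs(a - percent), a))


if __name__ == "__main__":
    for reading in (3, 12, 40, 88):
        print(reading, "->", discretize(reading))
```

Ki-67 and p53 would bypass this step, since the text reports that they were analyzed without discretization.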
Immunostaining for ER, PR, Ki-67, and p53 was quantified with a Computerized Image Analysis System (CAS 200, Becton Dickinson, San Jose, CA; Fig. 1), as previously described (5, 10, 11).
Statistical methods. Clustering algorithms were adopted to group tumors with similar biological characteristics. A hierarchical agglomerative algorithm with Ward's generalized criterion and two nonhierarchical techniques, the K-Means and K-Medoids algorithms, were applied (13, 14).
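To make the K-Medoids idea concrete, the sketch below alternates between assigning each tumor to its nearest medoid and re-selecting, within each cluster, the member with the smallest total distance to the others. This simple alternating scheme is not necessarily the partitioning algorithm used in the study (which was run in S-Plus); the Manhattan distance, the explicit starting medoids, and all function names are assumptions made for illustration.

```python
def manhattan(a, b):
    """City-block distance between two marker profiles (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign(points, medoids, dist):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, initial_medoids, dist=manhattan, max_iter=100):
    """Alternating K-Medoids; medoids are given and returned as point indices."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                updated.append(old)
                continue
            # New medoid: the member minimizing total distance to its cluster.
            updated.append(
                min(members,
                    key=lambda i: sum(dist(points[i], points[j]) for j in members))
            )
        if updated == medoids:  # converged
            break
        medoids = updated
    return medoids, assign(points, medoids, dist)


if __name__ == "__main__":
    # Toy (ER, PR) profiles forming two obvious groups.
    profiles = [(0, 0), (10, 0), (0, 10), (75, 100), (100, 100), (100, 75)]
    print(k_medoids(profiles, initial_medoids=[1, 4]))
```

Because each cluster is represented by an actual observation rather than a mean, medoid-based solutions tend to be less sensitive to extreme values than K-Means, which is one common reason for running both.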
Following Tibshirani et al. (15), four indices, which could be applied to both hierarchical and nonhierarchical algorithms, were used to determine the optimal number of groups. Namely, the CH, KL, H, and GAP indices were used according to Calinski and Harabasz (16), Krzanowski and Lai (17), Hartigan (18), and Tibshirani et al. (15), respectively. Each algorithm was used to create between 1 and 20 groups. The values of the four indices were then calculated for each subdivision. When an index suggested the same number of clusters for different algorithms, the κ statistic (19) was used to assess agreement between the classifications produced by these different algorithms (i.e., to assess whether the clusters created by different algorithms contained the same tumors).
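Two of the quantities described above can be sketched in a few lines. The first function computes the Calinski-Harabasz (CH) index as the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom; the second computes an agreement statistic between two classifications, assumed here to be the unweighted Cohen's kappa (the text does not specify the variant). The function names are hypothetical, and the kappa sketch assumes that cluster labels have already been matched across algorithms.

```python
from collections import Counter


def calinski_harabasz(points, labels):
    """CH index: [B / (k - 1)] / [W / (n - k)] for a given partition."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("CH index needs 2 <= k < n clusters")
    dim = len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster dispersion, weighted by cluster size.
        between += len(members) * sum(
            (centroid[d] - grand[d]) ** 2 for d in range(dim))
        # Within-cluster dispersion around the cluster centroid.
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two labelings of the same cases."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    pts = [(0,), (2,), (10,), (12,)]
    print(calinski_harabasz(pts, [0, 0, 1, 1]))
    print(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]))
```

In use, the CH index would be evaluated for each candidate number of groups, with the largest value suggesting the preferred partition. It is undefined for a single group, so a one-group solution cannot be scored this way.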
The frequency histograms of the biomarkers in each cluster were compared with the corresponding histogram in the whole sample. This was used to explore specific biomarker distributions across the clusters.
Multiple correspondence analysis (MCA) was used to better visualize the resulting cluster bioprofiles (20, 21). MCA can be applied both to categorical and continuous variables. For the latter, MCA has the advantage of implying neither linearity nor specific distributions. MCA allows us to visualize the association between markers and clusters on two-dimensional plots. The convenience of using two-dimensional plots is at the expense of the loss of a certain amount of information on the association patterns. To quantify the information retained in a given two-dimensional plot, the "fraction of explained information" was used (this corresponds to the percentage of the total variability that is accounted for by the two axes of the plot). The amount of information explained was calculated following Benzécri (22).
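As a rough illustration of the "fraction of explained information", the sketch below applies the correction usually attributed to Benzécri. Principal inertias of the indicator matrix that exceed 1/Q, where Q is the number of active variables, are rescaled as [Q/(Q-1)]² (λ - 1/Q)² and expressed as fractions of their sum. Whether this is the exact computation done in the study is an assumption; the inertia values in the example are invented and the function name is hypothetical.

```python
def benzecri_fractions(inertias, n_variables):
    """Fractions of corrected inertia per axis (Benzécri-style correction).

    inertias    -- principal inertias (eigenvalues) of the MCA indicator matrix
    n_variables -- number of active categorical variables, Q
    """
    q = n_variables
    threshold = 1.0 / q
    # Only axes whose inertia exceeds 1/Q are retained and rescaled.
    corrected = [
        ((q / (q - 1)) * (lam - threshold)) ** 2
        for lam in inertias
        if lam > threshold
    ]
    total = sum(corrected)
    if total == 0:
        raise ValueError("no axis has inertia above 1/Q")
    return [c / total for c in corrected]


if __name__ == "__main__":
    # Invented inertias for Q = 5 active markers (ER, PR, Ki-67, NEU, p53).
    fractions = benzecri_fractions([0.6, 0.4, 0.3, 0.1], n_variables=5)
    print("first two axes explain", round(sum(fractions[:2]), 3))
```

The fraction retained by a two-dimensional plot is then the sum of the first two entries.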
The five tumor markers under study (ER, PR, Ki-67, NEU, and p53) were used to generate the MCA plot (active information). The plot position of the categories of the active variables and the knowledge of which categories most contributed to the construction of the MCA plot were used to interpret the result obtained. Points close to each other in a plot correspond to associated marker categories and clusters. When points are close to the center of a spherical pattern, the variables are considered noncorrelated.
The number of metastatic lymph nodes, age, histology, pathologic stage, and the cluster classifications were only plotted on the existing MCA plane without modifying it (passive information), for a subsequent study of the relationships of the clinical and pathologic classifications with the biological characterization of the tumors. If a passive variable was not associated with the active variables used for the construction of the MCA plot, the categories of the passive variable should not be considered for the interpretation of the MCA results.
A test for the independence of passive variables from the active ones was used [Valeur test; ref. (21), page 123]. This test also allows the separation among the clusters to be evaluated on the basis of the biological profile of the patients. As a guide, a test statistic >2 can be considered significant at approximately the conventional 5% level. The variables Ki-67 and p53 were categorized for MCA and for univariate analysis with histograms, according to Table 1.
To evaluate the disease dynamics of patients identified by single marker values and cluster groups, event-free survival (EFS) probability was considered separately for node-negative cases without adjuvant therapy and for patients who received only hormone therapy. To this end, EFS curves were estimated with the Kaplan-Meier method and, for single markers, compared with the log-rank test. Because the cluster groups were inferred from the sample data, no formal statistical test was adopted to compare the estimated curves for each group; instead, relative risks at 5 years of follow-up were used to quantify their separation. Cluster and survival analyses were done with S-Plus; κ statistic values were calculated with SAS V8; MCA was done with SPAD 3 (21).
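For readers unfamiliar with the Kaplan-Meier method, the following minimal sketch shows the product-limit estimate: at each observed event time, the running survival probability is multiplied by the fraction of at-risk patients who did not have an event. The data layout (parallel lists of follow-up times and 0/1 event indicators) and the function names are assumptions for illustration; the study itself used S-Plus.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g., relapse) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function, 1.0 before any event)."""
    value = 1.0
    for time, surv in curve:
        if time > t:
            break
        value = surv
    return value


if __name__ == "__main__":
    curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
    print(curve, survival_at(curve, 3.5))
```

Reading each cluster's curve at 5 years with a helper such as `survival_at` gives the 5-year EFS estimates on which between-cluster comparisons can then be based.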
| Results |
|---|
15%. Tumors were mainly pathologic T stage I (61.6%) and stage II (29.4%). Macroscopic axillary lymph node involvement was apparent in 46.1% of the cases, in good agreement with expectations.
Extensive work was done on the comparison between analysis on frozen and formaldehyde-fixed, paraffin-embedded sections (12). Excellent agreement between the two methods was shown, allowing an interchangeable use of the two technologies and a comprehensive analysis of the corresponding data. Immunostaining of all markers was quantified with a Computerized Image Analysis System. The percentage of positive-stained nuclei was calculated as the proportion of the positively stained area versus the total nuclear area. Measurements were the average of 25 randomly selected microscopic fields in each tumor section. We verified that at 15 optical fields, the measurements of the positive nuclear area reached a low SD and a stable coefficient of variation (ref. 12 and references therein). An additional threshold of at least 2,000 measured nuclei was applied to proliferation index estimates. This protocol allowed us to obtain reliable numerical values for average expression that were used for clustering purposes. Our strategy for expression measurements included thresholding for staining intensity (10, 11). The threshold was a fixed value for each single marker across all sections. Pixels above the threshold were subsequently counted. Thus, the single measurement that was taken condensed information on expression levels and on the fraction of expressing cells into a single numerical value. This value is in principle equivalent to the "total amount of a given antigen" in a tumor section. This was a distinct advantage for the chosen clustering strategy.
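The measurement described above can be summarized in a short sketch: within each field, pixels inside the nuclear area whose staining intensity exceeds a fixed per-marker threshold are counted and divided by the total nuclear area, and the per-field percentages are averaged over the tumor section. The pixel-grid representation, the nuclear mask, and the function names are illustrative assumptions and do not describe the internals of the CAS 200 system.

```python
def positive_area_percent(stain, nuclear_mask, threshold):
    """Percentage of the nuclear area stained above a fixed intensity threshold.

    stain        -- 2D list of staining intensities for one microscopic field
    nuclear_mask -- 2D list of booleans, True where a pixel belongs to a nucleus
    threshold    -- fixed intensity cutoff for this marker (assumed given)
    """
    nuclear = positive = 0
    for stain_row, mask_row in zip(stain, nuclear_mask):
        for value, is_nucleus in zip(stain_row, mask_row):
            if is_nucleus:
                nuclear += 1
                positive += value > threshold
    if nuclear == 0:
        raise ValueError("field contains no nuclear pixels")
    return 100.0 * positive / nuclear


def tumor_score(fields, threshold):
    """Average the per-field percentages over all fields of one tumor section."""
    scores = [positive_area_percent(s, m, threshold) for s, m in fields]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    field_a = ([[5, 1], [9, 0]], [[True, True], [False, False]])
    field_b = ([[7, 8], [0, 0]], [[True, True], [False, False]])
    print(tumor_score([field_a, field_b], threshold=4))
```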
As a preliminary step, the prognostic effect of individual markers was evaluated by conventional univariate analysis on dichotomous ER, PR, Ki-67, HER2, and p53 values. ER was shown to be a significant predictor of response to hormone therapy in treated patients; Ki-67 proved to be a significant prognostic factor in untreated, node-negative patients (cutoff, 13%; P = 0.0398). HER2 was a significant prognostic factor both for untreated and for treated patients, resulting in a better prognosis at lower values of the marker. Lower values of p53 were associated with a better prognosis in p53-expressing untreated patients, although they did not reach conventional statistical significance (P = 0.183). The results obtained are, thus, very much in line with data from current literature.
Immunohistochemical data obtained as described were clustered using the three distinct clustering approaches. Notably, the CH index indicated three major clusters as the optimal solution for all three clustering algorithms. The κ statistic showed high overall concordance between cases classified in each of the clusters generated by each clustering algorithm (Table 2).
ER and PR steroid receptors showed a similar pattern, as expected. On the other hand, low ER values seemed associated with the highest NEU, p53, and Ki-67 values, whereas absent PR was mainly associated with high NEU expression. The highest ER/PR values were isolated on the bottom left part of the graph (cluster 1; K-Means 1 and K-Medoids 1), whereas intermediate ER/PR values were reported on the top left quadrant (cluster 2; K-Means 2 and K-Medoids 2). Null p53 and low Ki-67 and NEU also seemed to be associated with intermediate ER/PR values. Clusters 1 and 2 were therefore associated with less aggressive tumor features. Cluster 3 (K-Means 3 and K-Medoids 3) was mainly associated with low ER, high p53, and intermediate to high NEU, whereas cluster 4 (K-Medoids only) seemed mainly associated with low PR and intermediate/high NEU, the most aggressive bioprofile of the clustered tumors.
The three clusters generated by K-Means followed a pattern similar to that of ER/PR values (low, intermediate, and high) from left to right of the graph. Cluster 2 was the nearest to the center of the MCA plot, thus showing less distinguishable characteristics with respect to the total sample. The four clusters created by K-Medoids showed an additional subdivision of the tumors according to HER2/NEU and p53 values. The "Valeur test" statistics corresponding to the K-Medoids classification are reported in Table 4.
Several categories of such variables project close to the origin of the axes and far from the identified clusters, consistent with a weak association with the bioprofiles. On the other hand, the number of metastatic lymph nodes tended to increase from left to right along the horizontal axis, in keeping with the more aggressive features of tumors in cluster 3 (K-Means 3 and K-Medoids 3) and cluster 4 (K-Medoids 4) versus those in clusters 1 and 2 (K-Means 1 and 2 and K-Medoids 1 and 2). Similarly, young ages and high pT values seemed associated with clusters 3 and 4 (the behavior of pT3 was less interpretable, due to the small sample size). In an opposite pattern, special and lobular histotypes were more frequent in clusters 1 and 2 than in clusters 3 and 4.
The strong overlap between the classifications obtained with K-Means and K-Medoids supported the reliability of the two shared clusters, and the likelihood of the split of cluster 2 by K-Medoids supported the identification of two additional clusters. Hence, the solution with four clusters created by K-Medoids was used as a basis to evaluate relevant clinical outcomes [prognosis of node-negative patients without adjuvant therapy (263 patients, Fig. 5A) and response to hormone therapy (169 patients, Fig. 5B)].
As to the response to hormone therapy (Fig. 5B), the profile characterized by high ER/PR levels (cluster 1) had the best EFS at 5 years of follow-up (80%), consistent with a good response to hormone therapy, whereas the profile characterized by high HER2/NEU levels (cluster 4) had the worst EFS (25%). Of interest, this profile showed a markedly poorer response to hormonal treatment compared with cases in cluster 3, which showed even lower ER levels, together with higher proliferation index and p53 expression. The relative risks of relapse at 5 years of follow-up for clusters 2, 3, and 4 versus cluster 1 are listed in Table 5.
| Discussion |
|---|
Cluster analysis is a powerful multivariate technique that allows us to investigate whether subgroups with homogeneous features can be identified in a given sample of tumors. However, it should be treated with caution, as any clustering algorithm can lead to a trivial and/or fictitious grouping of tumors if used without proper care. In the present work, three different clustering algorithms and four different indices were adopted to assess the feasibility of grouping profiles of a large, consecutive, single-institution series of breast cancers. Of interest, a minimum of three tumor clusters was proposed by all of the algorithms used, and comparison of K-Medoids and K-Means results supports the introduction of at least one additional cluster. This number is consistent with the results of recent molecular cluster studies (25, 26), and differs in part from earlier articles, which showed a tendency to identify only two profiles using hierarchical algorithms (5, 8, 9).
The results of our cluster analyses highlight well-known clinical/pathologic cancer profiles along tumor progression pathways. In particular, the K-Medoids four-group solution seems to correspond well to models of tumor progression that go from hormone-sensitive, minimal-change lesions (clusters 1 and 2) to more advanced tumors (clusters 3 and 4), characterized by higher proliferative rate and by more frequent oncogene/tumor suppressor alterations. Among other "classical" pathologic prognostic factors, age at diagnosis shows a trend from less aggressive lesions (clusters 1 and 2; patients mostly older than 55) to more aggressive ones (clusters 3 and 4; younger women). Special histotypes, associated with cluster 1, confirm their hormone sensitivity and overall low proliferative and oncogene expression rates. Lobular cancers often express lower PRs with respect to "special" types; accordingly, they are more represented in cluster 2. Finally, medullary carcinomas are centered close to cluster 3, confirming their high proliferative rate and down-regulation of hormone receptors.
Metastatic progression, in terms of number of metastatic lymph nodes, did not seem to strongly influence cluster profiles, although patients in clusters 3 and 4 tended to have a higher number of metastatic lymph nodes than patients in clusters 1 and 2. Explicit measurements of metastatic propensity of primary tumors were beyond the scope of this work. However, we note that the results obtained are consistent with models in which transformed cells possess a diffuse metastatic ability at early stages of tumor development (27). On the other hand, our findings do not support models in which metastatic development depends on the progressive accumulation of favorable mutations that leads to the emergence of metastatic cells only at late stages of tumor development (28).
The MCA patterns highlighted above were closely consistent with those observed in previous analyses of independent case series (29–31). In particular, a similar distribution of ER- and PR-expressing cases was observed for patients without axillary lymph node involvement, in spite of the use of biochemical measurement methods (30, 31) instead of immunohistochemistry. Moreover, low, intermediate, and high levels of p53 were shown to be associated with intermediate, high, and low ER and PR, respectively (30). Consistent trends of other markers (Ki-67 and HER2/NEU) were also recorded, although in the context of less refined dichotomous classifications (29). Because cluster profiles are clearly associated with the above patterns in independent studies, such profiles are expected to be of widespread significance for breast cancer classification, both in terms of biological characteristics and of response to hormonal therapy.
Recent transcriptomic studies have proposed subdividing breast tumors into luminal and basal subtypes, according to their ER levels (rich and poor, respectively; refs. 1, 2). Our findings are consistent with this classification. However, in our hands, the ER-positive tumors were further subdivided into clusters 1 and 2 according to different PR levels (high for cluster 1 and medium/low for cluster 2). Of interest, heterogeneity among ER-positive tumors was also shown by genomic (32) and immunohistochemical (9) analyses, although on small samples of cases. It is noteworthy that the separation of clusters 1 and 2 in terms of ER/PR can be observed only if markers are evaluated as a continuum or at least on ordinal scales, not when they are considered as dichotomous variables (positive versus negative). Therefore, avoiding the use of cutoff values allows us to identify bioprofiles that would otherwise have been hidden. Given current clinical practices, this is of clear relevance for future studies and for potential applications in clinical settings.
EFS followed distinct trends in the four groups. In particular, groups 1 and 2 have better EFS than groups 3 and 4 for both treated and untreated patients. A direct comparison of the curves for patients with different therapies is not appropriate because therapy was assigned according to different clinical/biological features to begin with. However, it is worth noticing that group 3 showed the worst EFS among nontreated patients whereas group 4 had the worst performance among treated patients. These findings are consistent with a lack of response to tamoxifen for tumors with high expression of HER2/NEU (33). This result is particularly relevant as lack of sensitivity to hormonal treatment cannot be attributed to low ER values only (34), as clusters 3 and 4 showed equally low ER expression. The differential EFS of cluster 3 versus cluster 4 in treated versus untreated patients is in agreement with the results reported in ref. (35) where "c-ErbB-2 status defined a group of patients with a poor prognosis among those usually considered to have good prognosis, such as patients with low p53 values."
Of interest, Perou et al. (1) reported that ER-negative breast carcinomas encompass at least two biologically distinct tumor subtypes (basal-like and HER2/NEU-positive) "which may need to be treated as distinct diseases." The existence of a subgroup of basal-like tumors with HER2/NEU overexpression was confirmed by others (36, 37). Notably, the profiles of cluster 3 (with NEU distribution close to that of the total sample) and cluster 4 (with prevalent high NEU values) correspond to this distinction. However, most transcriptomic studies are limited by their small sample size and by the low signal/noise ratio of the measurements done (38). Thus, evidence for specific grouping and prognostic procedures coming from these studies should be treated with caution. Interestingly, three clusters were also found by immunohistochemical analysis of cytokeratin expression versus conventional markers on tissue microarrays (7). One of the clusters was characterized by HER2/NEU overexpression. The other two were distinguished by the expression of "basal" CK5/6 versus CK8/18, which were correlated with low and high ER levels, respectively. The HER2/NEU cluster was separated from that expressing CK5/6 and p53. These results are reminiscent of the separation between cluster 3 (p53 high) and cluster 4 (HER2/NEU high) in the present study. Furthermore, a recent tissue microarray study (25) proposed a three-cluster classification, where at least clusters 1 and 3 seemed to correspond with the homologous ones of the present study.
The four-cluster solution of this report does not imply that the underlying number of tumor subtypes is truly four. Indeed, one might expect that by increasing the number of investigated markers and/or the precision of their measurement, finer subdivisions would emerge. Cluster 2 in this work indeed seems more heterogeneous compared with the other three. Thus, we expect that this group of tumors might be split into more homogeneous subgroups with further investigation.
| Acknowledgments |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: F. Ambrogi and E. Biganzoli contributed equally to this work.
Received 4/6/05; revised 8/23/05; accepted 9/14/05.
| References |
|---|