Purpose: To compare clinical, immunohistochemical (IHC), and gene expression models of prognosis applicable to formalin-fixed, paraffin-embedded blocks in a large series of estrogen receptor (ER)–positive breast cancers from patients uniformly treated with adjuvant tamoxifen.
Experimental Design: Quantitative real-time reverse transcription-PCR (qRT-PCR) assays for 50 genes identifying intrinsic breast cancer subtypes were completed on 786 specimens linked to clinical (median follow-up, 11.7 years) and IHC [ER, progesterone receptor (PR), HER2, and Ki67] data. Performance of predefined intrinsic subtype and risk-of-relapse scores was assessed using multivariable Cox models and Kaplan-Meier analysis. Harrell's C-index was used to compare fixed models trained in independent data sets, including proliferation signatures.
Results: Despite clinical ER positivity, 10% of cases were assigned to nonluminal subtypes. qRT-PCR signatures for proliferation genes gave more prognostic information than clinical assays for hormone receptors or Ki67. In Cox models incorporating standard prognostic variables, hazard ratios for breast cancer disease-specific survival over the first 5 years of follow-up, relative to the most common luminal A subtype, are 1.99 [95% confidence interval (CI), 1.09-3.64] for luminal B, 3.65 (95% CI, 1.64-8.16) for HER2-enriched subtype, and 17.71 (95% CI, 1.71-183.33) for the basal-like subtype. For node-negative disease, PAM50 qRT-PCR–based risk assignment weighted for tumor size and proliferation identifies a group with >95% 10-year survival without chemotherapy. In node-positive disease, PAM50-based prognostic models were also superior.
Conclusion: The PAM50 gene expression test for intrinsic biological subtype can be applied to large series of formalin-fixed, paraffin-embedded breast cancers, and gives more prognostic information than clinical factors and IHC using standard cut points. Clin Cancer Res; 16(21); 5222–32. ©2010 AACR.
Molecular intrinsic subtyping reveals the major biological categories of breast cancer. Herein, we show adaptation of a 50-gene intrinsic subtyping signature for testing standard paraffin blocks. Using a large, homogeneously treated cohort of breast cancer patients, we directly compare gene expression results with high-quality clinical and central immunohistochemical data. We show the PAM50 approach to be superior as a prognostic test, specifically able to identify an ultralow-risk group who may not need chemotherapy. Based on these results, intrinsic subtyping tests are now being applied to randomized clinical trials series in Canada and the United States to assess predictive capacity (already under way for response to endocrine therapy, anthracyclines, and taxanes, with further studies under consideration). Should such studies prove a predictive value for intrinsic subtyping, this test could be clinically implemented in a similar form, as it has been designed for application on standard laboratory specimens.
Several studies using gene expression technologies and statistical models have reported methods to identify breast cancer patients with estrogen receptor–positive (ER+), node-negative (N0) disease who may be adequately managed with 5 years of tamoxifen monotherapy (1–5). However, these studies often included patients with tumors already associated with established low-risk biomarkers, for example, low-grade histology, low Ki67-based proliferation index, and favorable surgical stage. It therefore remains controversial whether genomic assays should be applied routinely, or whether surgical stage and a limited number of immunohistochemical (IHC) markers will, in most cases, be adequate and less costly (6).
Continued efforts in this area are clinically relevant to decisions about both chemotherapy and endocrine agents, as patients at low risk after 5 years of tamoxifen monotherapy could be spared the morbidity associated with extended aromatase inhibitor therapy (7). Studies that address this issue are few because extremely long follow-up and information on breast cancer–specific mortality are required. Furthermore, because frozen tumor archives are unavailable from suitably large patient populations, gene expression technologies must be applicable to degraded RNA extracted from formalin-fixed, paraffin-embedded tissues that are necessarily more than a decade old.
Our group has assembled and published several technological and statistical approaches to address prognosis in ER+ breast cancer. We therefore sought to compare clinicopathologic, IHC, and molecular methodologies in a single independent test set to identify the best approach. Importantly, we focused on fixed statistical models that were previously trained on independent data sets to avoid overoptimistic results. The models we report in this article include the use of standard pathologic factors, such as centrally reviewed histologic grade, as incorporated into Adjuvant! Online (8), models based on IHC for biomarkers of intrinsic subtypes (6), and a gene expression assay using 50 genes (PAM50). The latter represents a reduced gene set, amenable to assay by techniques such as quantitative real-time reverse transcription-PCR (qRT-PCR), which accurately identifies the major intrinsic biological subtypes of breast cancer and generates risk-of-relapse (ROR) scores (9). The investigation used a large independent cohort of formalin-fixed, paraffin-embedded pathology specimens from patients with ER+ breast cancer, all M0 but otherwise representing a spectrum of T and N stages including a large fraction of node-positive (N+) patients. All patients received adequate local treatment, 5 years of tamoxifen therapy but no adjuvant chemotherapy, and were followed for relapse-free survival (RFS) and disease-specific survival (DSS) for over a decade.
Materials and Methods
Patient and sample characteristics
The study cohort was accrued from female patients with invasive breast cancer, diagnosed in British Columbia between 1986 and 1992. Cancer tissue from these patients had been frozen and shipped to Vancouver Hospital for central ER and progesterone receptor (PR) testing by dextran-charcoal–coated (DCC) ligand-binding assay. The PAM50 assay was conducted on the portion of this tissue that was formalin fixed and paraffin embedded for histologic correlation. Characteristics of this cohort have been previously described (6), and the same source blocks were used to assemble tissue microarrays for previously published studies on ER (10), HER2 (11), PR (12), Ki67, cytokeratin 5/6, and epidermal growth factor receptor (6, 13). Quantitative ER was determined using the Ariol automated digital imaging system (14), and the same method was applied for PR. For this study, we selected samples from patients with ER+ tumors by IHC who had received tamoxifen as their only adjuvant systemic therapy. Provincial guidelines from that time period recommended tamoxifen for women >50 years of age, whose ER status was positive or unknown, and who were either node positive or had lymphovascular invasion. Cohort identification and sample selection for this study are summarized as per REMARK criteria (15) in Supplementary Table S1.
RNA preparation, qRT-PCR, and assignment of biological subtype and ROR score
H&E sections from each block were reviewed by a pathologist (T.O.N.). Areas containing representative invasive breast carcinoma were selected and circled on the source block. Using a 1.0-mm punch needle, at least two tumor cores were extracted from the circled area. The Supplementary Materials and Methods present details of RNA preparation from paraffin cores, the qRT-PCR assay for the PAM50 panel and reference genes, and the use of these results to assign luminal A, luminal B, HER2-enriched, and basal-like subtypes. They also describe the independently trained risk score assignments: ROR-S (ROR based on subtype), ROR-T (ROR based on the tumor size–weighted model), ROR-P (ROR based on the proliferation-weighted model), and ROR-PT (ROR based on the proliferation- and tumor size–weighted model). For clarity, the term ROR-T is now used for the same model described in our earlier publication as ROR-C (“clinical”; ref. 9).
Relation of clinicopathologic factors, intrinsic subtypes, and ROR scores to clinical outcome
Statistical analyses were conducted using SPSS v16.0 and R v2.8.0. Univariable analyses of tumor subtype against breast cancer RFS and DSS were done by Kaplan-Meier analysis with log-rank test. Multivariable analyses were done against the standard clinical parameters of tumor size, nodal status, histologic grade, patient age, and HER2 status. HER2 scores were centrally determined based on assay of adjacent cores from the same source blocks, assembled into tissue microarrays, and subjected to IHC and fluorescence in situ hybridization (FISH) analysis using clinical-equivalent protocols (11). Cox regression models (16) were built to estimate the adjusted hazard ratios of the qRT-PCR–assigned breast cancer subtypes, as well as ROR scores categorized by published cut points and as a continuous variable. IHC-based subtypes were assigned as previously defined (6). The online decision-making tool Adjuvant! Online (http://www.adjuvantonline.com), previously validated on the British Columbia population cohort (8), was used to generate breast cancer RFS and DSS estimates for each patient in this cohort. Only cases with information for all the covariates were included in the analyses. Smoothed plots of weighted Schoenfeld residuals were used to assess proportional hazard assumptions (17), and time stratifications were used where hazards were not proportional over the entire follow-up period.
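As an illustration of the univariable survival analysis, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. It is not the SPSS or R code used in the study; the function name `kaplan_meier` and the toy follow-up data are illustrative assumptions.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each patient (e.g. in years)
    events -- 1 if the event (relapse or breast cancer death) was observed,
              0 if the patient was censored at that time
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # everyone who exits the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Patients censored at t are still counted as at risk at t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Toy data (hypothetical, not taken from the study cohort)
km_times = [2.0, 3.5, 3.5, 6.0, 8.0, 11.0]
km_events = [1, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(km_times, km_events)
```

Reading the survival estimate at 10 years from such a curve, separately for each subtype or risk group, gives the kind of 10-year RFS and DSS figures reported in this article.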
The C-index (concordance index; ref. 18) is defined as the probability that risk assignments to members of a random pair are accurately ranked according to their prognosis. The number of concordant pairs (order of failure and risk assignment agree), discordant pairs (order of failure and risk assignment disagree), and uninformative pairs are tabulated to calculate the measure. C-index values of 0.5 indicate random prediction, and higher values indicate increasing prediction accuracy. Variability in the C-index for each predictor and P values from comparisons were estimated from 1,000 bootstrap samples of the risk assignments. Calculation was done using the rcorr.cens function implemented in the Hmisc (19) library for R statistical software version 2.8.1 (http://www.R-project.org).
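The pair-counting definition above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the `rcorr.cens` routine used in the study; the function name `harrell_c_index`, the convention that a higher score means worse expected prognosis, and the decision to skip pairs with tied follow-up times are assumptions made for the sketch.

```python
from itertools import combinations

def harrell_c_index(times, events, risk):
    """Concordance index for right-censored data.

    times  -- follow-up times
    events -- 1 if the event was observed, 0 if censored
    risk   -- predicted risk score (higher = worse expected prognosis)
    """
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # Patient a is the one with the shorter follow-up
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if events[a] == 0:
            continue  # uninformative: the shorter follow-up was censored
        if risk[a] > risk[b]:
            concordant += 1
        elif risk[a] < risk[b]:
            discordant += 1
        else:
            tied += 1
    usable = concordant + discordant + tied
    if usable == 0:
        raise ValueError("no informative pairs")
    return (concordant + 0.5 * tied) / usable

# Toy data (hypothetical)
ci_times = [2, 4, 6, 8, 10]
ci_events = [1, 1, 0, 1, 0]
ci_risk = [0.9, 0.4, 0.5, 0.7, 0.1]
ci_value = harrell_c_index(ci_times, ci_events, ci_risk)
```

In this toy example, 8 of the 10 pairs are informative and 6 of those are concordant, giving a C-index of 0.75. Recomputing the index on resampled patients would give a bootstrap estimate of its variability.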
Intrinsic subtyping of ER+, tamoxifen-treated breast cancer using the PAM50 assay
RNA was extracted from pathologist-guided tissue cores from 991 formalin-fixed, paraffin-embedded breast cancer specimens. Eight hundred and eleven samples yielded sufficient RNA for analysis (at least 1.2 μg total RNA at a concentration of ≥25 ng/μL). Template was technically sufficient in 786 cases, based on all internal housekeeper gene controls being expressed in the sample above background. Clinical characteristics for the patients included in the PAM50 analysis are presented in Table 1 (Supplementary Tables S2 and S3 provide details stratified by node status). Based on the nearest PAM50 centroid algorithm, intrinsic breast cancer subtypes were assigned using gene expression as follows: 372 samples (47.3%) were luminal A, 329 (41.9%) luminal B, 64 (8.1%) HER2 enriched, 5 (0.6%) basal-like, and 16 (2.0%) normal-like. Thus, although all cases in this study were positive for ER by centrally assessed IHC analysis on a tissue microarray (10), and 98.8% were also positive by DCC biochemical assay (Table 1), the gene expression panel nevertheless assigned 9% of cases into nonluminal subtypes, mostly HER2 enriched. This phenomenon has been previously observed when interrogating published data sets for expression of the PAM50 genes (9). For the 16 cases assigned as normal-like, histology was reviewed from adjacent tissue cores, and in 14 of 16 cases, invasive cancer cells were absent or rare. Normal-like cases were therefore excluded from outcome analyses, as a breast cancer subtype could not be confidently assigned due to insufficient tumor content.
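The nearest-centroid assignment can be illustrated as follows: a sample's expression profile is compared with one reference profile (centroid) per subtype and assigned to the most similar one. The sketch below uses Spearman rank correlation as the similarity measure, which is an assumption here (the actual procedure is specified in the Supplementary Materials and Methods and ref. 9). The five-gene centroids and the sample profile are invented for illustration and are not PAM50 values.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def assign_subtype(profile, centroids):
    """Return (best subtype, similarity to every centroid)."""
    scores = {name: spearman(profile, c) for name, c in centroids.items()}
    return max(scores, key=scores.get), scores

# Hypothetical centroids over five genes (not the real PAM50 centroids)
toy_centroids = {
    "luminal A": [2.0, 1.5, -1.0, -2.0, 0.5],
    "luminal B": [1.0, 0.5, 1.5, -1.5, -0.5],
    "HER2-enriched": [-1.0, -0.5, 1.0, 2.0, 0.0],
    "basal-like": [-2.0, -1.5, 2.0, 0.5, 1.0],
}
toy_profile = [1.8, 1.1, -0.7, -1.6, 0.2]
toy_subtype, toy_scores = assign_subtype(toy_profile, toy_centroids)
```

Because assignment depends only on which centroid is closest, a clinically ER+ tumor can still be assigned to a nonluminal subtype when its expression profile resembles that centroid more closely.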
The intrinsic biological subtypes were strongly prognostic by Kaplan-Meier analysis (Fig. 1A and B). In the British Columbia population at the time these samples were originally acquired, many patients with a clinically low-risk profile received no adjuvant systemic therapy (8). In contrast, those receiving adjuvant tamoxifen (the subjects of this study) had tumors that were mostly node positive and high grade, exhibited lymphovascular invasion, and therefore constitute a higher-risk group with overall 10-year RFS of 62% and DSS of 72%. Those assigned by the PAM50 assay to luminal A status had a significantly better outcome (10-year RFS, 74%; DSS, 83%) than luminal B, HER2-enriched, or basal-like tumors (Fig. 1A for RFS and Fig. 1B for DSS). The ROR algorithms (9) were originally trained on microarray data from N0 patients who received no adjuvant systemic therapy, and have not previously been applied to a population homogeneously treated with adjuvant tamoxifen, nor to a series containing large numbers of N+ cases, nor to the endpoint of DSS. In this data set, ROR-S (a model based solely on gene expression) nevertheless showed performance consistent with our previous report (Fig. 1C and D). Multivariable Cox models were constructed to test the independent value of PAM50 subtyping against standard clinical and pathologic factors including age, histologic grade, lymphovascular invasion, HER2 expression, nodal status, and tumor size. To meet proportional hazard assumptions, multivariable models were assessed with the time axis split at 5 years (20), as HER2-enriched and basal-like tumors (Fig. 1A and B) and ROR-S high category tumors (Fig. 1C and D) had a much higher event rate in the first 5 years than subsequently. The intrinsic biological subtype and ROR-S remained significant in the multivariable models for DSS (Table 2) and RFS (Supplementary Table S4), particularly in the first 5 years, as did pathologic staging variables (tumor size and node status). 
However, histologic grade, lymphovascular invasion, and clinical HER2 status, significant in univariable analysis in this cohort, no longer contributed significant independent prognostic information when the multivariable analysis included the PAM50 assignments.
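Splitting the time axis at 5 years can be pictured as episode splitting: each patient's follow-up is cut into an interval up to 5 years and an interval beyond it, so that a separate hazard ratio can be estimated for each period. The sketch below shows only this data-preparation step, under the assumption of a simple (start, stop, event, period) record layout; the function name is hypothetical and this is not the study's analysis code.

```python
def split_follow_up(time, event, cut=5.0):
    """Split one patient's follow-up at `cut` years.

    time  -- total follow-up in years
    event -- 1 if the event occurred at `time`, 0 if censored
    Returns a list of (start, stop, event, period) episodes.
    """
    if time <= cut:
        return [(0.0, time, event, "early")]
    return [
        (0.0, cut, 0, "early"),      # event-free throughout the first interval
        (cut, time, event, "late"),  # any event is attributed to the late period
    ]

# Hypothetical patients
early_relapse = split_follow_up(3.2, 1)
late_death = split_follow_up(11.7, 1)
```

A patient with an event at 3.2 years contributes one early episode, whereas a patient with an event at 11.7 years contributes a censored early episode and a late episode carrying the event.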
Comparisons between gene expression and clinical assays for hormone receptors and proliferation
In a case that is ER+ by IHC, additional information about hormone receptor expression can be obtained in several ways, including DCC ligand-binding assay, quantitative IHC for ER, or equivalent measures of PR. Most published assays for breast cancer prognosis in ER+ disease include tumor growth rate as one of the parameters in the statistical model, and this data set was previously assessed in detail for IHC Ki67 index (6). The PAM50 qRT-PCR data allow detailed quantitative assessment of the functionality of the estrogen response pathway (8-gene luminal signature) as well as a proliferation signature based on the mean expression of 11 genes linked to cell cycle progression (trained on published data, as per Supplementary Materials and Methods). The availability of all these measurements (10) provides an opportunity to determine which approach most accurately captures the prognostic effect of estrogen pathway biomarkers and tumor growth rate in a direct comparison (Fig. 2). Given a randomly selected pair of subjects, C-index is the probability that the patient assigned the more extreme risk score actually has a worse prognosis. A value of 0.5 indicates discrimination that is no better than chance prediction, and a value of 1 indicates perfect discrimination of samples. Using the C-index to compare prognostic capacity in this uniformly tamoxifen-treated cohort, the combination of luminal genes measured by the PAM50 yields more prognostic information than other methods of hormone receptor analysis, but the differences are not significant. Although Ki67 index by IHC seems to outdo quantitative ER, the proliferation signature provides the most robust approach for the prediction of both RFS and DSS (Fig. 2; Supplementary Table S5). 
Multivariable analysis indicated that the Ki67 IHC assay did not contribute significant independent information to prognostic models for either N0 or N+ breast cancer patients when information on the proliferation signature is included (Supplementary Table S6).
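The proliferation signature is described above as the mean expression of 11 cell cycle genes. A minimal sketch of that calculation follows. The gene identifiers are placeholders, because the gene list is given only in the Supplementary Materials and Methods, and the expression values are invented.

```python
from statistics import mean

# Placeholder identifiers: the actual 11 genes are listed in the
# Supplementary Materials and Methods, not in this text.
PROLIFERATION_GENES = [f"prolif_gene_{k}" for k in range(1, 12)]

def proliferation_score(expression):
    """Mean normalized expression over the proliferation gene set.

    expression -- dict mapping gene identifier -> normalized expression value
    """
    missing = [g for g in PROLIFERATION_GENES if g not in expression]
    if missing:
        raise KeyError(f"missing proliferation genes: {missing}")
    return mean(expression[g] for g in PROLIFERATION_GENES)

# Invented sample: ten genes at 0.5, one at 1.6, plus a gene outside the set
toy_sample = {g: 0.5 for g in PROLIFERATION_GENES}
toy_sample["prolif_gene_1"] = 1.6
toy_sample["unrelated_gene"] = -3.0   # ignored by the score
toy_score = proliferation_score(toy_sample)
```

Averaging over several genes yields a continuous score, in contrast to a single-marker Ki67 index dichotomized at a cut point.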
Comparison of fixed models of prognosis in N0 breast cancer
For formal model comparisons, data were generated on four fixed approaches, without any element of training within the test set: (a) clinical model based on Adjuvant! Online, (b) IHC-based (incorporating data on Ki67 and HER2), (c) the ROR-S approach based on PAM50 gene expression alone, and (d) the proliferation signature alone and as incorporated into the ROR-P risk model using a β coefficient weighting for proliferation (described in Supplementary Materials and Methods). Adjuvant! Online incorporates full tumor size staging information; to account for the influence of tumor size, the biomarker models were also weighted by a β coefficient (T) that incorporated the prognostic information associated with T1 status versus higher T stage (the level of detail available in the independent training sets). This approach created IHC-T, ROR-T, and ROR-PT models. In addition, the strong independent influence of N stage was accounted for by conducting the analysis separately in the N0 and N+ populations. C-index assessments showed superiority of the biomarker models over the purely clinical Adjuvant! Online model in the N0 population, with the ROR-PT approach providing the most prognostic information (Fig. 3A). In multivariable analysis, the addition of ROR-P to a model of ROR-S results in a significant increase in explained prognostic variation (RFS, P = 0.0032; DSS, P = 0.0015); ROR-PT is also significant after conditioning on ROR-S (RFS, P = 0.0023; DSS, P = 0.0015) but not ROR-P (RFS, P = 0.12; DSS, P = 0.13). A continuous score based on ROR-PT was generated to translate the data into an individual RFS and DSS risk assessment tool (Fig. 3B). Kaplan-Meier analysis illustrates the ability of the ROR-PT model to identify patients who have an extremely high chance (>95%) of remaining disease-free (Fig. 3C) and alive beyond 10 years (Fig. 3D).
In contrast, our previously published IHC model (6) could not identify a group with sufficiently favorable outcomes that 5 years of tamoxifen might be considered adequate treatment (i.e., >90% 10-year RFS; Fig. 3E and F).
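The relationship among the four ROR models can be written schematically. The notation below is an assumption made for illustration: $r_k$ denotes the similarity of a sample to the centroid of subtype $k$, $P$ the proliferation signature, and $T$ an indicator for tumor stage above T1. Each model has its own set of β coefficients, trained in independent data sets; their values are given in the Supplementary Materials and Methods and ref. 9 and are not reproduced here.

```latex
\begin{align*}
\text{ROR-S}  &= \sum_{k} \beta_k \, r_k \\
\text{ROR-T}  &= \sum_{k} \beta_k \, r_k + \beta_T \, T \\
\text{ROR-P}  &= \sum_{k} \beta_k \, r_k + \beta_P \, P \\
\text{ROR-PT} &= \sum_{k} \beta_k \, r_k + \beta_P \, P + \beta_T \, T
\end{align*}
```

Under this reading, ROR-PT differs from ROR-P only by the tumor size term, which is in keeping with ROR-PT adding no significant information once ROR-P is in the model.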
Comparison of fixed models of prognosis in N+ breast cancer
For N+ disease, C-index analysis (Fig. 4A) supports the conclusion that the ROR-T score produces the best prognostic model; in contrast to N0 disease, the proliferation signature added relatively little information and proliferation weighting (ROR-PT) did not yield a superior model. Adjuvant! Online performed almost as well, but had the advantage of incorporating the actual number of involved lymph nodes. This information was not available in the independent training sets used to build the ROR models, and so could not be used in the current analysis (which can, however, serve to train future models incorporating number of involved lymph nodes). The continuous score model for N+ disease (Fig. 4B) produces a very broad range of prognosis, similar to N0 disease, although few patients have a prognosis in the range where tamoxifen monotherapy for 5 years would be considered sufficient treatment. Although there were large and highly significant differences in survival in ROR-defined risk groups, Kaplan-Meier analysis (Fig. 4C and D) illustrates that even patients in the lowest risk ROR group are still subject to relapses and late deaths from breast cancer, particularly after the 5th year of follow-up. The IHC-based risk model incorporating Ki67 and HER2 also produces a statistically significant prognostic effect for RFS (Fig. 4E) and DSS (Fig. 4F), although these differences are narrower than those achieved by the gene expression–based model.
Previous studies have established that intrinsic biological signatures are present and have prognostic significance in breast cancer cohorts from multiple different institutions, profiled with several gene expression microarray platforms (21–24). To identify these subtypes on standard formalin-fixed, paraffin-embedded pathology specimens, we developed a qRT-PCR test based on a panel of 50 genes (9). The analysis reported here applied this test to a series of paraffin blocks with >15-year detailed follow-up.
Whereas previously assessed cohorts consisted mainly of low-risk women receiving no adjuvant systemic therapy, or were heterogeneously treated, the cases in the current study are all women with ER+ breast cancer who received endocrine therapy as their sole adjuvant treatment, a group of particular clinical importance and contemporary relevance. In this analysis, we sought to compare different technologies for predicting long-term outcomes for such patients. In this study cohort, patients were diagnosed with N+ or higher-risk N0 disease. Only 8% of the N0 population had grade 1 disease and 55% exhibited lymphovascular invasion (Supplementary Table S2). Under the current standard of care in most countries, the majority of these patients would now be treated with adjuvant chemotherapy (25) and extended endocrine therapy. Using a series of fixed models trained in independent data sets, we compared a standard approach using clinicopathologic information (Adjuvant! Online) with our published luminal B discriminator based on Ki67 and HER2 IHC additionally weighted for T stage (IHC-T), and with PAM50 gene expression–based ROR models weighted for T stage (ROR-T and ROR-PT). In N0 patients, the ROR-PT approach was the most accurate and was able to identify patients in whom 5 years of tamoxifen may be adequate treatment based on the very low late relapse rate in the 5- to 10-year window (Fig. 3C). In N+ disease, the PAM50 approach represents an advance in prognostication, but late relapses and deaths were seen even in the lowest risk group identified using the best ROR model. Unlike in N0 disease, proliferation signature weighting did not improve the C-index in N+ disease.
On this cohort, detailed centrally determined IHC analyses have previously been done and published (6, 10–13, 26). C-index, Kaplan-Meier, and Cox model analyses show that IHC approaches do work and provide significant prognostic information. However, the PAM50-based models are superior in terms of adding significant additional information and in their capacity to identify a particularly low-risk group of women.
We view these PAM50 models, derived from archival formalin-fixed RNA, as a potential replacement for grade-, hormone receptor–, Ki67-, and HER2-based prognostic models, but not as a replacement for pathologic stage (as tumor size and nodal status remain independent predictors in multivariable models that include PAM50-based prognostic information). One weakness of our approach is that our current accounting for pathologic stage is oversimplified due to the limited stage distributions and clinical information in our training sets. We analyzed the data as either N0 or N+, and accounted for T stage by categorizing the samples as either T1 or greater. A future aim is to integrate the PAM50 data into the Adjuvant! Online approach (27) to more completely account for the prognostic influence of pathologic stage. To achieve this, we would need to construct a training set that adequately includes all five categories of tumor size and all four categories of N stage used in Adjuvant! Online to gauge the prognostic weight of these pathologic stage categories in the setting of PAM50 information. Additionally, incorporation of all IHC data as continuous variables in a combined model may improve its prognostic value. The current series contains sufficiently detailed clinical and IHC information to contribute to such detailed comparisons as a training set requiring further validation.
An additional caveat to our study is that the population was strongly biased toward higher-risk breast cancers and so likely underestimates the number of patients in the broader N0 population for whom adjuvant tamoxifen would represent adequate treatment. The current generation of adjuvant aromatase inhibitor trials would be an appropriate setting to address the value of our approach further. We accept the possibility that a better model using Ki67 at a different cut point could be developed. However, because we were focused on comparing fixed models, we used our published approach. Further work on the Ki67 model and cut-point optimization will require independent data sets.
In comparison with other signatures such as the recurrence score and genomic grade index (1, 28, 29), the PAM50 has the potential advantage of discriminating high-risk patients into luminal B, HER2-enriched, and basal-like subtypes, who are likely to respond differently to the main systemic therapy options (endocrine, anti-HER2, and anthracycline versus nonanthracycline versus taxane chemotherapy regimens). The assay requires neither frozen tissue (30) nor manual microdissection of cut sections (1), and can be readily applied to standard paraffin blocks including archival tissues from clinical trials. Currently available assays such as Mammaprint (31) and Oncotype DX (32) were optimized to recognize particularly low-risk patients from among an N0 early-stage population who did not receive chemotherapy. Because intrinsic subtyping is designed to identify discriminative biological features of breast cancer, rather than being derived around clinical outcome in a specific population, this approach is particularly likely to extrapolate well onto other patient cohorts (33). The current study shows the ability of PAM50 to recognize a very low-risk prognostic group among women receiving tamoxifen and no chemotherapy, similar to the Oncotype DX assay (34, 35). A direct comparison of different expression profile approaches may become possible in the future through a reanalysis of cohorts with the PAM50 that have already been analyzed by Oncotype DX, because both assays can be applied to the same source material.
Our inability to identify a group of patients with N+ disease in whom 5 years of tamoxifen is adequate is reminiscent of the recent findings from the Southwest Oncology Group, who also found that a molecular signature for good outcome in N0 disease failed in N+ disease in this regard (35). It would be relevant to study a series of patients treated with extended adjuvant aromatase inhibitor therapy, who will have even lower residual risk, as some of the patients in the low-risk N+ group may simply require longer treatment with modern endocrine therapy rather than chemotherapy. The development of new approaches for defining prognosis in N+ disease is also warranted. We have already established the preoperative endocrine prognostic index, which showed that the “on endocrine treatment” Ki67 value is more effective than baseline Ki67 for the identification of patients with clinical stage II and III disease who have excellent long-term outcomes after neoadjuvant endocrine therapy (36). A comparison between Ki67 and the PAM50-based proliferation signature in the neoadjuvant endocrine therapy setting is therefore one logical next step. The applicability of this test to formalin-fixed, paraffin-embedded tissues will make possible its use on large clinical trial archives that address this issue (37). The results of our study highlight the feasibility of measuring multigene expression panels on such series as a means for showing clinical utility using a method readily applicable to prospective clinical samples that provides more prognostic information than clinical or standard IHC approaches.
Disclosure of Potential Conflicts of Interest
T.O. Nielsen, C.M. Perou, M.J. Ellis, P.S. Bernard: ownership interest, Bioclassifier LLC; U.S. Patent No. 61/057,508.
We thank current and former members of the British Columbia Cancer Agency's Breast Cancer Outcomes Unit, including S. Chia, K. Gelmon, H. Kennecke, I. Olivotto, and C. Speers, for maintaining the clinical database.
Grant Support: T. Nielsen is a Senior Scholar of the Michael Smith Foundation for Health Research. Grant support was provided by National Cancer Institute (NCI) Strategic Partnering to Evaluate Cancer Signatures grant U01 CA114722-01, Canadian Cancer Society, Huntsman Cancer Institute/Foundation (P.S. Bernard), ARUP Institute for Clinical and Experimental Pathology (P.S. Bernard), NCI Breast Specialized Program of Research Excellence grant P50-CA58223-09A1 (C.M. Perou), St. Louis Affiliate of the Susan G. Komen Foundation CRAFT (M.J. Ellis), Breast Cancer Research Foundation (C.M. Perou and M.J. Ellis), and Sanofi-Aventis Canada unrestricted educational grant. Additional support provided by the TRAC facility and Informatics at the Huntsman Cancer Center, supported in part by NCI Cancer Center Support grant P30 CA42014-19, and the tissue procurement facility at the Alvin J. Siteman Cancer Center at Washington University School of Medicine, which is funded in part by the NCI Cancer Center Support grant P30 CA91842.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
- Received May 13, 2010.
- Revision received July 29, 2010.
- Accepted August 25, 2010.