Abstract
Background: Inflammatory breast cancer (IBC) is a poorly characterized form of breast cancer. So far, the results of expression profiling in IBC are inconclusive due to various reasons including limited sample size. Here, we present the integration of three Affymetrix expression datasets collected through the World IBC Consortium allowing us to interrogate the molecular profile of IBC using the largest series of IBC samples ever reported.
Experimental Design: Affymetrix profiles (HGU133-series) from 137 patients with IBC and 252 patients with non-IBC (nIBC) were analyzed using unsupervised and supervised techniques. Samples were classified according to the molecular subtypes using the PAM50-algorithm. Regression models were used to delineate IBC-specific and molecular subtype-independent changes in gene expression, pathway, and transcription factor activation.
Results: Four robust IBC-sample clusters were identified, associated with the different molecular subtypes (P < 0.001), all of which were identified in IBC with a similar prevalence as in nIBC, except for the luminal A subtype (19% vs. 42%; P < 0.001) and the HER2-enriched subtype (22% vs. 9%; P < 0.001). Supervised analysis identified and validated an IBC-specific, molecular subtype-independent 79-gene signature, which held independent prognostic value in a series of 871 nIBCs. Functional analysis revealed attenuated TGF-β signaling in IBC.
Conclusion: We show that IBC is transcriptionally heterogeneous and that all molecular subtypes described in nIBC are detectable in IBC, albeit with a different frequency. The molecular profile of IBC, bearing molecular traits of aggressive breast tumor biology, shows attenuation of TGF-β signaling, potentially explaining the metastatic potential of IBC tumor cells in an unexpected manner. Clin Cancer Res; 19(17); 4685–96. ©2013 AACR.
Translational Relevance
The clinical relevance of the research described in this article is manifold. First, due to the extent of our series, which includes by far the largest series of inflammatory breast cancer (IBC) samples ever reported, we were able to delineate the molecular profile of IBC with enhanced accuracy, taking into account various points of criticism raised with respect to earlier studies. Our report is the first to analyze the expression profiles of IBC samples in light of a uniform case definition for IBC put forward by an international expert panel. Also, the present study is the first to deal with the fact that the differential distribution pattern of the molecular subtypes between IBC and non-IBC (nIBC) affects its outcome. Furthermore, our data suggest that differentially expressed genes between IBC and nIBC in a molecular subtype-independent manner hold the fingerprints of aggressive breast tumor behavior. Therefore, our expression series not only suits IBC research, but might also be instrumental for scientists focused on breast cancer invasion and metastasis in general. Finally, this study gives voice to the World IBC Consortium, which is founded to foster collaborations between researchers active within the IBC community. We believe that our effort is an example of how collaborations with a strong international backbone can prosper research well into the 21st century.
Introduction
Inflammatory breast cancer (IBC) is an aggressive form of locally advanced breast cancer. At the time of diagnosis, virtually all patients have lymph node metastases and one third of the patients have metastases in distant organs. Consequently, patient prognosis is poor (1). The rate of pathologic complete response (pCR) after primary anthracycline-based chemotherapy ranges from 15% to 30% and the 5-year survival remains around 40%. Currently, IBC remains a poorly characterized disease lacking specific molecular targets for therapy, although ErbB2 is often amplified.
The low frequency of occurrence (approximately 5% of all breast tumors) combined with the small size of diagnostic samples, in addition to the fact that patients with IBC are treated with neo-adjuvant chemotherapy, are the factors that have hampered past molecular studies, specifically those that were directed at deciphering the molecular biology of this disease by genome-wide approaches (2). The major stumbling block has been the poor statistical power when multiple genes (typically a multitude of 10,000) are tested on a small number of samples. High genes/samples ratios, even after removing noisy and noninformative data, result in dramatic downscaling of the significance level (i.e., multiple testing correction) leading to loss of statistical power (3).
Despite the issues outlined earlier, several studies aimed at defining the molecular signature of IBC using genome-wide gene expression profiling. Most of these studies have shown that a molecular signature of IBC is definable (4–10). Nevertheless, when comparing the crude gene lists, only a limited number of genes were commonly identified across all studies (7, 9). Also, when translating the gene lists into biologic processes or signal transduction pathways, there was significant ambiguity. Potential reasons for this observation have been discussed previously and include: (i) distinct case definitions of IBC used across different studies, (ii) variability of the non-IBC (nIBC) control group, (iii) differences in the characteristics of the IBC and nIBC groups between different studies, notably with respect to the hormone receptor status, and (iv) interstudy technologic differences, that is, use of different platforms and subsequently different input gene lists (2).
In 2008, the World IBC Consortium (www.ibcconsortium.org) was founded with the goal of fostering collaborations between international research groups who focus on this rare but aggressive form of breast cancer. The research described in the current article is the first project spearheaded by the World IBC Consortium and aims at redefining the molecular profile of IBC by taking into account the points of criticism raised with respect to the earlier studies described earlier. Here, we used a uniform case definition of IBC that has been put forward by an international expert panel (11). Using gene expression profiles from Affymetrix (HGU133-series) derived from 3 different sites involved in IBC research, we were able to obtain an unprecedented number of IBC samples (N = 137), which allows us to resolve both the sample size-related and the platform-related issues identified in the previous studies.
Materials and Methods
Patients and samples
Tumor samples were obtained from patients with breast adenocarcinoma treated in our 3 institutions: the Institut Paoli-Calmettes (IPC, Marseille, France; 71 IBC and 139 nIBC), the MD Anderson Cancer Center (MDA, Houston, TX; 25 IBC and 58 nIBC), and the General Hospital Sint-Augustinus (TCRU, Antwerp, Belgium; 41 IBC and 55 nIBC). Each patient gave written informed consent and this study was approved by the Institutional Review Boards of all 3 participating centers. The present dataset includes 137 samples from patients with IBC and 252 samples from patients with nIBC. Patients with IBC were selected by strictly adhering to the consensus diagnostic criteria published by Dawood and colleagues (11). IBC cases included locally advanced [American Joint Committee on Cancer (AJCC) stage III] and metastatic cases (AJCC stage IV). nIBC cases included both early-stage disease (AJCC stages I and II) and advanced stage disease (locally advanced, AJCC stage III; and metastatic, AJCC stage IV). Estrogen receptor (ER) and ErbB2 expression were defined using probe sets 205225_at (ESR1) and 216836_s_at (ERBB2), as described previously (12). As a control, we verified the correlation between mRNA status and immunohistochemistry (IHC)-based protein status as discrete variables. The positive or negative mRNA status was defined using a 2-component Gaussian mixture distribution model as described previously (13). Data are provided in the Supplementary Fig. S1 and Supplementary Table S1. Tumor grade was determined using the genomic grade index (GGI; ref. 14). All patients were treated using a multidisciplinary approach according to standard guidelines. Clinicopathologic data of our series are shown in Table 1 and are consistent with literature.
Histoclinical data of IBC and nIBC samples
Data processing and normalization
RNA extraction from the 389 samples and hybridization onto Affymetrix GeneChips (HGU133-series) was conducted as described before (6, 8, 9, 15). Expression data were normalized by guanine cytosine robust multi-array analysis (16). For each of the 3 datasets separately, probe sets with expression values above log2(100) in at least 1% of the arrays were filtered in. Next, the list of common informative probe sets (N = 9,926) was identified. This list was used to merge the distinct datasets. Therefore, we conducted regression normalization, using the Limma-package in BioConductor, to remove technical, laboratory-specific, variation in gene expression between the distinct datasets. A principal component analysis (PCA) was done on the merged dataset prior and after the regression normalization to verify the accuracy of the regression normalization in removing the laboratory-specific variation in gene expression.
Molecular subtypes
We classified each sample in the merged dataset according to the molecular subtypes (luminal A, luminal B, basal-like, HER2-enriched, and normal-like) using the PAM50-methodology described by Parker and colleagues (17). In addition, samples predicted to belong to the claudin-low subtype by the 9-cell line claudin-low predictor (18) were considered claudin-low. To further investigate the effect of the regression normalization on the biologic variation in gene expression, the molecular subtype classification was compared with the expression of an ER activation signature (19).
Unsupervised analysis
To investigate common themes in gene expression within each tumor phenotype (IBC or nIBC) separately as well as for both tumor phenotypes combined, we adopted the following strategy. We randomly selected 100 samples from the 3 groups of interest (group 1, IBC; group 2, nIBC; and group 3, combined IBC/nIBC). On this selection, unsupervised hierarchical cluster analysis (UHCA) using Manhattan distance as distance measure and Ward linkage as the dendrogram drawing method was conducted only for those genes having a SD greater than 2. UHCA was followed by a cluster robustness analysis using the silhouette algorithm. The silhouette algorithm divides the dendrogram into an increasing number of clusters. We chose to analyze a maximum number of 20 clusters. For each increase in the number of clusters, the algorithm generates a score, the magnitude of which is proportional to the robustness of the identified clusters. This analysis was repeated 10 times, resulting in a series of 10 silhouette scores for each number of clusters. The number of clusters in each group of interest was then determined by sequentially comparing the silhouette scores (silhouette scores for 2 clusters vs. silhouette scores for 3 clusters, silhouette scores for 3 clusters vs. silhouette scores for 4 clusters, and so on) and equals the highest number of clusters for which the silhouette score significantly drops when increasing the number of clusters with one unit.
Supervised analyses
Supervised analysis, identifying differentially expressed genes between IBC and nIBC, was conducted to evaluate the influence of alternative stage-matching approaches on the identifiable biologic differences between IBC and nIBC. Therefore, global differences in gene expression between IBC and 3 alternatively composed nIBC control groups (stage I–IV, stage I–II, and stage III–IV) were investigated using the global test (20). In addition, for each comparison, we sought to identify lists of differentially expressed genes using significance analysis of microarrays (SAM; ref. 21). The resulting gene lists were compared.
To further explore functional differences between IBC and nIBC, we repeated supervised analyses at the pathway level. We applied 18 pathway gene expression signatures originally described by Gatza and colleagues using the procedures outlined in the original publication (19) and 1 additional gene expression signature for VEGF-activation (22). In addition, for each of the previously defined comparisons, we explored differences in patterns of transcription factor activation. For that purpose, we designed an algorithm to quantify the extent of transcription factor activation in samples based on the gene expression data. Details are provided in the Supplementary Data and Supplementary Table S2.
Molecular subtype-independent supervised analysis
Because of the association of the tumor phenotype (IBC/nIBC) with the molecular subtypes, the list of probe sets resulting from the IBC/nIBC comparison probably contains many genes related to molecular subtype, and therefore we hypothesize that this gene list is not IBC-specific. To dissect IBC-specific variations in gene expression from molecular subtype-specific variations in gene expression, supervised analysis (IBC vs. nIBC) was repeated by applying linear regression modeling (Limma-package). Before analysis, we divided the dataset into a training set of 250 samples (84 IBC samples and 166 nIBC samples) and an independent validation set of 139 samples (53 IBC samples and 86 nIBC samples). The distributions of the molecular subtypes, the tumor stage, and the institution from which the samples were derived (IPC, MDA, and TCRU) were compared between both datasets. The training dataset was analyzed using linear regression models incorporating the molecular subtype classification and the IBC/nIBC classification to identify probe sets with IBC-specific expression components. Using the global test (20), lists of probe sets were analyzed for global expression differences between IBC and nIBC in the validation set. Classifier models based on the identified lists of probe sets with an IBC-specific expression component were constructed using the nearest shrunken centroid algorithm implemented in the PAMR-package. For this purpose, 10-fold cross-validation on the training set was conducted to select an appropriate δ-value minimizing the cross-validated training error rate. The resulting models were applied on the validation set to estimate accuracy, sensitivity, and specificity. In addition, we tested the performance of the classifier models to discriminate between samples from patients with and without IBC in more homogenous subgroups. For the sake of statistical power, we combined luminal A and B samples into a luminal group and basal-like and claudin-low samples into a triple-negative breast cancer (TNBC) group. Finally, the strategy of dissecting IBC-specific effects from molecular subtype-specific effects using linear regression models was repeated on the entire dataset for the pathway and transcription factor activation data.
Analysis of publicly available nIBC data
Given the higher metastatic risk of IBC compared with nIBC, we tested the hypothesis that our 79-gene IBC/nIBC signature, if biologically relevant, is associated with metastatic relapse in nIBC. Therefore, we collected expression data from 6 publicly available series of adjuvant chemotherapy- and hormone therapy-naïve lymph node-negative patients with early breast cancer. These series have been used for similar purposes in earlier studies and follow-up data in terms of metastatic relapse are available. An overview of included datasets is provided in Supplementary Table S3. Redundancy related to patient samples included in multiple datasets was eliminated, yielding 871 samples available for analysis. Before analysis, we mapped hybridization probes for the differentially expressed genes across the 2 oligonucleotide-based platforms used across the series. When multiple probes were mapped to the same GeneID (EntrezGene identification number), the one with the highest variance in a particular dataset was selected to represent the GeneID. Analysis of each dataset was done separately to guarantee a larger number of genes common with our gene list. After having identified the common genes, we used distance weighted discrimination (DWD; ref. 23) to normalize each public dataset to be comparable with our present pooled Affymetrix dataset. The samples were classified as “IBC-like” or “nIBC-like” using a support vector machine (SVM) function based on the list of 79 differentially expressed genes. The initial outcome of interest for this SVM classifier was the separation of IBC from nIBC. The SVM model was established in our original training set (84 IBC and 166 nIBC samples) with a linear kernel as the parameter.
Statistical analyses
In the results section, only nominal P values are reported; however, in case of multiple comparisons, P values were corrected and were considered significant only if the false discovery rate (FDR) was smaller than 0.05 (9). To compare the distribution according to categorical variables, χ2 testing or the Fisher exact test were used when appropriate. To compare continuous data between 2 or more groups, respectively the Mann–Whitney U test and the Kruskal–Wallis test were used. To compare continuous data with 1 reference value, the Wilcoxon signed rank test was used. To compare distributions of continuous variables, Spearman correlation analysis was conducted. The distant metastasis-free survival (DMFS) for patients with nIBC included in the public dataset was calculated from the date of diagnosis until the date of first metastatic relapse. The follow-up was calculated from the date of diagnosis to the date of last news for event-free patients. Survival was calculated using the Kaplan–Meier method and compared between groups with the log-rank test. Univariate and multivariate analyses were done using generalized linear models or Cox's proportional hazards in case of survival analysis. Patients who died from other causes and did not experience a metastatic relapse before dying were censored at the time of death and were not considered as an event for the estimation of DMFS.
Results
Unsupervised analysis and heterogeneity of IBC
First, we assessed the effect of the normalization strategy on the technical and biologic variation in gene expression. Results of this analysis are presented in Supplementary Fig. S2. PCA shows that normalization was effective in removing technical, interlaboratory variation in gene expression. After normalization, ER activation scores were significantly different between samples classified according to the various molecular subtypes (P < 0.001), suggesting that the normalization procedure did not remove biologic differences.
Second, using UHCA followed by silhouette score analysis, we identified 4 robust clusters in our series of 137 IBC samples (Fig. 1A). The details of the silhouette analysis are provided in Supplementary Figs. S3 and S4. A significant association was observed between these sample clusters and the molecular subtypes (P < 0.001). In agreement with this observation, ER-, progesterone receptor (PR)-, and ERBB2-pathway activation scores were also significantly different between the different clusters (P < 0.001). When a similar analysis was conducted on the nIBC series (N = 252), 6 sample clusters, again related to the molecular subtypes (P < 0.001), were identified (Fig. 1B). Arguably, the increase in number of clusters observed in nIBC can be explained by the increase in sample size (137 vs. 252). Therefore, we analyzed 10 sets of 137 randomly selected nIBC samples. The median number of clusters in these series of samples equaled 6 (range, 4–10), which was significantly different from the number of sample clusters observed in the series of IBC samples (P = 0.002), suggesting that IBC is indeed less heterogeneous than nIBC.
UHCA. Clustering was done with Manhattan distance as similarity measure and Ward linkage as the dendrogram drawing method for probe sets having a SD greater than 2 in our series of IBC (A, 298 probe sets) and nIBC (B, 251 probe sets) samples only, as well as in the combined dataset (C, 270 probe sets). Median-centered gene expression data are represented in a matrix format with rows indicating genes and columns indicating samples. Overexpressed genes (compared with the median) are color-coded red and repressed genes are color-coded green. Color saturation indicates the level of overexpression or repression. The clusters that were considered stable using the silhouette algorithm are indicated in the dendrogram using alternating gray and blue colors. The classification of the samples according to the molecular subtypes (basal-like, claudin-low, HER2-enriched, luminal A, luminal B, and normal-like) is represented underneath the heatmaps in matrix format. Tumor subtype classification (IBC vs. nIBC) is also indicated underneath the heatmap representing the combined dataset. Black vertical ticks indicate class membership. For convenience, the shading pattern of the dendrograms is repeated on the class-membership matrix.
When the earlier outlined analysis was repeated on the combined dataset of IBC and nIBC samples, 7 robust sample clusters were identified (Fig. 1C). Downstream of the first bifurcation, luminal A, luminal B, and normal-like samples were segregated from basal-like, HER2-enriched and claudin-low samples (P < 0.001). In addition, a significant segregation of IBC and nIBC samples was observed (P < 0.001). Linear regression analysis using the cluster label as the dependent variable and both the tumor phenotype and the different molecular subtypes as independent variables showed that molecular subtypes, and not the tumor phenotype, are independent predictors of the cluster label (tumor phenotype, P = 0.104; molecular subtypes, P < 0.050). These data suggest that differences in gene expression between IBC and nIBC are dominated by the molecular subtype-related differences in gene expression.
IBC/nIBC molecular comparison and influence of the molecular subtypes
We sought to determine differences in gene expression between IBC and nIBC in a gene-by-gene supervised manner. Table 2 provides an overview of the results of this comparison. When comparing IBC to nIBC, we observed significant differences in gene expression, regardless of the type of stage-matching conducted for the nIBC control group. On initial review, differences between IBC and advanced stage (III–IV) nIBC seemed smaller than between IBC and all nIBC samples, as suggested by the lower Q value. However, when differences between IBC samples and nonstage-matched nIBC samples were analyzed in 10 equally large series or randomly selected samples, no significant differences for Q values were recorded (P = 0.345), suggesting that the determinant responsible for this difference is sample size and is not due to the type of stage-matching (Table 2 and Fig. 2A). Similar analyses were conducted for the numbers of differentially expressed genes at FDR < 0.050 and FDR < 0.100 (Table 2 and Fig. 2B and C). Lists of differentially expressed genes (FDR < 0.100) identified in the comparisons of IBC to alternatively composed nIBC control groups (i.e., nonstage matched, advanced stage (III–IV) only, or low stage (I–II) only) were evaluated to assess the effect of the type of stage-matching on the gene list content; results are presented by a Venn diagram (Fig. 2D). Of the genes identified in the smallest gene list (IBC vs. stage III–IV nIBC), 98% and 91% of the genes were present in the gene lists comparing respectively IBC to all nIBC samples and IBC to the stage I–II nIBC samples only. When comparing IBC with either all nIBC samples or with the stage I–II nIBC samples only, an average concordance of approximately 87% was observed between both gene lists (data not shown). These results suggest that stage matching had limited influence, allowing us to retain all nIBC control samples for further analysis.
Supervised analysis between IBC and nIBC according to nIBC stage-matching. Comparison of parameters of differential expression [Q values (A), number of significant genes at FDR < 0.050 (B), and number of significant genes at FDR < 0.100 (C)] between IBC and nIBC based on a comparison with stage-matching (stage III–IV) and for the comparison with no stage-matching (stage I–IV). The blue vertical line indicates the value for the corresponding parameter for the stage-matched comparison. The black line indicates the distribution of the values of the same parameters when analyzing differential gene expression in the nonstage-matched analysis using 10 random series of equal sample sizes, as was the case for the stage-matched comparison. The Venn diagram (D) shows the number of differentially expressed genes between IBC and nIBC samples when using stage-matched nIBC samples only (stage III–IV) as comparator or when using stage I–II nIBC only as comparator or when using all nIBC samples regardless of stage (stage I–IV) as comparator. Numbers indicate the number of genes in each section.
Results of supervised analyses between IBC and nIBC according to the nIBC stage
We then compared IBC and nIBC for several expression profiles, including molecular subtypes and pathway and transcription factor activation signatures. Results are shown in Supplementary Table S4. About the molecular subtypes, similar frequencies for all subtypes were observed except for the luminal A subtype (42% in nIBC vs. 19% in IBC; P < 0.001) and the HER2-enriched subtype (9% in nIBC vs. 22% in IBC; P < 0.001). Out of 19 tested pathways, 12 (63%) were differentially activated between IBC and nIBC: 8 pathways were more activated in IBC (CTNB, HER2, MYC, RAS, IFN-α, IFN-γ, TNF-α, and VEGF), whereas 4 pathways were attenuated in IBC (ER, PR, P53, and TGF-β). When analyzing the transcription factor activation profiles, we identified 78 of 234 (33%) differentially activated transcription factors, from which 38 (49%) were more activated in IBC and 40 (51%) were more activated in nIBC. Among the transcription factors hyperactivated in IBC, we identified RELA, corroborating the results from previous studies (24, 25).
Molecular subtype-independent comparison of IBC and nIBC
Because of the strong correlation between the IBC/nIBC phenotype and the molecular subtypes, we hypothesized that the results presented earlier could be related to the molecular subtypes, and hence were not actually IBC-specific. Thus, we reiterated the IBC/nIBC comparative analysis (gene expression, pathway activation, and transcription factor activation) using linear regression models to identify probe sets that are differentially expressed between IBC and nIBC in a molecular subtype independent manner. To be able to evaluate the extent by which the identified probe sets truly reflect IBC biology, we divided our series into a training and validation set. No differences between both sets were observed with respect to the distribution of the molecular subtypes (P = 0.955), the tumor stage (P = 0.954) and the institution from which the samples were derived (P = 0.761).
In the training set, we identified 491 probe sets, representing 443 unique and IBC-specific genes, of which 231 (47%) and 260 (53%) were respectively up- or downregulated in IBC (Supplementary Table S5). For comparison, 2,743 genes were differentially expressed (FDR < 0.050) between IBC and nIBC in the training set when molecular subtype-dependent gene expression differences were not considered. Within the list of 491 probe sets, 79 probe sets (16%) were uniquely IBC-specific, whereas the remaining probe sets (N = 412) showed additional molecular subtype-specific gene expression variation. Both the complete list of probe sets with an IBC-specific expression component (N = 491) and the list of probe sets that are uniquely IBC-specific (N = 79) were used for further validation. Both lists are differentially expressed between IBC and nIBC (P < 0.001) in the independent validation set with a slightly greater expression difference for the list of uniquely IBC-specific probe sets (observed Q = 51.677 vs. observed Q = 50.893). Further details are provided in the Supplementary Fig. S5.
Using the nearest shrunken centroid method, classifier models were constructed on the training set and applied onto the validation dataset. Cross-validated probabilities for the training set and posterior probabilities for the validation set are provided for both models in Fig. 3. For the full list of probe sets with an IBC-specific expression component, we observed an overall test accuracy of 68% with a sensitivity of 66% and a specificity of 70% (OR, 4.434; P < 0.001). Using the list of probe sets with a uniquely IBC-specific expression pattern, we observed an overall test accuracy of 71% with a sensitivity of 65% and a specificity of 74% (OR, 5.136; P < 0.001). We next evaluated the performance of both models to discriminate between samples from patients with and without IBC in more homogenous subgroups. The model based on the list of uniquely IBC-specific probe sets (N = 79) conducted well in discriminating between samples from patients with and without IBC in the luminal (N = 70; OR, 3.909; P = 0.019) and TNBC (N = 40; OR, 4.839; P = 0.027) series. Evaluating the model based on the full list of probe sets (N = 491) in the earlier described subgroups showed poor performance, probably due to the presence of a molecular subtype-specific expression component for the majority of the involved probe sets.
Construction and assessment of the IBC/nIBC classification models. Using PAM, classification models to discriminate between IBC and nIBC were constructed. The first model was based on the full list of 491 probe sets with an IBC-specific expression component. The second model was based on the list of 79 uniquely IBC-specific probe sets. Both models were constructed using the training set and tested on the validation set. The leave-one-out cross-validated probabilities for all samples in the training set based on the full list of probe sets with an IBC-specific expression component (A) and the list of uniquely IBC-specific probe sets (C) are provided in a dot plot format. Each sample is represented by 2 colored dots, with red corresponding to the cross-validated probability that the related sample belongs to the IBC phenotype and green corresponding to the cross-validated probability that the related sample belongs to the nIBC phenotype. The real classification of the samples according to tumor phenotype is provided along the X-axis, with samples to the left of the vertical dashed line being IBC samples and vice versa. One can clearly observe that the majority of the IBC and nIBC samples are correctly classified using both models, when using a cross-validated probability of 0.500 as threshold for classification. Both models were subsequently evaluated in an independent test set. The posterior probabilities to be classified as IBC for all samples in the validation set based on the full list of probe sets with an IBC-specific expression component (B) and the list of uniquely IBC-specific probe sets (D) are provided in a dot plot format. A red dot indicates an IBC sample; a green dot indicates a nIBC sample. The posterior probabilities are plotted in ascending order. Most nIBC samples have low posterior probabilities to be classified as IBC and vice versa. Again, when using a posterior probability of 0.500 as threshold for classification we obtain a classification accuracy of 68% and 71% for the model based on the full list of probe sets and the model based on the list of uniquely IBC-specific probe set.
To identify relevant pathways, gene functions and regulatory networks associated with IBC biology, we decided to analyze the full list of probe sets with an IBC-specific expression component. We reason that the presence of an IBC-specific expression component for each of the involved probe sets lends credit to the use of this list of probe sets as a starting point to decipher the molecular biology of IBC. Detailed results of the Ingenuity Pathway Analysis are provided in Supplementary Tables S6 and S7 and Supplementary Figs. S6 and S7. We observed that inflammation- and immune-related processes, including NF-κB–signaling, characterize the IBC tumor phenotype. We identified a regulatory gene network composed of genes repressed in IBC and centered on TGF-β (Supplementary Fig. S7A).
Next, we focused on IBC-specific pathway and transcription factor activation patterns. When molecular subtypes were incorporated into the model, activation of the IFN-α–pathway and inhibition of the P53-, TGF-β-, and PR-pathways were associated with IBC (FDR < 0.050). Several other pathways known to be associated with heterogeneity in breast cancer, including ER, HER2, and EGF receptor (EGFR), ranked among the lesser significant pathways (FDR = 0.153, 0.124, and 0.343, respectively), suggesting that the influence of the molecular subtypes on these results was removed. Upon reassessment of the transcription factor activation profiles, we identified 18 IBC-specific transcription factors, 10 of which had attenuated activity (MZF1, GC, FOXO3/STAT5, BCL3, PITX2, MSX2, TCF3/TCF4, MYB, PEG10, and ID1) and 8 of which are activated (EGR, LXR, STAT3, HNF4A, THRA, SHH, FOXM1, and MKX2-2) in IBC (data not shown). Inhibition of SMAD in IBC was also observed with a significant nominal P value although the FDR exceeded our thresholds for significance (FDR = 0.084; P = 0.009; rank = 23rd/234). As expected, ER ranked among the lesser significant transcription factors (FDR = 0.246; P = 0.055; rank = 52nd/234).
Prognostic value of the IBC/nIBC signature in nIBC
Given the poor prognosis of IBC, we hypothesized that our 79-gene list, if biologically relevant with respect to the IBC/nIBC distinction, might be associated with the occurrence of metastatic relapse in nIBC in general. We thus tested its prognostic value in a series of 871 clinically annotated and chemotherapy- and hormone therapy-naïve node-negative early breast cancers. The SVM model classified 83.6% of the samples with a “nIBC-like” profile (N = 728) and 16.4% with an “IBC-like” profile (N = 143). We compared the DMFS of these 2 nIBC classes. With a median follow-up of 110 months after diagnosis (range, 1–286), the “nIBC-like” class had a 5-year DMFS of 75%, which was significantly longer than the 5-year DMFS of 62% observed for samples classified as “IBC-like” (P = 0.002). Kaplan–Meier plots are shown in Supplementary Fig. S8. In univariate analysis (Supplementary Table S8), the risk of distant metastatic recurrence in the “IBC-like” class was significantly elevated as compared with the “nIBC-like” class [HR, 1.530; 95% confidence interval (CI), 1.160–2.020; P = 0.002]. The other significant variables (P < 0.050) included the histologic grade, the molecular subtypes, and the pathologic tumor size. Patient age at diagnosis was not significant. Interestingly, our SVM model remained significant (HR, 1.480; 95% CI, 1.040–2.100; P = 0.029) in multivariate analysis incorporating all variables significant in univariate analysis (pT status, histologic grade, and molecular subtypes), suggesting that the 79-gene signature contains molecular subtype-independent prognostic information that might explain the worse prognosis of patients with IBC.
Discussion
This whole transcriptome study of IBC, the largest one reported so far, was made possible thanks to collaboration through the World IBC Consortium. The present study addressed several criticisms raised with respect to the earlier studies: a uniform IBC definition put forward by an international expert panel (11) was used, pretreatment samples were profiled using the same Affymetrix platform and a high number of IBC samples was included in addition to a high number of nIBC control samples encompassing different disease stages.
We have shown that IBC is transcriptionally heterogeneous, as shown by the identification of 4 robust sample clusters. The identification of all existing molecular subtypes in IBC corroborates this finding. Compared with nIBC, the transcriptional heterogeneity was less extensive and the distribution pattern of the molecular subtypes between 2 tumor groups differed, particularly with respect to the luminal A and HER2-enriched samples. Overall, 75% of the IBC samples belong to the classically more aggressive subtypes, basal-like, HER2-enriched, claudin-low, or luminal B subtypes, whereas these subtypes account for 54% of the nIBC samples. The luminal A subtype represents 19% of the IBC samples, whereas in nIBC this subgroup represents 42%. These data corroborate the results reported in previous publications where the presence of all molecular subtypes in IBC, albeit with a different frequency, was reported (9, 26, 27). The major difference resides in the fact that all subtypes, except for the luminal A and the HER2-enriched tumor samples, exhibit a similar prevalence in IBC as in nIBC, whereas in earlier studies IBC was reported to be enriched for the basal-like and HER2-enriched molecular subtypes (9, 27).
One of the aims of this study was to redefine the molecular profile of IBC, taking into account the points of criticism raised in earlier studies. The issues related to sample size, the definition of IBC cases and the interstudy differences in technical platform are, to the best of our efforts, resolved by the study design but the objections with respect to the composition of the control group remain. Our results clearly show that the differential distribution of the molecular subtypes between both tumor groups needs to be considered. As shown by unsupervised hierarchical clustering, IBC and nIBC are significantly segregated downstream of the first bifurcation, suggesting that there are major differences in gene expression between the IBC and nIBC tumor phenotypes. However, multivariate regression analysis revealed that the presence of the different molecular subtypes is the main driver of the clustering pattern. Therefore, differences in gene expression between IBC and nIBC are dominated by the differential distribution pattern of the molecular subtypes, which is corroborated by the fact that only 18% of the genes initially reported to be differentially expressed between IBC and nIBC remain differentially expressed after conducting the linear regression analysis to account for the influence of the molecular subtypes. Of note, the number of genes with a uniquely IBC-specific gene expression profile is even smaller and represents only 3% of the global expression differences. The differential pattern of the molecular subtypes, at least in part, explains the higher proportion of samples with a molecular “poor-prognosis” profile in IBC than in nIBC when classified according to previously published prognostic gene signatures, as well as the differential activation of biologic pathways (data not shown). On the other hand, the influence of the composition of the control group according to tumor stage seems to be limited.
We thus decided to compare IBC with nIBC, incorporating all stages (stage I–IV) of breast cancer and to use regression modeling to discriminate between molecular subtype-specific effects and tumor phenotype-specific effects. Alternatively, we could have compared IBC with a nonstage-matched but molecular subtype-matched nIBC control group. However, this would lead to smaller sample sizes, and hence, considerable loss of statistical power, a disadvantage not encountered when using the regression models on the entire dataset.
The identified IBC-specific molecular changes both confirm results obtained in previous studies and reveal novel findings. MARCKS, a gene that ranked among the top 10 differentially expressed genes between IBC and nIBC, is involved in regulating cell motility via its function as a regulator of the actin cytoskeleton (28). Interestingly, Van der Auwera and colleagues reported that MARCKS was one of the targets of miR-30b, a phosphoinositide 3-kinase (PI3K)-targeting miRNA specifically downregulated in IBC (29). Other examples of genes overexpressed in IBC and involved in the actin-based cell migration include RAC1, RHOF, and FNBP1. The regulation of cell motility and invasion was one of the most consistent molecular aspects associated with IBC across different studies (4–6, 9). Also, the role of NF-κB in IBC biology reported in earlier studies (24, 25) is corroborated by the findings in the present study.
By far, the most important biologic observation made within the scope of this analysis relates to TGF-β signaling. Given the aggressiveness of IBC and the role of TGF-β in tumor biology (30), the observation of IBC-specific TGF-β–repression is unexpected. Careful examination of the literature provided several interesting aspects with respect to TGF-β–signaling in breast cancer. First, Giampieri and colleagues showed that TGF-β–signaling switches the activity of breast cancer cells from cohesive to single cell motility. In addition, cells restricted to collective invasion were capable of lymphatic invasion but not blood-borne metastasis (31, 32). Thus, the reduced TGF-β–signaling as a signature of IBC implies that IBC tumor cells migrate collectively and invade primarily into the lymph vessels. This is consistent with the presentation of IBC, with the presence of numerous tightly aggregated nests of tumor cells, defined as tumor emboli that undergo dermal lymphatic invasion as the pathologic hallmark of IBC. On the basis of the central role of TGF-β in inducing the process of epithelial-to-mesenchymal transition (EMT; ref. 30), our observations suggest that EMT is not the primary means of tumor cell migration/invasion in IBC. This agrees with the overexpression of E-cadherin in IBC (33–35), which is a protein involved in retention of an epithelial phenotype that is lost during EMT, during which there is a gain of expression of genes involved in mesenchymal cell differentiation. Second, recent studies illustrate the importance of active TGF-β–signaling in preventing the proliferation of ER+ tumor cells, and suggest that the loss of this inhibitory interaction may be important in early breast cancer progression (36). Therefore, we hypothesize that the reduced TGF-β–signaling in IBC might lead to increased proliferation of ER+ tumor cells, which might be one of the reasons for the low frequency of luminal A samples in IBC. Interestingly, we recently described that CDKN1B, a TGF-β-target responsible for inhibition of cell proliferation, is repressed in IBC (36). In the current series, we observed a molecular subtype independent repression of CDKN1B in IBC (data not shown).
Our results also identified several transcription factors with an IBC-specific activation profile. In line with the observations made on repression of TGF-β signaling, we observed reduced IBC-specific activity of the SMAD transcription factor, which acts downstream of TGF-β–signaling. In fact, correlation analysis using independent gene sets suggests that both observations are linked (data not shown). Further corroborating our hypothesis that EMT is not the main mode of invasion observed in IBC is the finding of IBC-specific attenuation of the TCF3/TCF4 transcription factor activity, which is involved in the induction and maintenance of EMT through the repression of E-cadherin (37).
In summary, the results obtained within the scope of this study provide novel insights into the biology of IBC and open an entirely new avenue of research topics. Without doubt, the most remarkable identified feature of IBC is the attenuated TGF-β–signaling, and by extension the potential effects thereof on IBC tumor cell migration and invasion, which warrants further investigation. Our data also show that the differential distribution pattern of the molecular subtypes between IBC and nIBC should be considered when doing IBC research (e.g., in the present study, up to 82% of the differentially expressed genes between IBC and nIBC are not IBC-specific), not only in translational studies but also in basic research. For example, the SUM149 cell line, although derived from a patient with IBC, remains a basal-like cell line, and therefore observations made using this model system might not be IBC-specific. In fact, application of the 79-gene classifier onto a collection of 7 IBC cell lines obtained through the World IBC Consortium revealed that out of multiple replicates for each cell line, only the SUM149, KPL4, and FC-IBC-01 cell lines were reliably classified as IBC (Prof. Robertson; personal communication).
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: S.J. Van Laere, N.T. Ueno, P. Vermeulen, S. Krishnamurthy, P. van Dam, W.A. Woodward, P. Viens, L. Dirix, F. Bertucci
Development of methodology: S.J. Van Laere, S. Krishnamurthy, L. Dirix, F. Bertucci
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.J. Van Laere, N.T. Ueno, P. Finetti, F.M. Robertson, S. Krishnamurthy, P. Viens, D. Birnbaum, L. Dirix, F. Bertucci
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S.J. Van Laere, P. Finetti, T. Iwamoto, S. Krishnamurthy, L. Dirix, F. Bertucci
Writing, review, and/or revision of the manuscript: S.J. Van Laere, N.T. Ueno, P. Vermeulen, A. Lucci, F.M. Robertson, M. Marsan, T. Iwamoto, H. Masuda, P. van Dam, W.A. Woodward, P. Viens, M. Cristofanilli, D. Birnbaum, L. Dirix, J.M. Reuben, F. Bertucci
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.J. Van Laere, P. Finetti, M. Marsan, F. Bertucci
Study supervision: S.J. Van Laere, N.T. Ueno, P. Vermeulen, A. Lucci, P. van Dam, P. Viens, D. Birnbaum, F. Bertucci
Grant Support
The Institut National du Cancer (INCa) Translational Research Grant 2007 (to F. Bertucci); Translational Research Grant 2009 (D. Birnbaum); the Ligue Nationale Contre le Cancer (label D. Birnbaum); the NIH R01CA138239-01 and R01 CA138239 ARRA supplement (to W.A. Woodward and J.M. Reuben); The State of Texas Grant for Rare and Aggressive Cancers (to N.T. Ueno, J.M. Reuben, A. Lucci, and W.A. Woodward); The American Airlines Komen Foundation Promise Grant KGO81287 (to W.A. Woodward, J.M. Reuben, and A. Lucci); Komen Foundation Grant KG101478 (to W.A. Woodward); and Translational Research Grant 220-2008, Stichting tegen Kanker (L. Dirix).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Footnotes
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
- Received August 1, 2012.
- Revision received January 18, 2013.
- Accepted January 22, 2013.
- ©2013 American Association for Cancer Research.