
Human Cancer Biology
Authors' Affiliations: 1 National Cancer Centre; 2 Agenica Research; and 3 Genome Institute of Singapore, Singapore, Republic of Singapore
Requests for reprints: Patrick Tan, National Cancer Centre, 11 Hospital Drive, 169610 Singapore, Republic of Singapore. Phone: 65-6-436-8345; Fax: 65-6-226-5694; E-mail: cmrtan{at}nccs.com.sg.
Abstract
Experimental Design and Results: The signature algorithm (SA) successfully identified multiple breast cancer modules specifically linked to distinct biological functions. We identified a novel module, TuM1, whose presence was not readily discernible by conventional clustering techniques. The TuM1 module is expressed in a subset of estrogen receptor (ER)-positive tumors and is significantly enriched with genes involved in apoptosis and cell death. Clinically, TuM1-expressing tumors are associated with low histopathologic grade, and this association is independent of the inherent ER status of a tumor. We confirmed the robustness and general applicability of the TuM1 module by demonstrating its association with low tumor grade in multiple independent breast cancer data sets generated using different array technologies. In vitro, the TuM1 module is down-regulated in ER+ MCF7 cells upon treatment with tamoxifen, suggesting that TuM1 expression may depend on active ER signaling. Initial data also suggest that TuM1 expression may be clinically associated with a patient's response to antihormonal therapy.
Conclusion: Our results suggest that modular-based approaches toward gene expression data can prove useful in identifying novel, robust, and biologically relevant signatures even from data sets that have been the subject of substantial prior analysis.
In this study, we tested the hypothesis that novel biological information could be uncovered in these breast cancer data sets using this modular technique. We applied the SA to a set of breast cancer expression profiles and successfully defined multiple tumor modules (TuM), each associated with a distinct biological function. Most significantly, the SA identified a previously unreported module (TuM1) in a subset of estrogen receptor-positive (ER+) tumors, containing genes significantly enriched in cell death and apoptosis. The TuM1 module is not discernible by conventional hierarchical cluster analysis and proved robust in repeated random sampling assays (see Results). To further characterize the biological and clinical relevance of TuM1, we show that tumors expressing the TuM1 module are associated with low histologic grade (P < 0.001) and that this association is independent of the inherent ER status of the tumor. The TuM1/grade association is generally applicable, as it is observed across multiple independent data sets representing distinct patient populations and array technologies. We also find that, in vitro, the TuM1 module is expressed in ER+ MCF7 cells but down-regulated upon treatment with tamoxifen, suggesting that TuM1 expression may depend on active ER signaling. Motivated by this finding, we provide clinical data suggesting that TuM1 expression in primary tumors may identify patients more likely to respond to antihormonal therapy. By identifying a novel, clinically relevant molecular signature in breast cancer, our results thus show that modular approaches to gene expression data, such as SA, can successfully reveal novel biological information even from data sets that have received substantial prior analysis.
Materials and Methods
Cell culture and tamoxifen treatment. MCF-7 breast cancer cells were obtained from American Type Culture Collection (Manassas, VA) and cultured in DMEM (Life Technologies, Grand Island, NY) supplemented with 10% fetal bovine serum, 100 units/mL penicillin, 100 units/mL streptomycin, and 2 mmol/L L-glutamine. Before tamoxifen treatment, cells were washed three times in PBS and maintained for 24 hours in phenol red-free DMEM with 5% dextran charcoal-stripped fetal bovine serum (HyClone Laboratories, Pittsburgh, PA). Cells were then treated with 10 µmol/L tamoxifen (Sigma, St. Louis, MO) and harvested at 48 hours. Control sister cultures were treated with an equivalent volume of the vehicle (0.1% ethanol).
Sample preparation and microarray hybridization. RNA was extracted from tissues and cell lines using Trizol reagent (Invitrogen, Carlsbad, CA) and processed for Affymetrix Genechip (Affymetrix, Inc., Santa Clara, CA) hybridization on U133A Genechips according to the manufacturer's instructions. MCF-7 cells, however, were profiled on HG-U133 Plus Genechips. The MCF-7 expression profiling was done in duplicate from two independent sets of RNA samples, each comprising untreated control MCF7 cells, cells treated with 10 µmol/L tamoxifen for 48 hours, and cells treated with vehicle (0.1% ethanol) for 48 hours. The hybridization signal on each chip was scanned and processed with GeneSuite software (Affymetrix).
Data processing. Raw Genechip scans were quality controlled using GeneData Refiner (Genedata, Basel, Switzerland). The expression data were preprocessed by removing genes whose expression was absent (i.e., "A" calls) in >40% of samples, log2-transforming the remaining genes (9,116 probes), and normalizing by median-centering each sample. The expression data have been deposited in the Gene Expression Omnibus database (GSE2294).
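The three preprocessing steps (absent-call filtering, log2 transformation, per-sample median-centering) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeneData Refiner workflow: the function name `preprocess` and the dictionary layout (probe ID mapped to one positive signal and one detection call per sample) are hypothetical.

```python
import math
from statistics import median


def preprocess(expr, calls, max_absent_frac=0.40):
    """Filter, log2-transform, and median-center an expression matrix.

    expr:  hypothetical dict, probe_id -> list of positive raw signals (one per sample)
    calls: hypothetical dict, probe_id -> list of detection calls ("P", "M", or "A")
    Returns dict probe_id -> list of log2 values, median-centered per sample.
    """
    # 1. Keep probes called absent ("A") in no more than 40% of samples.
    kept = {
        probe: values
        for probe, values in expr.items()
        if calls[probe].count("A") / len(calls[probe]) <= max_absent_frac
    }
    if not kept:
        return {}
    # 2. log2-transform the remaining signals.
    logged = {probe: [math.log2(v) for v in values] for probe, values in kept.items()}
    # 3. Median-center each sample (column) across the retained probes.
    n_samples = len(next(iter(logged.values())))
    col_medians = [median(vals[j] for vals in logged.values()) for j in range(n_samples)]
    return {
        probe: [v - col_medians[j] for j, v in enumerate(vals)]
        for probe, vals in logged.items()
    }
```

Median-centering per sample, rather than per gene, removes overall intensity differences between arrays while leaving relative gene-to-gene differences within each sample intact.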
SA and iterative signature algorithm. The basic SA methodology consists of four major steps: (a) a predefined set of "input genes" is selected; (b) using these input genes, the algorithm scans the expression data set and selects samples (i.e., tumors) in which the average expression of the input genes (the tumor score) is above a threshold value (the "tumor threshold"); (c) within the selected tumors, individual genes whose average expression, weighted by tumor score, exceeds a "gene threshold" are identified; and (d) the algorithm outputs a TuM, comprising a set of genes with expression levels above a particular threshold in a specific group of tumors. A detailed description of the SA methodology is provided in ref. 11. In this report, we use an extension of SA, the iterative signature algorithm (ISA), which uses a large number of random gene sets as the initial input and subsequently refines the TuMs through multiple iterative rounds of SA (12). Because the input genes are random, ISA requires no prior knowledge and hence constitutes an entirely unsupervised analytic approach. Specific details about the ISA workflow and parameter settings are described in Supplementary Information S2. Based on previous reports, a gene threshold of 3.0 was selected as optimal for further in-depth analysis (11).4
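Steps (a) to (d), and the iterative refinement used by ISA, can be illustrated with the following sketch. It is not the published SA software: it assumes that both thresholds are applied to z-scored tumor and gene scores (so a gene threshold of 3.0 would mean three standard deviations), that the tumor threshold is non-negative so the tumor scores used as weights are positive, and that ISA convergence means the output gene set stops changing. All function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores to mean 0 and (population) SD 1."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def signature_pass(expr, input_genes, tumor_threshold, gene_threshold):
    """One SA-style pass over expr (dict: gene -> expression values, one per tumor).

    Assumption: thresholds apply to z-scored scores and tumor_threshold >= 0.
    Returns (set of module genes, list of selected tumor indices).
    """
    n_tumors = len(next(iter(expr.values())))
    # (b) tumor score: average expression of the input genes in each tumor
    tumor_z = zscores([mean(expr[g][j] for g in input_genes) for j in range(n_tumors)])
    tumors = [j for j in range(n_tumors) if tumor_z[j] > tumor_threshold]
    if not tumors:
        return set(), []
    # (c) gene score: tumor-score-weighted average expression over the selected tumors
    weight_sum = sum(tumor_z[j] for j in tumors)
    genes = list(expr)
    raw = [sum(tumor_z[j] * expr[g][j] for j in tumors) / weight_sum for g in genes]
    gene_z = zscores(raw)
    # (d) the module: genes whose standardized score exceeds the gene threshold
    module = {g for g, z in zip(genes, gene_z) if z > gene_threshold}
    return module, tumors


def iterative_signature(expr, seed_genes, tumor_threshold, gene_threshold, max_iter=50):
    """ISA-style refinement: feed each output gene set back in until it stops changing."""
    genes, tumors = set(seed_genes), []
    for _ in range(max_iter):
        new_genes, tumors = signature_pass(expr, genes, tumor_threshold, gene_threshold)
        if not new_genes or new_genes == genes:
            return new_genes, tumors
        genes = new_genes
    return genes, tumors
```

In an ISA-like run, `iterative_signature` would be called from many random seed gene sets and the distinct converged modules collected; the small thresholds in a toy example stand in for the 3.0 used on the full data.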
Recurrence analysis to measure module robustness. SA uses recurrence analysis to assess the robustness of a module. For a given gene set (e.g., TuM1), a collection of derived sets is created, each containing both the input genes and genes randomly selected from the entire data set. SA is then run on both the input set and the derived sets. If the input set has a meaningful coregulated pattern, this pattern should be strongly preserved in the derived sets, and the resulting output modules will overlap substantially (ref. 11; see Results for illustration). Conversely, if no coregulated pattern is embedded in the input set, the output modules will differ considerably and little overlap will be observed. Recurrence analysis, including a mathematical definition of the recurrence metric, is described further in ref. 11.
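The two ingredients of recurrence analysis, building derived input sets and measuring how much the resulting output modules overlap, can be sketched as follows. The recurrence metric actually used is defined in ref. 11; the mean pairwise Jaccard overlap below is a stand-in assumption, as are the default of adding as many random genes as there are input genes and the hypothetical function names.

```python
import random
from itertools import combinations


def derived_input_sets(input_genes, all_genes, n_sets=10, n_random=None, seed=0):
    """Each derived set = the input genes plus randomly chosen other genes.

    Assumption: by default, add as many random genes as there are input genes.
    """
    rng = random.Random(seed)
    input_genes = set(input_genes)
    pool = sorted(set(all_genes) - input_genes)
    k = len(input_genes) if n_random is None else n_random
    return [input_genes | set(rng.sample(pool, k)) for _ in range(n_sets)]


def recurrence_score(modules):
    """Mean pairwise Jaccard overlap of output modules (stand-in for ref. 11's metric)."""
    pairs = list(combinations(modules, 2))
    if not pairs:
        return 1.0

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

In use, each derived set would be passed through SA, and the score of the resulting modules compared with the scores obtained from purely random input sets.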
Gene ontology and pathway analysis. We used the statistical web tool GoStat to identify functional annotations or Gene Ontology groups that are highly enriched in different gene sets (13). Fisher's exact test was used to calculate the significance of the observed enrichment, combined with a Benjamini and Hochberg correction to control the false discovery rate.5 Additional functional and pathway analysis was done using Ingenuity pathway analysis,6 a commercial database for identifying networks and pathways of interest in genomic data that has also been used in several other published reports (14, 15). The Ingenuity pathway analysis system uses a proprietary ontology representing over 300,000 classes of biological objects and semantically encoded relationships from the public domain literature to assign biological functions to a query data set (e.g., Affymetrix probes). The significance of functional enrichment is computed by Fisher's exact test and is reported as a range of P values associated with either top-level functions or related subfunctions.
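For readers unfamiliar with these two procedures, the sketch below computes a one-sided Fisher's exact (hypergeometric upper-tail) P value for enrichment of an annotation term in a module, followed by Benjamini-Hochberg adjustment across terms. It is a generic illustration, not GoStat's or Ingenuity's implementation; the one-sided form and the function names are assumptions.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation, P(X >= k).

    N: genes on the array, K: of those annotated to the term,
    n: genes in the module, k: module genes annotated to the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Python's integer arithmetic keeps the binomial coefficients exact even for array-sized N, and the final division is correctly rounded to a float.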
Associations between TuMs and clinical data. χ² tests were used to calculate the association between each TuM and the following clinical variables: patient age, lymph node status, ER status, progesterone receptor status, tumor size, histologic grade (as a continuous variable), and lymphovascular invasion. The significance of each association was also confirmed by hypergeometric probability density function analysis. Linear regression was used to confirm, in multivariate analysis, that the TuM1/grade association is independent of ER status. For multi-data set analyses, we identified UniGene clusters common to the Affymetrix U133A Genechip and the Stanford, Rosetta, and Ma data sets (see Results), whereas the Uppsala data sets were matched directly by probe sets. Kaplan-Meier analysis was used for survival comparisons, and Cox regression was used to confirm the prognostic significance of TuM1 in multivariate analysis.
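For a binary clinical variable, the χ² association test reduces to a 2 × 2 contingency table (for example, TuM1 expressed or not versus one clinical category or the other). The sketch below computes the Pearson χ² statistic without continuity correction and its P value for one degree of freedom; whether a continuity correction was applied in the original analysis is not stated, so its omission here is an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction).

    Returns (statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With df = 1 the statistic is a squared standard normal, so P = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))
```

The closed-form P value relies on the df = 1 case; larger tables would need the general χ² survival function.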
Gene set enrichment analysis. Gene Set Enrichment Analysis (GSEA), a method based on a modification of the weighted Kolmogorov-Smirnov statistic, provides a general statistical framework for testing the enrichment of gene expression profiles (16). GSEA considers an a priori defined gene set, such as a group of coregulated genes, and determines whether its members are enriched at the top (or bottom) of a list of markers ranked by their degree of correlation with a specific phenotype or class distinction. Multiple hypothesis testing is adjusted for by calculating false discovery rates (16), where the false discovery rate is the estimated probability that a reported result is a false positive. Details of GSEA are provided in ref. 16; default parameter settings were used in the analysis.
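The core of GSEA is a running-sum enrichment score: walking down the ranked list, the sum increases at genes in the set (weighted by their correlation raised to a power p) and decreases at genes outside it. The maximum deviation from zero is the enrichment score. The sketch below shows only this statistic, with an assumed weight p = 1; the permutation-based significance testing and false discovery rate estimation of the published method are not reproduced, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """GSEA-style running-sum enrichment score.

    ranked_genes: genes sorted by decreasing correlation with the phenotype.
    correlations: correlation of each gene in ranked_genes, in the same order.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(r) ** p for r, h in zip(correlations, hits) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have zero correlation")
    running, best = 0.0, 0.0
    for r, h in zip(correlations, hits):
        # Step up by the normalized weight at a hit, down by a constant at a miss.
        running += (abs(r) ** p / hit_norm) if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A positive score indicates that set members cluster at the top of the ranking; a negative score, at the bottom.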
Results
We applied the ISA to a set of 96 breast cancer gene expression profiles, resulting in a modular decomposition of the gene expression data at different gene thresholds (12). Figure 1A illustrates this concept in the form of a "module tree." At low gene thresholds, only a few TuMs are identified, each consisting of a large number of loosely correlated tumors and genes. At higher thresholds (higher resolution), the expression data are decomposed into a larger number of TuMs, each containing a smaller set of tightly correlated tumors and genes. At a gene threshold of 3.0, we defined eight TuMs in the breast cancer expression data (TuMs 1-8). To place these modules in a biological context, we used the GoStat tool to identify biological or cellular functions that were significantly overrepresented in each module. Consistent with previous reports, many of these modules could be associated with distinct biological functions, such as extracellular matrix and collagen binding activities in TuM5 (corrected P = 2.85 × 10⁻⁶ and 8.72 × 10⁻⁶, respectively) and cell cycle/cellular proliferation in TuM7 (P = 4.08 × 10⁻¹⁶; detailed descriptions of each TuM and the GO analysis are provided in Supplementary Information S3). Some modules were clearly related. For example, three modules (TuM1, TuM2, and TuM3) were derived from a single larger module containing genes previously reported as highly expressed in ER+ tumors, such as ESR1, STC2, and BCL2 (3-7). Interestingly, other studies have treated this ER-related gene set as largely homogeneous; however, its successful decomposition into smaller distinct units by the ISA suggests that the larger module may actually comprise multiple distinct and possibly independent biological subprograms. Although TuM2 (38 genes) and TuM3 (30 genes) overlap substantially (~50%) in gene content (e.g., STC2, BCL2), >80% of the genes in TuM1 (33 genes) are found in neither TuM2 nor TuM3. A survey of the literature confirmed that the TuM1 module had not been previously reported. The identification of TuM1 as a novel module thus shows the ability of the modular approach to reveal new molecular patterns in genome-wide expression data.
50%) in the filtered data. Average-linkage hierarchical clustering using a Pearson correlation metric was done on this gene set. Consistent with previous reports, the clustering analysis revealed a very large cluster of ER-related genes (~560 genes), but importantly, within this group the TuM1 genes did not uniformly group with one another to form a "subcluster"; indeed, some TuM1 genes failed to localize within the ER cluster altogether (Fig. 2). Similar results were obtained when hierarchical clustering was done on the global ISA input gene set of 9,116 probes (Supplementary Information S4). This result indicates that TuM1 would have been highly unlikely to be discernible using conventional clustering approaches, supporting our hypothesis that these data sets, despite substantial prior analysis, still contain novel biological information that alternative analytic methods such as SA can unearth. For the remainder of this report, we focus on the novel TuM1 module in terms of its gene content, robustness, clinical associations, and general applicability.
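The conventional comparison clustering, average linkage on a Pearson correlation distance, can be sketched in plain Python as below. On real data one would normally use a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"`; this naive version is only meant to show what the merge rule does. It assumes the distance is 1 minus Pearson r and that no gene profile is constant (which would make r undefined); the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Naive agglomerative clustering of genes (dict: gene -> expression vector).

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    # Gene-to-gene distance: 1 - Pearson correlation.
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }

    def cluster_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        merges.append((c1, c2, cluster_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges
```

Because every merge is driven by average similarity to the whole growing cluster, a small coherent group embedded in a very large correlated block (such as the ER cluster) need not emerge as its own subtree.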
25% of the ER+ breast cancers in our initial data set of 96 tumors. Using a commercial database (Ingenuity), we did pathway analysis on TuM1 and found that genes related to cell death and apoptosis were significantly represented within this module (P = 1.66 × 10⁻⁵ to 0.034), such as programmed cell death 4 (PDCD4), mitochondrial ribosomal protein S30 (MRPS30), and gap junction protein, α1, 43 kDa (connexin 43; GJA1). Other genes in TuM1 include the xenobiotic-metabolizing enzymes NAT1 and FMO5, and PCM1, which was recently reported to be associated with histologic grade in breast cancer (17). A fully annotated list of TuM1 genes is provided in Supplementary Information S5. This pathway analysis result suggests that the TuM1 module is likely to be biologically coherent and functionally significant. To confirm that the identification of TuM1 did not depend on the specific samples in our initial data set, we evaluated the robustness of the TuM1 module using two different techniques. First, we did recurrence analysis in an independent data set. In this method, random genes are added to TuM1 (33 members) to generate a series of TuM1-derived input gene sets, and SA is run on both TuM1 and the derived sets. The output modules are compared, and the gene content overlap between them is determined. TuM1 is considered robust if the overall overlap (or "recurrence score") of the output modules exceeds a threshold derived from random input data (Fig. 3A). Specifically, we asked whether the TuM1 module could be observed in an independent data set of 86 breast tumors that were not used in the original identification of TuM1. Recurrence analysis on this independent set showed that TuM1 indeed emerged as a highly recurrent coregulated module (Fig. 3B), with the TuM1 molecular signature in this independent set also confined to ER+ tumors at proportions similar to the original data (data not shown). Second, we further tested the robustness of TuM1 using repeated random sampling, a stringent technique recently proposed by Michiels et al. (9) to validate the reliability of gene signatures. We combined the original and independent test set samples and randomly generated 100 sets of 96 tumors (the same number as in the original set).
We did recurrence analysis on all 100 random sets and found that in >85% of cases, TuM1 displayed substantially higher recurrence scores than random data (Fig. 3C). In the remaining 15%, the failure to observe TuM1 could be attributed to a lack of TuM1-expressing tumors in the random set (Y.K., data not shown). We also independently repeated the entire ISA on a subset of these randomly generated sets and confirmed that the TuM1 module could be rederived (Y.K., data not shown). These results show that the TuM1 module is highly robust within our center's data, and later in this report we show that it is also present in breast cancer expression data sets from other groups.
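The repeated random sampling design (pooling the 96 original and 86 independent tumors, then drawing 100 subsets of 96 tumors each) can be sketched as follows. Sampling without replacement within each subset, the fixed seed, and the user-supplied `passes` test (which in practice would wrap a recurrence analysis compared against a random-data threshold) are assumptions of this hypothetical sketch.

```python
import random


def random_subsets(sample_ids, n_sets=100, subset_size=96, seed=0):
    """Draw n_sets subsets of subset_size samples, each without replacement."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    if subset_size > len(ids):
        raise ValueError("subset_size exceeds the number of available samples")
    return [rng.sample(ids, subset_size) for _ in range(n_sets)]


def fraction_passing(subsets, passes):
    """Fraction of subsets for which the caller-supplied test passes(subset) is True."""
    return sum(1 for s in subsets if passes(s)) / len(subsets)


# Hypothetical usage: 96 original + 86 independent tumors = 182 pooled samples.
pool = [f"tumor_{i}" for i in range(96 + 86)]
subsets = random_subsets(pool)
```

Seeding the generator makes the resampling reproducible, which is useful when a subset fails and needs to be inspected afterward.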
A possible association between TuM1 expression and treatment response or clinical outcome. Tamoxifen is a standard antihormonal therapy used to treat ER+ breast cancer patients. Our finding that expression of the TuM1 module depends on active ER signaling prompted us to investigate whether the presence of this module in primary tumors might serve as a molecular biomarker of ER activity and identify tumors likely to respond to tamoxifen or other antihormonal treatments. Supporting this possibility, certain genes in TuM1 have been independently shown to be associated with therapeutic response in breast cancer (see Discussion). As clinical response information was not available for our in-house data, we tested three independent data sets in which such data were available. First, we tested the Stanford series, in which patients received adjuvant endocrine therapy if their tumors were ER+ (5). By Kaplan-Meier survival analysis, patients with TuM1-expressing ER+ tumors exhibited better survival outcomes than patients with ER+ tumors in which TuM1 was not expressed (P = 0.0001 for overall survival; P = 0.0036 for relapse-free survival; Fig. 4A and Supplementary Information S11). In a multivariate analysis of TuM1, grade, age, lymph node status, and tumor size, TuM1 behaved as an independent predictor of survival outcome whereas grade did not, indicating that TuM1 is more directly prognostic of patient survival than grade status alone (Supplementary Information S12). Second, we tested the Ma data set, which comprises preselected tamoxifen-responsive and tamoxifen-resistant ER+ tumors (28). Once again, patients with TuM1-overexpressing tumors exhibited significantly better outcomes than patients with low TuM1 expression (P = 0.048; Fig. 4B). By multivariate Cox regression analysis, TuM1 was the sole independent prognostic factor (P = 0.03; Supplementary Information S12), as grade, tumor size, node status, and age are controlled in the Ma patient cohort (28).
This observation was also tested using GSEA, which confirmed that TuM1 expression was significantly associated with tamoxifen response (P = 0.024; Supplementary Information S13). Third, we tested the prognostic ability of TuM1 on the Uppsala set, an independent cohort of 67 ER+ patients who received tamoxifen as monotherapy (29). Once again, patients with TuM1-expressing tumors had significantly better overall survival than patients with low TuM1 expression (P = 0.025; Fig. 4C). By multivariate Cox regression analysis, TuM1 remained significantly associated with survival (P = 0.024), whereas grade, tumor size, and lymph node status did not (Supplementary Information S12). Taken together, these preliminary results raise the possibility that TuM1 expression in primary tumors might also be associated with a tumor's response to clinical treatment, in particular antihormonal therapy.
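The survival curves compared above are Kaplan-Meier (product-limit) estimates. The sketch below computes such a curve for one patient group, assuming follow-up times with an event indicator (1 = death or relapse observed, 0 = censored) and the usual convention that patients censored at a given time are still at risk at that time. Comparing TuM1-high and TuM1-low groups would additionally require a test such as the log-rank test, which is not shown; the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate for one group.

    times:  follow-up time for each patient
    events: 1 if the event was observed, 0 if the patient was censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Everyone observed at time t leaves the risk set afterward.
        at_risk -= deaths + censored
    return curve
```

Censored patients contribute to the denominator until their last follow-up, which is what distinguishes this estimate from a simple proportion of survivors.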
Discussion
Many of the genes in TuM1 have intriguing functions relevant to tumor biology, cell death, and treatment response; a few examples are discussed here. PDCD4 has been shown to inhibit the growth of tumor cells (32), whereas GJA1 has been reported to suppress cell proliferation and tumorigenicity of human glioblastoma cells (33) and to enhance apoptosis in response to chemotherapeutic agents (34). In addition, MRPS30 has been reported as a proapoptotic gene encoding programmed cell death protein 9 (35), whereas leucine-rich repeats and immunoglobulin-like domains 1 (LRIG1) is a negative regulator of the ErbB family of receptor tyrosine kinases and has been suggested to suppress ErbB receptor function (36). Besides apoptosis-related genes, TuM1 also contains β-TrCP1 (also known as Fbw1a or FWD1), a component of the SKP1-cullin-F-box ubiquitin protein ligase complex, which can activate the nuclear factor-κB pathway and repress cell proliferation (37). Intriguingly, some genes in TuM1 have also been linked to clinical treatment response: inactivation of PDCD4 has been reported to decrease the sensitivity of breast cancer cells to both geldanamycin and tamoxifen in vitro (38), whereas NAT1, another TuM1 gene, has been reported as an independent prognostic factor of breast cancer relapse and a potential predictor of tamoxifen response (39).
Clinically, a major feature of the TuM1 module is its association with low histologic grade in an ER-independent manner. It is well known that histologic grade strongly correlates with ER status in breast cancer (18-26), with ER-negative tumors being predominantly high grade (grade 3). Indeed, consistent with these previous reports, there is a clear association between ER status and grade in all the data sets analyzed in this report (Supplementary Information S14). Because of this strong association, previous attempts to identify "grade signatures" using supervised learning methods, in which the genes exhibiting the strongest expression differences between high-grade and low-grade breast tumors are selected, have tended to define low-grade signatures containing multiple ER-related genes, such as GATA3 (6), which may act as confounders. In contrast, the TuM1/low-grade association is independent of ER status, as confirmed by multivariate analysis. As for genes up-regulated in high-grade breast tumors (high-grade signatures), the majority seem to be related to cellular proliferation (6). Of interest, we have previously identified a gene signature for the Nottingham Prognostic Index in ER+ tumors, of which tumor grade is a major component. This previous result also suggests that cell proliferation gene signatures are correlated with grade in an ER-independent manner (40).
Functionally, we have also shown in this report that the TuM1 module is expressed in the ER+ MCF7 cell line and is the only breast cancer TuM that is significantly responsive to tamoxifen treatment. This result suggests that expression of the TuM1 module may depend on continuous ER signaling and that TuM1 might represent a molecular signature of ER activity. The use of TuM1 as an in vivo biomarker of ER signaling is further supported by our observation that TuM1 is associated with clinical outcome in multiple independent patient cohorts receiving adjuvant hormonal treatment (the Stanford, Ma, and Uppsala cohorts; Fig. 4; Supplementary Information S12). This intriguing but preliminary finding warrants further study and validation in a larger cohort of patients, supported by careful experimental design and data analysis. Interestingly, in the two independent patient cohorts in which patients did not receive adjuvant treatment, patients with TuM1-expressing tumors also exhibited a trend toward improved clinical outcome; however, these differences were not statistically significant (P = 0.48 for the Rosetta data set and P = 0.07 for the Veridex data set; Supplementary Information S15). This is consistent with the hypothesis that the TuM1 module may be better at predicting a patient's response to treatment than at reflecting the intrinsic aggressiveness of the disease (i.e., the TuM1 signature is a predictive, rather than prognostic, signature).
In conclusion, our results show the feasibility and utility of applying modular analytic approaches, such as SA, to cancer expression data. Beyond breast cancer, our results suggest that, with the increasing availability of larger and more comprehensive expression data sets, sophisticated analytic tools such as SA may help refine our global understanding of gene expression pathways in various malignancies.
Footnotes
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
4 The SA software is available for download at: http://barkai-serv.weizmann.ac.il/GroupPage/software.htm.
5 GoStat is available at http://gostat.wehi.edu.au/cgi-bin/goStat.pl.
6 http://www.ingenuity.com/products/pathways_analysis.html.
Received 7/14/05; revised 12/22/05; accepted 2/28/06.
References
κB (IκB) and β-catenin as a result of targeted disruption of the β-TrCP1 gene. Proc Natl Acad Sci U S A 2003;100:8752-7.
-positive postmenopausal breast carcinoma. Breast Cancer Res 2004;6:R252-63.