
Imaging, Diagnosis, Prognosis |
Authors' Affiliations: 1 Department of Biochemistry, Pharmacology, and Genetics, 2 Department of Pathology, and 3 Department of Oncology, Odense University Hospital and 4 Human Microarray Centre and 5 Institute of Public Health, University of Southern Denmark, Odense, Denmark
Requests for reprints: Mads Thomassen, Department of Biochemistry, Pharmacology, and Genetics, Odense University Hospital, Sdr. Boulevard 29, 5000 Odense C, Denmark. Phone: 456-5411911; Fax: 456-5411911; E-mail: mads.thomassen{at}ouh.regionsyddanmark.dk.
| Abstract |
|---|
Experimental Design: Twenty-six tumors from low-risk patients and 34 low-malignant T2 tumors from patients with slightly higher risk have been examined by genome-wide gene expression analysis. Nine prognostic gene sets were tested in this data set.
Results: A 32-gene profile (HUMAC32) that accurately predicts metastasis has previously been developed from this data set. In the present study, six of the eight other gene sets have prognostic power in the low-malignant patient group, whereas two have no prognostic value. Despite a relatively small overlap between gene sets, there is high concordance of classification of samples. This, together with analysis of functional gene groups, indicates that the same pathways may be represented by several of the gene sets. However, the results suggest that low-risk patients may be classified more accurately with gene signatures developed especially for this patient group.
Conclusion: Several gene sets, mainly developed in high-risk cancers, predict metastasis from low-malignant cancer.
Recently, promising results for improvement of risk assessment, mainly in the high-risk group, have been obtained by gene expression profiling of breast tumors. Different strategies and platforms have been used to accomplish this. Genome-wide gene expression analysis with long oligonucleotides or Affymetrix chips has been used by several groups. A Dutch group used a Rosetta chip with 60-mer oligonucleotides and developed a 70-gene profile, which could predict development of metastasis within 5 years among lymph node–negative patients with higher accuracy than the classic clinical-pathologic methods (1). The 70-gene signature has been validated with similar results by the same group on a cohort of patients, including patients with lymph node–positive disease (2, 3). Another group, also from the Netherlands, conducted a similar study with the Affymetrix platform and found a 76-gene profile for prediction of distant metastasis within 5 years in patients who had no adjuvant treatment and no lymph node involvement of the disease (4, 5). Furthermore, by use of Affymetrix chips, a Swedish group has developed a 64-gene signature classifying patients into three risk groups (6). These studies have mainly addressed overtreatment in the high-risk group, and it has been possible to classify a considerable proportion of the nonmetastatic tumors correctly while maintaining high sensitivity.
A different approach has been used by Ramaswamy et al., who compared the expression profiles from primary tumors and nonmatched metastases from several different tissues and identified a 17-gene profile characteristic for metastasis but also present in a subset of primary tumors, suggesting prognostic value of the 17 genes. The prognostic value of this signature was confirmed in the data set from van't Veer et al. (7). Chang et al. (8) hypothesized that features of normal wound healing might play an important role in cancer. They identified a core serum response (CSR) profile and validated its prognostic performance on the data provided by van de Vijver et al. (8, 9). Sotiriou et al. aimed at a more precise measure for histologic grade and developed a 97-gene profile capable of reclassifying a considerable fraction of grade 2 tumors as grade 1-like or grade 3-like. This profile has also shown prognostic power in several data sets (10).
Besides the genome-wide approaches, focused expression analysis with real-time PCR on candidate genes has also been used. A 21-gene profile was developed for paraffin-embedded tissue and could predict development of metastasis within a large cohort of tamoxifen-treated patients (11). In another study, the expression of only three genes served as an independent prognostic marker (12).
There is a need for these prognostic signatures to be tested on independent data sets before they come into clinical use. However, a general problem is that different platforms are used and these can be difficult to compare.
Besides the overtreatment in the high-risk group, another important issue is to identify women who would benefit from a treatment they are not offered today. In the low-risk group of patients, 10% of patients experience recurrence and a significant proportion of these patients would probably benefit from adjuvant treatment. The above-mentioned studies hardly include any low-risk patients with metastatic outcome who did not receive adjuvant therapy. Patients receiving adjuvant treatment are less informative because treatment response will bias outcome classification. We have previously developed a 32-gene profile (HUMAC32) that accurately predicts development of metastasis in this group of patients (13). In this study, we compare prediction of metastasis in the low-malignant group by the HUMAC32 profile and the above-mentioned prognostic gene sets mainly developed in higher risk cancers. The study is designed with pairs of metastasizing and nonmetastasizing tumors matched according to classic prognostic markers, which allows the independent information provided by the classifiers to be assessed. We have developed classification algorithms with the gene sets on our data, which reduces the effect of platform differences.
| Materials and Methods |
|---|
20 mm, grade = 1 if ductal carcinoma (not otherwise specified), receptor positive, and age ≥ 35. In addition, a group of 17 low-malignant T2 tumors [node negative, 20 mm < T ≤ 50 mm, grade = 1 if ductal carcinoma (not otherwise specified), receptor positive, and age ≥ 35] from patients who developed metastasis and 17 tumors from matched patients who did not develop metastasis were included. This group did not fulfill the Danish Breast Cancer Cooperative Group low-risk criteria because the tumor size was 20 to 50 mm, but satisfied all other criteria (this group is called low-malignant T2 tumors). The tumors were matched pair-wise according to tumor type as well as year of surgery, tumor size, and age as far as possible. None of the patients had received adjuvant systemic therapy. The average follow-up for nonmetastasizing patients was 12.3 years. The study was approved by the regional ethical committee of Southern Denmark.
Gene-expression analysis. RNA was purified from tumor biopsies with Trizol followed by further purification and DNase treatment on RNeasy micro columns (Qiagen). For gene expression analysis, a 29K oligonucleotide chip with duplicate measurement of each gene was used as previously described (15). The sequence of the 70 original oligonucleotides reported by van't Veer (1) was downloaded from Rosetta Inpharmatics website,7 and identical oligonucleotides were spotted on the chips. The same approach could not be used for the other gene sets because of different length of targets. Labeled aRNA was prepared from 500 ng RNA using the Ambion Amino Allyl MessageAmp™ aRNA kit as previously described (15).
Data analysis. Identification of spot locations and quantification was done using arrayWoRx software (Applied Precision). Raw intensity data were normalized using the variance stabilization normalization procedure (16), implemented in the R package vsn. The prediction of outcome and development of a 32-gene classifier (HUMAC32) is described elsewhere (13). Briefly, classification was done by leaving one matched pair of tumors out and selecting genes with the nearest shrunken centroids method in the R package pamr (17). The selected genes were submitted to support vector machines (SVM; R package e1071) to build a hyperplane that separates the training set (58 samples) with maximal margin and is then used to classify new samples in the testing set (two samples; ref. 18). Cross-validation was done for all 30 pairs, a scheme called the 30-classifier scheme. The optimal classifier (HUMAC32) was developed by applying the nearest shrunken centroids method to the entire data set. All mentioned R packages are implemented in the R-based Bioconductor package.8
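The leave-one-pair-out scheme described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code used in the study: a plain nearest-centroid rule stands in for the pamr gene selection and the e1071 SVM, and the function names and the toy expression values are hypothetical.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT an SVM): label of the closest class centroid."""
    cents = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in set(train_y)
    }
    return min(cents, key=lambda label: sq_dist(cents[label], sample))

def leave_one_pair_out(x, y, pair_ids):
    """Hold out one matched pair at a time, train on the rest, predict the pair."""
    predictions = [None] * len(x)
    for pair in sorted(set(pair_ids)):
        test_idx = [i for i, p in enumerate(pair_ids) if p == pair]
        train_idx = [i for i, p in enumerate(pair_ids) if p != pair]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # Any gene selection would have to happen here, on the training
        # samples only, so the held-out pair never influences the classifier.
        for i in test_idx:
            predictions[i] = nearest_centroid_predict(train_x, train_y, x[i])
    return predictions

# Hypothetical toy data: 3 matched pairs, 2 genes per tumor.
x = [[2.0, 0.1], [0.2, 1.9], [1.8, 0.3], [0.1, 2.2], [2.1, 0.2], [0.3, 2.0]]
y = ["met", "non", "met", "non", "met", "non"]
pair_ids = [0, 0, 1, 1, 2, 2]

preds = leave_one_pair_out(x, y, pair_ids)
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(preds, accuracy)
```

Holding out both members of a pair together keeps the matched partner of a test tumor out of the training set, which is the point of the paired design.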
Genes from previously reported prognostic gene sets were annotated to the 29K chip by gene bank accession numbers, Unigene ID, or gene symbol. The different gene sets were submitted to SVM, and classification was done by leaving one matched pair of tumors out at a time with cross-validation as described above, except that the gene set was fixed. The output from SVM is the probability of poor outcome for each tumor. A probability cutoff of 0.5 was applied to all classifications to obtain comparable results. Furthermore, mean values of probability of poor outcome were plotted for the different classifiers to compare the separation of samples.
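The two summaries described here, accuracy at the 0.5 cutoff and mean probability of poor outcome per outcome group, can be illustrated with a short Python sketch. The probabilities below are invented for illustration, the function names are hypothetical, and treating a probability of exactly 0.5 as poor outcome is an assumption, since the text does not say how ties are handled.

```python
from statistics import mean

def classify(prob_poor, cutoff=0.5):
    """Turn probabilities of poor outcome into class calls.
    Assumption: a probability equal to the cutoff counts as 'poor'."""
    return ["poor" if p >= cutoff else "good" for p in prob_poor]

def accuracy(predicted, observed):
    """Fraction of samples whose predicted class matches the observed outcome."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

def mean_prob_by_outcome(prob_poor, observed):
    """Mean probability of poor outcome within each observed outcome group."""
    groups = {}
    for p, o in zip(prob_poor, observed):
        groups.setdefault(o, []).append(p)
    return {outcome: mean(values) for outcome, values in groups.items()}

# Hypothetical classifier output for six tumors.
prob_poor = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
observed = ["poor", "poor", "poor", "good", "good", "good"]

calls = classify(prob_poor)
acc = accuracy(calls, observed)
separation = mean_prob_by_outcome(prob_poor, observed)
print(calls, acc, separation)
```

A larger gap between the two group means indicates better separation of samples, independent of where the cutoff is placed.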
Concordance between two given classifiers was calculated as the fraction of samples classified identically. The functional analyses of gene sets were generated through the use of expression analysis systematic explorer (19). The program uses Fisher's exact test to calculate the probability of randomly selecting the number of genes present in a gene set with a certain function from the gene list represented on the chip used. The P value is subsequently corrected for multiple testing by the Bonferroni method. The overlap between gene sets was investigated with Microsoft Access using the annotation from the 29K chip.
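Both calculations can be written compactly. The following Python sketch is an illustration only: the function names and all counts are hypothetical, and the enrichment test is assumed to be the one-sided (upper-tail) hypergeometric form of Fisher's exact test, which may differ in detail from what the expression analysis systematic explorer program computes.

```python
from math import comb

def concordance(calls_a, calls_b):
    """Fraction of samples that two classifiers assign to the same class."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both classifiers must score the same samples")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)

def enrichment_p(k, n, K, N):
    """P(X >= k) when drawing n genes at random from N genes on the chip,
    K of which carry the function of interest (hypergeometric upper tail)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def bonferroni(p, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)

# Hypothetical class calls from two classifiers on four tumors.
agreement = concordance(["poor", "good", "poor", "good"],
                        ["poor", "poor", "poor", "good"])

# Hypothetical counts: a 4-gene set with 3 genes in a category that
# covers 5 of 20 genes on the chip, tested across 10 categories.
p_raw = enrichment_p(k=3, n=4, K=5, N=20)
p_adj = bonferroni(p_raw, n_tests=10)
print(agreement, p_raw, p_adj)
```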
| Results |
|---|
Finally, two gene sets, developed for reverse transcription-PCR (RT-PCR) analysis of candidate genes, were tested. A 21-gene set, of which 5 housekeeping genes were omitted and one gene was not represented on the chip, resulted in clear sample separation and 73% accuracy (Fig. 1H). The three-gene set had low prognostic power in the present data set (48% accuracy; Fig. 1I). Likewise, an intrinsic gene set from a Norwegian study, done with a cDNA chip from Stanford University (20), was tested and resulted in accuracy comparable with a random distribution of the samples (data not shown).
To examine concordance of classification of different gene sets, the classification results from Fig. 1 are summarized in Table 1. Furthermore, instead of evaluating the performance of the classifiers with the somewhat arbitrary probability cutoff limit of 0.5, inspection of mean probability of poor outcome may be more informative (Fig. 2).
| Discussion |
|---|
In the present study, eight other prognostic gene sets are examined. We have not been able to use the original algorithms because the data have not been available or the platforms have been so different that this would not be meaningful. For this reason, the SVM procedure with leave-one-pair-out cross-validation seems reasonable for testing these sets. The 30-classifier scheme had higher accuracy and better separation of the samples than the other tested gene sets (Figs. 1 and 2). This may be explained by lower power for the other classifiers, developed for higher risk cancers, in the present cohort of lower risk tumors. This is supported by inspection of the two different risk groups in the present tumor set: T1 and T2 tumors. With the 30-classifier scheme, 31% of the misclassified tumors are T1 tumors (4 of 13), whereas the proportion is 43% (6 of 14) for the 76-gene set, 50% (10 of 20) for the 64-gene set, 50% (9 of 18) for CSR, and 40% (6 of 15) for the 70-gene set (Fig. 1). The 21-gene set had comparable performance with the 30-classifier scheme (27%, 4 of 15); however, this gene set was also developed on lymph node–negative and estrogen receptor–positive tumors. A crucial effect of receptor status on gene expression profile has been shown (4). When separation of samples is inspected in terms of probability of poor outcome, the same tendency is observed: T1 tumors are separated better than T2 tumors with the 21-gene and HUMAC32 gene sets compared with the other gene sets (Fig. 2). The pair-wise matching of samples according to currently used clinical and pathologic prognostic markers may also explain the lower performance of the classifiers developed in cohorts. Cohort classifiers are trained to track the classic prognostic markers because these are strongly biased in outcome groups, whereas the present design enables examination of prognostic value independent of these markers.
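As a quick arithmetic check, the T1 shares quoted above follow directly from the stated counts. The Python snippet below only restates the counts given in the text (T1 tumors among misclassified tumors, per gene set); the variable names are arbitrary.

```python
# (misclassified T1 tumors, all misclassified tumors) as stated in the text.
t1_misclassified = {
    "30-classifier scheme": (4, 13),
    "76-gene set": (6, 14),
    "64-gene set": (10, 20),
    "CSR": (9, 18),
    "70-gene set": (6, 15),
    "21-gene set": (4, 15),
}

# Percentage of misclassified tumors that are T1, rounded to whole percent.
t1_share = {
    name: round(100 * t1 / total)
    for name, (t1, total) in t1_misclassified.items()
}

for name, pct in t1_share.items():
    t1, total = t1_misclassified[name]
    print(f"{name}: {t1} of {total} = {pct}%")
```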
Several other factors, such as different platforms, incomplete gene representation on the present chip, different diagnostic procedures, and different sampling procedures, may explain the higher accuracy of the 30-classifier scheme. However, these would most likely not change the misclassification rate and separation of samples between T1 and T2 tumors in the present data. The lack of prognostic power in the 17-gene profile reported by Ramaswamy may be explained by the fact that this profile was developed on cancers originating from six different tissues, possibly reflecting other metastatic mechanisms. The low performance of the three-gene and 17-gene sets, measured by low accuracy of classification, was supported by low concordance with the other gene sets, corresponding to a random distribution of the samples (Table 1).
Taking the low overlap between the gene sets into consideration (Table 2), the relatively high agreement of classification between the gene sets (Table 1) may indicate that the same underlying biological pathways are represented in the gene sets. Indeed, there are several overlaps in the functions of the classifier genes, with cell cycle and cell proliferation being the predominant gene ontology categories (Table 3). This is supported by Fan et al. (21), who recently reported high concordance of an intrinsic gene set with four other signatures. Unlike that study, we have compared all gene sets with one another. Furthermore, we have developed classification algorithms with SVM instead of adopting classification methods from the original studies, reducing the bias introduced by the different platforms used.
The MLF1P gene, present in five of the seven gene sets deduced from genome-wide expression profiling, might be a potent prognostic marker. High expression of this gene has been shown in glioblastoma cell lines compared with normal brain tissue in rat (22), and it binds to MLF1, a negative regulator of cell cycle functioning upstream of p53 (23). However, genes overlapping in several classifiers may not be adequate classifier genes, because 14 genes overlapping in three or more classifiers had lower performance than HUMAC32 in the present data. This may indicate that HUMAC32 contains additional genes specific for metastasis from low-malignant cancer.
The clinical relevance of the current study lies in preventing metastasis among low-risk patients. Furthermore, the results show a better classification of the patients with low-malignant T2 tumors compared with the classic methods, indicating a potential to reduce the considerable overtreatment in this group. Although only 26 low-risk tumors were included in this study, it is actually the largest of its kind. The cohort study by van de Vijver et al. (2) only included 22 patients, of whom four developed distant metastasis. In the study by Wang et al. (4), 14 low-risk patients, including three who developed metastasis, were included in the testing sample set. The low-risk patients used for development and validation of the 21-gene profile were treated with tamoxifen, biasing the prognostic performance of the signature (11). The present study shows considerable prognostic power of the 76-gene and 70-gene classifiers among low-risk patients, thereby validating their performance in this patient group.
The pair-wise matching of the present tumors corrects for several factors that could bias the results. These include diagnostic procedures that have changed over time, e.g., implementation of more sensitive techniques for detection of lymph node metastasis and receptor status. The sampling methods, which might have changed slightly, and the storage time at –80°C may affect the expression profiles, but these biases are also minimized by the sample matching. Bias from technical variation during purification and microarray processing of the samples has also been minimized by processing the matched pairs simultaneously.
| Conclusion |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 1/31/07; revised 5/9/07; accepted 6/12/07.
| References |
|---|