
Imaging, Diagnosis, Prognosis |
Authors' Affiliations: 1 Thoracic Surgery Oncology Laboratory and Division of Thoracic Surgery and 2 Department of Pathology, Brigham and Women's Hospital, Harvard Medical School; 3 Department of Physics, University of Massachusetts; 4 Hematology/Oncology Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts; and 5 Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Cambridge, Massachusetts
Requests for reprints: Gavin J. Gordon or Raphael Bueno, Division of Thoracic Surgery, Brigham and Women's Hospital, 75 Francis Street, Boston, MA 02115. Phone: 617-732-8148; Fax: 617-582-6171; E-mail: ggordon{at}partners.org or rbueno{at}partners.org.
| Abstract |
|---|
Experimental Design: Gene expression data using high-density oligonucleotide microarrays (~22,000 genes) were obtained for a new cohort of human MPM tumors from patients undergoing similar treatments (n = 39). The relative expression levels for specific genes were also determined using real-time quantitative reverse transcription-PCR. We also used a subset of these tumors associated with widely divergent patient survival (n = 23) as a training set to identify new treatment-specific candidate prognostic molecular markers and gene ratio-based prognostic tests. The predictive nature of these newly discovered markers and gene ratio-based prognostic tests was then examined in an independent group of tumors (n = 52) using microarray data and quantitative reverse transcription-PCR.
Results: Previously described MPM prognostic genes and gene ratio-based prognostic tests predicted clinical outcome in 39 independent MPM tumor specimens in a statistically significant manner. Newly discovered treatment-specific prognostic genes and gene ratio-based prognostic tests were highly accurate and statistically significant when examined in an independent group of 52 tumors from patients undergoing similar treatment.
Conclusions: The data support the use of gene ratios in translating gene expression data into easily reproducible, statistically validated clinical tests for the prediction of outcome in MPM.
Key Words: expression profiling; prognosis; gene ratios
Whereas most MPM patients succumb within 1 or 2 years, some survive as long as 10 years. Certain features, such as epithelial histology, negative lymph nodes, and negative resection margins, have been proposed as markers for good prognosis. However, determination of nearly all of these requires major surgical exploration, making accurate pretreatment staging or prognostication essentially impossible.
We previously used microarrays to profile 31 MPM tissues (10) and designed gene expression ratio-based prognostic tests whose predictive value was statistically confirmed in an independent set of 29 patients (11). Pass et al. (12) also profiled 21 MPM patients and created a 27-gene neural network classifier to predict clinical outcome with near statistical significance in a smaller independent patient population. In the current study, we used microarrays to profile MPM tumors from 39 additional patients who received similar trimodality treatment to validate our proposed prognostic test (11). We also use these data to identify new treatment-specific candidate prognostic molecular markers and design new ratio-based prognostic tests.
| Materials and Methods |
|---|
Isolation of RNA and microarray experiments. Sample preparation and hybridization to microarrays were done as described in the Affymetrix Expression Analysis Technical Manual. Total RNA (7 µg) was prepared from whole tumor blocks using Trizol Reagent (Invitrogen Life Technologies, Carlsbad, CA). Hybridization experiments were scanned for artifacts, and gene expression levels (i.e., Affymetrix "Signal") were generated and scaled for each microarray to a "target intensity" of 100 using Affymetrix Microarray Suite v.5.0.
Real-time quantitative reverse transcription-PCR. Real-time quantitative reverse transcription-PCR (RT-PCR) was done using a SYBR-Green fluorometric-based detection system (Applied Biosystems, Foster City, CA, www.appliedbiosystems.com, see technical bulletin no. 4310251). Total RNA (2 µg) was isolated and reverse-transcribed (11). All RT-PCR primers were used at a final concentration of 800 nmol/L in the reaction mixture. Primer sequences for L6, GDIA1, CTHBP, and KIAA0977 are published (11). Other primers synthesized (Invitrogen Life Technologies) were as follows (forward and reverse): CD9 (5'-CCACTATGCGTTGAACTGCT-3' and 5'-CACGGTGAAGGTTTCGAGTA-3'), DLG5 (5'-ATCTGTCATCGACCCACTGA-3' and 5'-GGGTCTTCTTGTTGGCATCT-3'), KIAA1199 (5'-TTAAGGCAGCACACTTGGAG-3' and 5'-TCATAACCTCCCCTTTCGTC-3'), and THBD (5'-ATGTTTTGCAACCAGACTGC-3' and 5'-GATGTCCGTGCAGATGAAAC-3'). PCR was done using a Stratagene MX 3000P device with appropriate controls and expression levels were obtained using the comparative CT equation (Applied Biosystems) with slight modifications (11).
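As a rough illustration of how threshold cycle (CT) values translate into a gene pair ratio, the sketch below uses the textbook comparative CT relationship, in which template abundance is taken to be proportional to efficiency raised to the power of minus CT. It does not reproduce the "slight modifications" cited from reference 11; the function name, the default efficiency of 2.0 (perfect doubling per cycle), and the example CT values are assumptions made only for illustration.

```python
def expression_ratio_from_ct(ct_numerator, ct_denominator, efficiency=2.0):
    """Relative expression of a numerator gene over a denominator gene.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling), so abundance ~ efficiency ** (-CT).
    A lower CT means more starting template.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_denominator - ct_numerator)


# Hypothetical CT values: the numerator gene crosses threshold 3 cycles
# earlier than the denominator gene, so it is about 8-fold more abundant.
ratio = expression_ratio_from_ct(ct_numerator=20.0, ct_denominator=23.0)
```

Because both genes are measured in the same sample, a ratio formed this way needs no separate reference gene, which is part of what makes ratio-based tests convenient for single-sample use.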
Validation of prognostic genes in independent cohorts. The predictive nature of previously described prognostic genes (11) was examined in the 39 tumors profiled in the current study using the Partitioning Around Medoids (PAM) clustering algorithm (13). We also examined a previously described gene expression ratio-based prognostic test (11) in these 39 samples using quantitative RT-PCR. For gene expression level ratio-based prognosis, data from multiple gene expression ratios were combined by calculating the geometric mean as previously described (11).
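The combination step described above, a geometric mean over several gene pair ratios, can be sketched as follows. The function name and the example ratio values are hypothetical; only the use of the geometric mean comes from the text.

```python
import math


def combined_score(ratios):
    """Geometric mean of a set of positive gene expression ratios."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    # Averaging in log space and exponentiating gives the geometric mean
    # and avoids overflow when individual ratios are very large.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical ratios for one sample: two favor one class, one favors the other.
score = combined_score([4.0, 2.0, 0.5])
```

A geometric rather than arithmetic mean treats a 4-fold ratio and a 0.25-fold ratio as equal and opposite evidence, so a single extreme ratio cannot dominate the combined score.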
Identification of new prognostic genes. We identified new treatment-specific candidate prognostic molecular markers and created an expression level ratio-based outcome predictor model similar to previous studies (11, 14, 15) using a subset of the 39 samples profiled in this study as a training set. We searched the Affymetrix U133A microarray to identify all genes with expression levels that differed significantly (P ≤ 0.01) and by at least 2-fold between good-outcome (n = 13, survival >17 months) and poor-outcome (n = 10, survival <6 months) training set tumors to identify new treatment-specific prognostic markers. We further refined this gene list by requiring that their mean expression levels be >300 in at least one of the two sample sets, similar to previous studies (11, 14). Significance Analysis of Microarrays software (16) was used to estimate the false discovery rate.
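The three selection criteria above (significance, fold change, and minimum mean expression) can be sketched as a simple filter. This is a minimal illustration under stated assumptions: P values are taken as precomputed inputs because the text does not name the test used, the fold change is computed on group means, the P cutoff is treated as inclusive, and the function name and data layout are hypothetical.

```python
from statistics import mean


def select_candidate_genes(good, poor, p_values,
                           p_cutoff=0.01, min_fold=2.0, min_mean=300.0):
    """Return (gene, group_with_higher_expression) for genes passing all filters.

    good, poor: dict mapping gene -> list of expression levels per group.
    p_values:   dict mapping gene -> precomputed P value (test not specified here).
    """
    selected = []
    for gene, p in p_values.items():
        if p > p_cutoff:
            continue  # not significant at the chosen cutoff
        mean_good, mean_poor = mean(good[gene]), mean(poor[gene])
        high, low = max(mean_good, mean_poor), min(mean_good, mean_poor)
        if low <= 0 or high / low < min_fold:
            continue  # fold change undefined or below the minimum
        if high <= min_mean:
            continue  # neither group has mean expression above the floor
        selected.append((gene, "good" if mean_good > mean_poor else "poor"))
    return selected


# Hypothetical expression levels for four made-up genes.
good = {"A": [800, 1000], "B": [100, 120], "C": [500, 520], "D": [50, 60]}
poor = {"A": [300, 320], "B": [400, 420], "C": [480, 500], "D": [200, 220]}
p_values = {"A": 0.001, "B": 0.005, "C": 0.5, "D": 0.001}
candidates = select_candidate_genes(good, poor, p_values)
```

In this toy input, gene C fails the significance filter and gene D fails the expression floor even though its fold change is large, which is the situation the >300 requirement is meant to exclude.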
New treatment-specific prognostic genes discovered were examined using PAM in additional independent MPM samples (n = 26) originating from patients with similar treatment (i.e., extrapleural pneumonectomy) for which previously published microarray and linked clinical data were available (10, 11). Gene expression ratio-based prognostic tests similarly discovered using these 39 samples were examined using quantitative RT-PCR in 23 of 26 samples previously used for microarrays (10) for which cDNA was still available, plus an additional 27 samples (11) and 2 samples never used previously that also originated from patients who underwent extrapleural pneumonectomy (i.e., the "test set," n = 52 total).
The gene expression levels of 22 of 27 previously published MPM prognostic genes discovered by other investigators (12) were also analyzed in the 39 tumors profiled in the current study using hierarchical clustering, similar to the original study, and PAM (two genes were not represented on the Affymetrix U133A platform, and an additional three genes were removed from consideration because they were not called "Present" for a majority of samples): probe sets 35792_at, 34303_at, 38749_at, 39020_at, and 38650_at.
Survival studies and statistical analysis. Kaplan-Meier curves were used to estimate patient survival among groups of samples defined by predictions made using microarray data (using PAM and/or hierarchical clustering) and quantitative RT-PCR data (using optimal ratio-based prognostic tests). The log-rank test was used to statistically assess differences among multiple survival curves in univariate survival analysis. The statistical significance of survival differences observed using multiple candidate prognostic genes was evaluated by comparing to those obtained using a random selection of genes. Specifically, for each analysis, we constructed 10,000 data sets consisting of the expression levels of an equal number of randomly chosen genes in the same sample cohort. In each iteration, we did clustering with PAM and survival analysis to determine the likelihood of obtaining a P value (log-rank test) equal to or lower than the original P value observed.
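The random-gene comparison described above amounts to an empirical P value: the fraction of random gene sets that perform at least as well as the candidate set. The sketch below shows only that counting logic. The callable `p_for_gene_set` is a hypothetical stand-in for the full PAM clustering plus log-rank step, which is not implemented here, and all names and the fixed seed are assumptions.

```python
import random


def empirical_p(observed_p, all_genes, n_genes, p_for_gene_set,
                n_iter=10_000, seed=0):
    """Fraction of random gene sets whose P value is <= the observed P value.

    all_genes:       list of gene identifiers to draw from.
    n_genes:         size of each random set (same as the candidate set).
    p_for_gene_set:  hypothetical callable mapping a gene list to a log-rank
                     P value (e.g., by clustering samples and comparing survival).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n_iter):
        gene_set = rng.sample(all_genes, n_genes)  # draw without replacement
        if p_for_gene_set(gene_set) <= observed_p:
            hits += 1
    return hits / n_iter
```

Under this definition, 14 qualifying random sets out of 10,000 iterations would correspond to an empirical value of 0.0014.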
A Cox proportional hazards regression model was used for multivariate analysis to identify coefficients that best described the effect of a given variable on censored survival data. Dichotomous variables included histologic subtype (epithelial or nonepithelial), lymph node status (positive or negative), surgical resection margins (positive or negative), and predictions made using ratio-based prognostic tests (good or poor outcome). (Coding for these analyses can be found in Supplemental Tables S2 and S3, as referenced in Results.) Individual P values reported for multivariate analysis were calculated by considering the Wald statistics of the individual parameters in the combined model. Individual hazard ratios (HR) and 95% confidence intervals are expressed as the exponentiated coefficient values and are interpretable as multiplicative effects on the hazard. The likelihood-ratio test was used to test the null hypothesis that all of the coefficients are zero. All calculations and statistical comparisons were generated using S-PLUS with a significance cutoff of P < 0.05 unless otherwise stated.
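To make the relationship between a fitted Cox coefficient and the reported quantities concrete, the sketch below converts a coefficient and its standard error into a hazard ratio, a 95% confidence interval, and a two-sided Wald P value based on the normal approximation. The function name and the example inputs are hypothetical and are not taken from the tables in this article.

```python
import math


def hazard_ratio_summary(beta, se, z_crit=1.959964):
    """Summarize one Cox regression coefficient.

    beta: estimated log hazard ratio; se: its standard error.
    Returns (hazard ratio, lower 95% limit, upper 95% limit, Wald P value).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    hr = math.exp(beta)                      # exponentiated coefficient
    lower = math.exp(beta - z_crit * se)     # CI computed on the log scale
    upper = math.exp(beta + z_crit * se)
    z = beta / se                            # Wald statistic
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return hr, lower, upper, p_value


# Hypothetical coefficient: log(2) with a standard error of 0.5.
hr, lower, upper, p_value = hazard_ratio_summary(math.log(2.0), 0.5)
```

With these illustrative inputs the hazard ratio is 2 but the interval spans 1, which shows how a sizeable point estimate can still fail to reach individual significance in a small cohort.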
| Results |
|---|
0.1%; i.e., 14 of 10,000 iterations; see Materials and Methods).
24.8 months) to "good outcome" and 80% (8 of 10) of the short-term survivors (i.e., ≤6.8 months) to "poor outcome." Finally, we added the remaining 19 samples to the analysis to comprehensively examine predictions made by the three-ratio test in all 39 samples. We found that this test significantly (P = 0.037) predicted patient outcome in these 39 samples (Fig. 1C). The estimated median survival (33 months) of the good-outcome subset was over 2.5-fold higher than the estimated median survival of the poor-outcome subset (12 months). We have previously shown that positive resection margins, mixed histology, and positive lymph nodes are statistically significant negative prognostic markers in a large (n = 183) cohort of similarly treated MPM patients (17). Therefore, we used multivariate survival analysis to examine whether our results using expression ratios were independent of the above prognostic variables. The results of fitting a Cox proportional hazards regression model to these survival data are shown in Table 1 (see Supplemental Table S2 for codes). None of the clinical variables achieved individual statistical significance, likely because of a lack of statistical power resulting from a small sample size, unbalanced patient distribution pertaining to stage, and the fact that this cohort was not optimized to detect these differences (i.e., neither histologic subtype nor lymph node status was a statistically significant prognostic variable in univariate survival analysis). However, the result of a likelihood-ratio test (P = 0.036) suggests that at least one of the regression coefficients is not zero. Examination of the point estimate HRs revealed that the combined score of the expression ratio test (HR, 2.00) was higher than that for both histologic subtype (HR, 1.49) and positive resection margins (HR, 1.23) and was similar to that for lymph node status (HR, 2.34).
Using the eight prognostic genes, we determined whether expression ratios could accurately classify the 23 samples used to train the model. We calculated a total of 16 possible expression ratios per sample by dividing the expression value of each of the four genes expressed at relatively higher levels in good-outcome samples (i.e., EST, CD9, DLG5, and C3) by the expression value of each of the four genes expressed at relatively higher levels in poor-outcome samples (i.e., CD24, KIAA1199, CD24, and THBD). (Note that CD24 is listed twice because this gene is represented by multiple Affymetrix probe sets and as such serves as an internal control.) Samples with ratio values >1 were predicted to be "good outcome" and those with ratio values <1 were predicted to be "poor outcome." The overall accuracy of each of the 16 expression ratios varied widely (average = 71%, range 57-83%). To incorporate the predictive power of multiple prognostic genes (i.e., ratios), we calculated the combined score (i.e., geometric mean; see Materials and Methods) for all 560 possible three-ratio combinations, similar to previous studies (11, 14). We found that we could easily identify training samples with accuracy that met or exceeded that of any of the gene pair ratios when used alone. The three most accurate three-ratio combinations were all similarly accurate (87%) in identifying training set samples. These three tests used a total of four gene pair ratios in multiple combinations: CD9/KIAA1199, CD9/THBD, DLG5/KIAA1199, and DLG5/THBD. The combined score of this four-ratio test resulted in the same classification accuracy as any of the three-ratio tests, so we decided to additionally examine this four-ratio test going forward.
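The enumeration described above can be sketched in a few lines: four numerator genes crossed with four denominator probe sets give 16 ratios, and choosing 3 of 16 gives 560 candidate three-ratio tests. The `_a`/`_b` suffixes that distinguish the two CD24 probe sets, the function names, and the handling of a score of exactly 1 are assumptions made for illustration.

```python
import math
from itertools import combinations, product

GOOD_GENES = ["EST", "CD9", "DLG5", "C3"]               # higher in good outcome
POOR_GENES = ["CD24_a", "KIAA1199", "CD24_b", "THBD"]   # higher in poor outcome
# CD24 appears twice because it has two probe sets; the suffixes are hypothetical.


def all_ratios(expr):
    """Map each (good gene, poor gene) pair to its expression ratio for one sample."""
    return {(g, p): expr[g] / expr[p] for g, p in product(GOOD_GENES, POOR_GENES)}


def three_ratio_tests():
    """Every unordered choice of 3 gene pair ratios out of the 16 available."""
    return list(combinations(product(GOOD_GENES, POOR_GENES), 3))


def accuracy(test, samples, labels):
    """Fraction of samples whose combined score agrees with the known label."""
    correct = 0
    for expr, label in zip(samples, labels):
        ratios = all_ratios(expr)
        # Combined score = geometric mean of the ratios in this test.
        score = math.exp(sum(math.log(ratios[pair]) for pair in test) / len(test))
        # A score of exactly 1 is not addressed in the text; it falls to "poor" here.
        call = "good" if score > 1 else "poor"
        correct += call == label
    return correct / len(samples)


n_tests = len(three_ratio_tests())  # 16 choose 3
```

Ranking all candidate tests by `accuracy` on the training samples would then identify the best-performing combinations, which is the selection step the paragraph describes.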
Validation of prognostic genes. To eliminate the need for internal cross-validation, we used multiple techniques to examine the predictive nature of the candidate prognostic markers from above in a separate cohort (n = 52; i.e., the test set; Table 4). (Microarray data were available for 26 of these samples; see Materials and Methods.) The histologic distribution and the estimated median patient survival (9 months; Fig. 1D) of the test set of samples were representative of those expected for MPM patients. The median survival of this cohort (9 months; Fig. 1D) is moderately, but not statistically significantly, shorter than that for the more current cohort (14 months; Fig. 1A), likely due to slightly different treatments and/or effects-of-time trends. None of the genes from Table 3 were identified in our initial discovery of optimal candidate MPM prognostic genes (11), likely due to excessive variability and/or the slightly different treatments between cohorts, although only CD9 was not represented on the previous expression profiling platform. However, the average expression levels for all remaining genes in previous samples were substantially higher in the predicted group with the exception of DLG5 and the EST, for which average expression levels in both good and poor-outcome groups were nearly identical. Although fold change differences in the average expression levels for these genes were relatively high between good and poor-outcome samples (11), there was substantial variability among individual samples as reflected in the originally calculated P value (gene, average fold-change difference, P value): C3, 13-fold higher in good-outcome samples, P = 0.079; CD24, 24-fold higher in poor-outcome samples, P = 0.030; KIAA1199, 2-fold higher in poor-outcome samples, P = 0.32; THBD, 2-fold higher in poor-outcome samples, P = 0.25.
0.2% (i.e., 17 of 10,000 iterations; see Materials and Methods).
Next, we examined the prognostic call of the four-ratio test in 26 of the 52 samples associated with widely divergent patient survival as above (i.e., survival greater than and less than the 75th and 25th percentiles, respectively). We used quantitative RT-PCR to obtain the relative gene expression levels of the genes comprising the four most accurate gene pair ratios from above (CD9/KIAA1199, CD9/THBD, DLG5/KIAA1199, and DLG5/THBD) and then calculated a combined score of the four-ratio combination. Samples with combined scores >1 and <1 were predicted to be associated with good and poor outcome, respectively. In these 26 samples, the four-ratio test was 69% (18 of 26) accurate and called 92% (12 of 13) of the long-term survivors (≥18 months) "good outcome" and 46% (6 of 13) of the short-term survivors (≤5 months) "poor outcome." Finally, we analyzed all 52 samples of the test set using quantitative RT-PCR and found that the estimated survival associated with "good" and "poor" prognosis subjects identified using expression ratios were significantly different (P = 0.0096; Fig. 1F). The estimated median survival (12 months) of the good-outcome subset was over 2-fold higher than the estimated median survival of the poor-outcome subset (5 months).
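The median survival figures quoted above are Kaplan-Meier estimates. A minimal sketch of that estimator follows, assuming the usual convention that deaths at a given time are counted before censored observations at the same time. The function names and the example follow-up times are hypothetical and are not patient data from this study.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (event time, survival probability).

    times:  follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored.
    Exact fractions are used so that comparisons with 1/2 are not affected
    by floating-point rounding.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who leave the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(times, events):
    """First time at which estimated survival drops to 50% or below, else None."""
    for t, s in kaplan_meier(times, events):
        if s <= Fraction(1, 2):
            return t
    return None


# Hypothetical follow-up in months; the second and fourth patients are censored.
median = median_survival([5, 12, 12, 20], [1, 0, 1, 0])
```

When fewer than half of a group has died and the remainder are censored, the function returns `None`, reflecting that the median has not been reached.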
As before, we did multivariate survival analysis to examine ratio-based predictions in the context of previously described prognostic variables (17). The results of fitting a Cox proportional hazards regression model to censored survival data are shown in Table 5 (see Supplemental Table S3 for codes). No individual prognostic variable was found to be statistically significant in the combined model likely for reasons similar to those stated previously. Nevertheless, the result of a likelihood-ratio test (P = 0.041) suggests that at least one of the regression coefficients is not zero. Importantly, the HR point estimate for the combined score of the expression ratio test (HR, 2.06) was higher than that for both histologic subtype (HR, 1.38) and lymph node status (HR, 1.21) and was moderately lower than that for positive resection margins (HR, 3.29).
| Discussion |
|---|
There was no overlap in previously discovered MPM prognostic genes (11) and those discovered in the current study despite the fact that both sets of genes were discovered in surgical patients undergoing similar therapies at the same institution and that each set was statistically predictive of survival when examined in the other cohort. There are at least two likely explanations for this apparent discrepancy: (a) experimental and biological variability and (b) inherent differences in patient treatment between both cohorts. Sources of variability include general reproducibility issues pertaining to all microarray studies (18), different profiling platforms used to analyze each patient cohort, different numbers of samples used in the discovery of each set of prognostic genes, and inherent genetic differences likely present in tumors from patients with different stage disease. Nevertheless, it was encouraging that prognostic genes discovered in the current study had predicted expression patterns in the other cohort and average expression levels in tumors from patients associated with extreme survival differences that differed with near-statistical significance.
The median survival of the good-outcome subset from the new test (CD9/KIAA1199, CD9/THBD, DLG5/KIAA1199, and DLG5/THBD) was identical (12 months) to that for the poor-outcome subset of the initial test (KIAA0977/GDIA1, L6/CTHBP, and L6/GDIA1). The cause of this observation likely relates to the fact that the overall median survival increased by ~50% in the second cohort (e.g., compare Fig. 1A and D). Other explanations may relate to the relatively small sample size of each group and/or the fact that treatments were slightly different for each set of patients. Nevertheless, the fact that previously described prognostic genes remain valid in the current MPM patients supports the predictive nature of the selected genes and the gene ratio method in general. Furthermore, the HR point estimates for both ratio-based predictions are very similar in multivariate survival analysis (Tables 1 and 5). This would suggest that the gene-ratio approach is able to distinguish between patient prognoses with approximately the same relative risk, although the actual survival within each prognostic group may vary.
Our first test (11) identified 75% (15 of 20) of the patients with extreme survival differences. It is desirable to identify a higher percentage of poor prognosis patients because these patients are unlikely to benefit from upfront surgery. Our new test identified a similar proportion of patients with extreme survival differences (18 of 26, 69%) but, interestingly, identified more (12 of 13, 92%) of the long-term survivors (≥18 months) and fewer (6 of 13, 46%) of the short-term survivors (≤5 months) than the first test. The cause of this observation was not immediately clear, but it likely also reflects the fact that the two patient cohorts were of different stage and received slightly different treatments. The fact that both tests worked suggests that they are more in tune either with the degree of tumor aggressiveness or with the cytoreductive nature of the treatment than with the specific chemotherapy given at surgery. Importantly, when examined in an independent set of patients with similar stage who received identical therapy, the accuracy (88%) of the original prognostic test (KIAA0977/GDIA1, L6/CTHBP, and L6/GDIA1) was much higher (11).
A more complex 27-gene neural network classifier proposed by Pass et al. (12) was 76% accurate when validated using hierarchical clustering in an independent data set, including patients with widely divergent survival. Our analyses using either our previous (11) or newly developed ratio-based tests attained a similar accuracy (75% and 69%, respectively) despite the utilization of fewer genes and a simpler approach. It is also noteworthy that both of our ratio-based tests from the current and original studies resulted in statistically significant differences in survival when samples that originated from tumors associated with a range of patient survival were analyzed (Fig. 1D and H). It is unknown whether the 27-gene classifier of Pass et al. (12) would have significantly predicted survival under these circumstances in the original analysis.
An important result of these studies is the identification of mechanisms potentially involved in malignant transformation in MPM. Several of the prognostic genes discovered in the current study have previously documented roles in cancer. The CD9 gene codes for a member of the transmembrane-4 superfamily (also known as the tetraspanin family) whose proteins mediate signal transduction events regulating cell development, activation, growth, and motility. Two other members of this gene family have also been shown to be associated with good prognosis in MPM: the gene encoding the L6 tumor antigen (a.k.a. TM4SF1) previously reported by our laboratory (11) and the plasmolipin gene previously reported by Pass et al. (12). In other studies, low expression of CD9 is thought to contribute to a more aggressive (metastatic) phenotype in small cell lung cancer (19), gastric cancer (20), and breast cancer (21). These observations are generally consistent with our finding of CD9 expressed at significantly higher levels in tumors from patients with relatively good prognosis. Thrombomodulin (THBD) is a type I membrane receptor that has been suggested as a potential tumor diagnostic marker because it is expressed by up to 75% of MPMs (22, 23) and 83% of cardiac myxomas (24). Expression of the CD24 cell surface antigen has been observed in multiple malignancies and seems to function as a ligand for the adhesion molecule P-selectin. Recently, CD24 has been shown to be an independent, statistically significant prognostic indicator in multivariate survival analysis in non-small cell lung cancer (25) and ovarian cancer (26). In these studies, high levels of CD24 were associated with shorter survival times, consistent with our results in MPM.
In this study, we identify and validate MPM prognostic genes both in a general context and as part of a predictive test. We again show the utility of the gene ratio technique (10, 11, 14, 15) in MPM by designing and testing multiple clinically relevant prognostic tests. Prognostication using typical bioinformatics tools (e.g., hierarchical clustering) is not easily amenable to the analysis of a single patient at a time without reference to an additional group of patients whose gene expression data were similarly acquired. Furthermore, many of these bioinformatics techniques are inherently sensitive to sources of variability, such as the number of genes used in the model, the data acquisition platform, and inherent biological variability. Consequently, these classification techniques are not likely to quickly impact patient clinical management. Because ratio-based tests offer several advantages over traditional bioinformatics tools (10), it is likely that they will prove useful in future clinical scenarios as an adjunct to traditional staging techniques.
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Expression profiling raw data and supplemental information are available at http://www.generatios.com, under "Publications".
Received 10/26/04; revised 2/23/05; accepted 3/18/05.