Purpose: Colorectal cancer prognosis is currently predicted from pathologic staging, providing limited discrimination for Dukes stage B and C disease. Additional markers for outcome are required to help guide therapy selection for individual patients.
Experimental Design: A multisite single-platform microarray study was done on 553 colorectal cancers. Gene expression changes were identified between stage A and D tumors (three training sets) and assessed as a prognosis signature in stage B and C tumors (independent test and external validation sets).
Results: One hundred twenty-eight genes showed reproducible expression changes between three sets of stage A and D cancers. Using consistent genes, stage B and C cancers clustered into two groups resembling early-stage and metastatic tumors. A Prediction Analysis of Microarray algorithm was developed to classify individual intermediate-stage cancers into stage A–like/good prognosis or stage D–like/poor prognosis types. For stage B patients, the treatment adjusted hazard ratio for 6-year recurrence in individuals with stage D–like cancers was 10.3 (95% confidence interval, 1.3-80.0; P = 0.011). For stage C patients, the adjusted hazard ratio was 2.9 (95% confidence interval, 1.1-7.6; P = 0.016). Similar results were obtained for an external set of stage B and C patients. The prognosis signature was enriched for downregulated immune response genes and upregulated cell signaling and extracellular matrix genes. Accordingly, sparse tumor infiltration with mononuclear chronic inflammatory cells was associated with poor outcome in independent patients.
Conclusions: Metastasis-associated gene expression changes can be used to refine traditional outcome prediction, providing a rational approach for tailoring treatments to subsets of patients. (Clin Cancer Res 2009;15(24):7642–51)
- colorectal cancer
- gene expression
- outcome prediction
Molecular markers are required to refine prediction of recurrence risk for colorectal cancer to help guide the selection of adjuvant therapies for individual patients. This international single-platform microarray study shows that metastasis-associated gene expression changes, identified across multiple sets of stage A and D cancers, can be used to improve outcome prediction for patients with Dukes stage B or C disease. Microarray data for training and test cases were produced at multiple sites, indicating good interinstitutional reproducibility required for clinical application. Our results improve our understanding of colorectal cancer progression, identifying putative signatures of downregulated immune response genes and upregulated cell signaling and extracellular matrix genes. Accordingly, low density of mononuclear chronic inflammatory cells within tumors was shown to be associated with poor prognosis in independent patients. Our candidate genes provide a good starting point for future study and potential targets for therapy.
Colorectal cancer is often detected at a stage when complete resection of the primary cancer is possible, yet 40% to 50% of patients who undergo potentially curative surgery alone relapse and die of metastatic disease (1). Patient risk for recurrence is currently largely predicted from the extent of spread of the primary tumor, and this is the major determinant of further clinical management. Although most patients with Dukes stage C (lymph-node positive) cancer receive a combination of 5-fluorouracil and oxaliplatin, adjuvant treatment is offered to only a subset of Dukes stage B (localized disease) patients presenting with specific high-risk clinical features, including tumor perforation or invasion of adjacent organs (2). This approach is clearly suboptimal, resulting in undertreatment of ∼20% of stage B patients who will recur. Similarly, current adjuvant treatment is clearly ineffective in many stage C patients, with a recurrence rate of ∼40% (3, 4), highlighting the need for treatment with more aggressive or newly emerging targeted therapies. There is an urgent need for biomarkers to refine traditional prediction of recurrence risk to enable better use of existing treatment options and the optimal development of novel individualized therapies.
Several studies have used microarray analysis on primary tumor specimens to identify gene expression signatures predictive of colorectal cancer prognosis (5–9). The general approach for signature discovery has been analysis of patients selected for good and poor outcomes (training set), followed by assessment of the signature in additional cases (test set). However, the performance and general applicability of published classifiers has been challenging to determine. Division of patients into training and test sets has often resulted in small sample sizes (5, 6, 8), and several studies did not formally assess a defined classifier but rather the validity of candidate prognostic genes using cross-validation procedures (6, 8, 9). Furthermore, signature discovery based on outcome is generally confounded in patients undergoing adjuvant treatment (most stage C patients) because it is difficult to distinguish markers of prognosis from markers of therapy response (7, 9).
Gene expression patterns have been shown to broadly differ between metastatic and nonmetastatic colorectal cancers, implying that the acquisition of metastatic potential by the primary tumor is accompanied by specific changes in endogenous transcription and/or changes in the tumor microenvironment (10–13). This suggests an alternative approach to prognosis signature discovery, whereby expression differences between the extremes of stages of cancer (early stage/stage A versus metastatic/stage D) could be used to predict recurrence in patients with intermediate stages of disease. Advantages of this approach are that tumor stage-based discovery does not require follow-up data and that the confounding effect of previous therapy can be avoided by selecting patients who have not undergone such treatment.
In this international multisite study, we evaluated this discovery strategy using data on colorectal cancers from 553 patients analyzed using a common microarray platform. Reproducible gene expression differences were identified between three training sets of stage A and D cancers, with the latter being represented by primary and distant lesions. The feasibility of using consistent expression changes for classification of intermediate-stage cancers into groups resembling early-stage and metastatic lesions was assessed using unsupervised clustering on two sets of stage B and two sets of stage C tumors. A prognostic algorithm was developed to permit classification of individual test cancers into early-stage “good prognosis” or metastatic “poor prognosis” types, a requirement for clinical application. The prognostic value of this single-sample classifier was determined for stage B and C patients with long-term follow-up data. An external data set of 99 stage B and C patients produced on an earlier version of our microarray platform was used for additional validation. To improve our understanding of the changes associated with metastatic progression in colorectal cancer, classifier genes were analyzed for functional category enrichment; a putative immune response signature was validated by histologic analysis of tumor infiltrating mononuclear chronic inflammatory cells on 155 stage B and 166 stage C patients enrolled in the Vioxx in Colorectal Cancer Therapy: Definition of Optimal Regime (VICTOR) clinical trial, a phase III, randomized, placebo-controlled study on rofecoxib (14).
Materials and Methods
Patients and gene expression microarray analysis
Fresh-frozen tumor specimens from 293 consecutive colorectal cancer patients were retrieved from the tissue banks of the Royal Melbourne Hospital, Western Hospital, and Peter MacCallum Cancer Center in Australia, and the H. Lee Moffitt Cancer Center in the United States; individuals who had received preoperative chemotherapy and/or radiotherapy or for whom tumor-derived total RNA was inadequate for microarray analysis (RNA integrity number [RIN] < 6) were excluded. All patients gave informed consent, and this study was approved by the medical ethics committees of all sites. Patient median age at diagnosis was 67 y (range, 26-92 y). All specimens were derived from primary carcinomas and were snap frozen in liquid nitrogen immediately after surgery for storage at −80°C. Cases comprised 44 stage A, 95 stage B, 93 stage C, and 61 stage D cancers; 252 were localized to the colon and 40 to the rectum, with one case missing this information. Twenty-two of 94 patients who had stage B disease and 64 of 91 patients who had stage C disease had received standard adjuvant chemotherapy (either single-agent 5-fluorouracil/capecitabine or 5-fluorouracil and oxaliplatin) or postoperative concurrent chemoradiotherapy (50.4 Gy in 28 fractions with concurrent 5-fluorouracil), according to hospital protocols. All patients were assessed annually. For stage B and C patients, follow-up and additional clinical data, including patient gender and TNM staging, were collected by Biogrid Australia for Australian patients and the Moffitt Cancer Center Tumor Registry for U.S. patients. The median duration of follow-up was 47.8 mo (range, 0.9-118.6 mo) for the 140 patients without recurrence, and 19.1 mo (range, 1.6-93.7 mo) for the 48 patients with local or distant recurrence. The median follow-up for all 188 patients was 37.2 mo (range, 0.9-118.6 mo).
Total RNA was extracted using Trizol reagent (Invitrogen) from colorectal cancer samples containing >60% tumor cells. All samples included showed good integrity of 18S and 28S ribosomal bands (RIN > 6) using a 2100 Bioanalyzer (Agilent Technologies). Total RNA was labeled and hybridized to HG-U133Plus2.0 GeneChip arrays (Affymetrix), according to the manufacturer's instructions. The microarray data on a subset of 174 tumors have been published previously (National Center for Biotechnology Information Gene Expression Omnibus; GSE5206 and GSE13067).
In addition, published gene expression data were retrieved for 42 stage A colorectal cancers, 83 stage B, 73 stage C, and 62 stage D colorectal cancers analyzed as part of the Expression Project for Oncology (expO) using HG-U133Plus2.0 GeneChip arrays (Affymetrix; Supplementary Table S1). Of the 62 stage D colorectal cancers, 32 were primary cancers and 30 were metastasectomy specimens. None of the primary cancer patients had received preoperative therapy, but 17 metastasectomy specimens were from patients who had received adjuvant chemotherapy treatment before resection. Data processing and analysis were done using the statistical software package R (15) and appropriate Bioconductor packages (16).
Identification of metastasis-associated gene expression changes
Consistent gene expression changes were identified between 44 stage A and 61 stage D colorectal cancers from this study and 42 stage A and 62 stage D colorectal cancers from expO. For the expO data set, separate comparisons were done for primary stage D cancers and distant metastases to identify gene expression maintained during metastatic spread. For each cohort, MAS5.0-calculated signal intensities were normalized using the quantile normalization procedure implemented in robust multiarray analysis (17, 18), and the normalized data were log transformed (base 2). Probe sets that were not expressed or probe sets that showed a low variability across samples were excluded. Expression values were required to be above the median of all expression measurements in at least 25% of samples, and the interquartile range across the samples on the log scale was required to be at least 0.5. Genes mapping to sex chromosomes were excluded because cases were not matched by gender. A total of 6716 gene probes passed these filtering steps in all three sample sets.
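The two filtering criteria above can be illustrated with a minimal sketch. It assumes that "the median of all expression measurements" refers to the median over the whole log2 expression matrix of a cohort, and it uses the inclusive quartile definition; the function names and the toy data layout are hypothetical and are not taken from the study's R code.

```python
from statistics import median, quantiles


def passes_filter(values, global_median, min_fraction=0.25, min_iqr=0.5):
    """Check one probe set (log2 values across samples) against both criteria."""
    # Criterion 1: expressed above the global median in at least 25% of samples.
    fraction_expressed = sum(v > global_median for v in values) / len(values)
    # Criterion 2: interquartile range on the log scale of at least 0.5.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return fraction_expressed >= min_fraction and (q3 - q1) >= min_iqr


def filter_probe_sets(expr):
    """expr maps a probe-set ID to its list of log2 values, one per sample."""
    # Assumption: the reference median is taken over every value in the matrix.
    global_median = median(v for row in expr.values() for v in row)
    return [pid for pid, row in expr.items() if passes_filter(row, global_median)]


if __name__ == "__main__":
    toy = {
        "flat_high": [9.0, 9.1, 9.0, 9.1],          # expressed, but too little variability
        "variable_high": [6.0, 8.0, 10.0, 12.0],    # expressed and variable
        "low": [2.0, 2.5, 3.0, 3.5],                # never above the global median
    }
    print(filter_probe_sets(toy))
```

In this toy input only the probe set that is both expressed and variable is retained; a different quartile definition could change borderline cases.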
Differentially expressed genes were identified using Significance Analysis of Microarrays with a Wilcoxon rank-sum test and a false discovery rate of 10% (19). Separate lists were generated for genes significantly upregulated or downregulated in stage A colorectal cancers as compared with stage D colorectal cancers for each of the three comparisons. For differentially expressed genes identified repeatedly between cohorts, consistency of upregulation or downregulation was assessed using Pearson's χ2 test.
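The consistency check for genes identified in more than one cohort can be sketched as a Pearson χ2 test on a 2 × 2 table of directions (upregulated or downregulated in cohort 1 versus cohort 2). The sketch below assumes no continuity correction, which may differ from the setting used in the original analysis, and all function names are hypothetical.

```python
import math


def direction_table(dir_a, dir_b):
    """Cross-tabulate directions ('up'/'down') for genes present in both cohorts."""
    counts = {("up", "up"): 0, ("up", "down"): 0,
              ("down", "up"): 0, ("down", "down"): 0}
    for gene in dir_a.keys() & dir_b.keys():
        counts[(dir_a[gene], dir_b[gene])] += 1
    return counts


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    cohort1 = {"G1": "up", "G2": "up", "G3": "down", "G4": "down", "G5": "up"}
    cohort2 = {"G1": "up", "G2": "up", "G3": "down", "G4": "down", "G6": "down"}
    t = direction_table(cohort1, cohort2)
    consistent = t[("up", "up")] + t[("down", "down")]
    print(t, consistent / sum(t.values()))
```

The fraction of concordant genes (the diagonal of the table) corresponds to the percentage of consistent changes reported alongside the test.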
For the 95 stage B and 93 stage C colorectal cancers from this study and the 83 stage B and 73 stage C colorectal cancers from expO, expression values of the identified metastasis-associated genes were mean and sample centered, followed by divisive hierarchical clustering using 1 minus the Spearman r as the pairwise distance metric. Differences in median gene expression values were calculated for the samples within the two main branches of the resulting dendrogram. Relative upregulation or downregulation of gene expression between these two groups was assessed for consistency with upregulation or downregulation observed between early-stage and metastatic cancers using Pearson's χ2 test.
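The distance used for clustering, 1 minus the Spearman correlation between two expression profiles, can be written out as a short sketch. Ties are given average ranks here, which is a common convention but an assumption about the original analysis; the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def spearman_distance(x, y):
    """1 - Spearman r: 0 for identical rank order, 2 for fully reversed order."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    print(spearman_distance([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
    print(spearman_distance([1.0, 2.0, 3.0, 4.0], [40.0, 30.0, 20.0, 10.0]))
```

Because the measure depends only on rank order, it is insensitive to monotone rescaling of a sample's expression values.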
Prediction Analysis of Microarray classifier development and application
Based on metastasis-associated genes, a Prediction Analysis of Microarrays (20) nearest shrunken centroid classifier was developed for separation of all primary stage A (n = 86) and stage D (n = 93) cancers (reference set). Microarray data were quantile normalized, followed by 10-fold cross-validation for increasing values of centroid shrinkage, designed to progressively eliminate noisy genes. Misclassification errors were calculated from this cross-validation procedure. Using the optimized Prediction Analysis of Microarray classifier, 95 stage B and 93 stage C colorectal cancers were classified into stage A–like good prognosis and stage D–like poor prognosis types. MAS5.0-calculated signal intensities of stage B or C cancers were normalized against the reference set on a single-sample basis.
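The nearest shrunken centroid idea behind this classifier can be sketched as follows. This is a simplified illustration, not the software used in the study: it assumes the class-size factor m_k = sqrt(1/n_k − 1/n) (descriptions of the method differ on this term), takes the offset s0 as the median of the per-gene pooled standard deviations, and uses a fixed shrinkage value `delta` instead of selecting it by cross-validation. The names `train_nsc` and `predict_nsc` are hypothetical.

```python
import math
from statistics import median


def train_nsc(X, y, delta, priors=None):
    """X: list of samples (lists of gene values); y: class label per sample."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    cent = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
            for k, rows in members.items()}
    # Pooled within-class standard deviation for each gene.
    s = [math.sqrt(sum((row[i] - cent[lab][i]) ** 2 for row, lab in zip(X, y))
                   / (n - len(classes))) for i in range(p)]
    s0 = median(s)  # assumption: offset set to the median of the s values
    shrunken = {}
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)  # assumed convention
        row = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (cent[k][i] - overall[i]) / scale
            # Soft thresholding: genes with |d| <= delta drop out of the classifier.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            row.append(overall[i] + scale * d_shrunk)
        shrunken[k] = row
    if priors is None:
        priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunken, "scale": [si + s0 for si in s], "priors": priors}


def predict_nsc(model, x):
    """Return class probabilities for one sample from the discriminant scores."""
    scores = {}
    for k, c in model["centroids"].items():
        dist = sum(((xi - ci) / sc) ** 2 for xi, ci, sc in zip(x, c, model["scale"]))
        scores[k] = dist - 2 * math.log(model["priors"][k])
    best = min(scores.values())
    weights = {k: math.exp(-0.5 * (v - best)) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


if __name__ == "__main__":
    X = [[1.0, 5.0], [1.2, 5.2], [0.8, 4.8], [3.0, 5.1], [3.2, 4.9], [2.8, 5.0]]
    y = ["A", "A", "A", "D", "D", "D"]
    model = train_nsc(X, y, delta=0.5, priors={"A": 0.8, "D": 0.2})
    print(predict_nsc(model, [0.9, 5.0]))
```

In the toy data the second gene does not differ between classes, so its shrunken centroids collapse to the overall mean and it no longer contributes to classification. Passing `priors={"A": 0.8, "D": 0.2}` is one possible reading of how an expected recurrence rate could enter the class priors; the exact mapping used in the study is not specified here.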
Functional category enrichment analysis
Functional category enrichment analysis was done using the Functional Annotation Clustering tool on the Database for Annotation, Visualization and Integrated Discovery. Metastasis-associated genes were classified according to their annotated role in biological process, molecular function, and cellular component from Gene Ontology. Category enrichment was tested against all human genes. P values were adjusted using the Benjamini-Hochberg False Discovery Rate multiple testing correction.
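For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic step-up implementation written for illustration; it is not the code of the annotation tool, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A category is then called enriched when its adjusted P value falls below the chosen false discovery rate.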
Analysis of tumor infiltration with mononuclear chronic inflammatory cells
Hematoxylin and eosin–stained tissue sections of formalin-fixed paraffin-embedded colorectal cancer specimens were retrieved for 155 stage B and 166 stage C patients enrolled in the VICTOR clinical trial (14). The average density of mononuclear chronic inflammatory cells (comprising lymphocytes, plasma cells, and macrophages) was scored within tumor areas composed of >60% of neoplastic cells by two anatomic pathologists (M. Christie and S. Prakash); areas of adenoma, ulceration, and necrosis were excluded from the analysis. Mononuclear chronic inflammatory cell density was assessed at ×40 magnification and classified into low and moderate/high by each observer.
Associations between predicted stage A– and D–like cancers and clinical characteristics were separately assessed for stage B and C patients using Fisher's exact test for categorical variables and the Welch two-sample t test for continuous variables. For the outcome analysis, 6-y recurrence was the primary endpoint. Disease-free survival was defined as the time from surgery to the first confirmed relapse. Censoring was done when a patient died or was alive without recurrence at last contact. Cox proportional hazards models were used to estimate survival distributions and hazard ratios and included the gene expression classifier, age at diagnosis, number of lymph nodes examined, N stage, and adjuvant treatment. All statistical analyses were two sided and considered significant if P < 0.05.
Expression changes between early-stage and metastatic colorectal cancers
Reproducible gene expression changes between early-stage and metastatic colorectal cancers were identified using 44 stage A and 61 stage D tumors from our laboratories, and 42 stage A and 62 stage D tumors from expO. Separate comparisons were done for specimens derived from primary stage D cancers and distant metastases to identify changes maintained during metastatic spread. For each cohort, separate lists were generated for genes significantly upregulated or downregulated in metastatic cancers, and consistency of upregulation or downregulation was assessed for repeatedly identified genes (Table 1). All pairwise comparisons of metastasis-associated changes were significant (P < 0.001, χ2 test), with >96% of changes being consistent in all cases. The level of consistency was high irrespective of whether the comparisons involved only primary metastatic cancers or primary stage D cancers and distant metastases. A total of 128 genes (163 probe sets; Supplementary Table S2) showed reproducible upregulation (71 genes) or downregulation (57 genes) in metastatic cancers as compared with early-stage cancers across all three cohorts. Notably, two of the three comparisons solely involved primary cancers from patients who had not received preoperative therapy, thus excluding a confounding influence of treatment on classifier selection.
Clustering of intermediate-stage colorectal cancers using metastasis-associated genes
Feasibility of using our set of 128 metastasis-associated genes for classification of stage B and C colorectal cancers into groups resembling early-stage and metastatic lesions was assessed using unsupervised clustering on four independent sample sets: 95 stage B and 93 stage C colorectal cancers from this study, and 83 stage B and 73 stage C colorectal cancers from expO (Fig. 1). For all four sets of tumors, the relative differences in median gene expression between the two main resulting clusters mirrored those identified between early-stage and metastatic lesions (Supplementary Table S3); >97% of changes were consistent for each comparison (P < 0.001; χ2 test).
Prognosis classification of intermediate-stage colorectal cancers
To permit classification of individual test cancers into early-stage/good prognosis or metastatic/poor prognosis types, a requirement for clinical application, a Prediction Analysis of Microarray algorithm was developed using all 179 primary stage A and D cancers from this study and expO as a reference set (Supplementary Fig. S1). For each test cancer, microarray data were normalized against this reference set followed by sample classification into a stage A– or D–like type. Prior (expected) 6-year recurrence probabilities were set as those presently observed for stage B and C patients (20% and 40%, respectively; ref. 21).
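Normalizing a single test array against a fixed reference set can be pictured as mapping the ranks of the new sample onto a reference intensity distribution. The sketch below shows one common way to do this (quantile normalization to a frozen reference); whether the study used exactly this procedure is an assumption, and both function names are hypothetical.

```python
def reference_distribution(reference_arrays):
    """Average the sorted intensities across reference arrays, rank by rank."""
    sorted_arrays = [sorted(a) for a in reference_arrays]
    n = len(sorted_arrays)
    return [sum(column) / n for column in zip(*sorted_arrays)]


def normalize_to_reference(sample, ref_dist):
    """Replace each value in the sample by the reference value at the same rank."""
    if len(sample) != len(ref_dist):
        raise ValueError("sample and reference must cover the same probe sets")
    # Ties are broken by probe order (stable sort); other conventions exist.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    normalized = [0.0] * len(sample)
    for rank, i in enumerate(order):
        normalized[i] = ref_dist[rank]
    return normalized


if __name__ == "__main__":
    ref = reference_distribution([[1.0, 2.0, 3.0], [3.0, 5.0, 4.0]])
    print(normalize_to_reference([10.0, 0.0, 5.0], ref))
```

Because the reference distribution is computed once and then held fixed, each new tumor can be processed on its own without renormalizing the training data, which is the property needed for single-sample classification.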
Most test stage B (82 of 95; 86.3%) and stage C (77 of 93; 82.8%) colorectal cancers were classified into stage A– and D–like types with a >90% prediction probability (Supplementary Fig. S2). Thirty-seven of 82 (45.1%) stage B and 29 of 77 (37.7%) stage C cancers showed a stage A–like signature at this cutoff. For both groups of patients, class predictions were not associated with age at diagnosis, gender, tumor T stage, location, number of lymph nodes examined, or adjuvant treatment (Table 2). However, stage C patients with stage D–like tumors tended to present with a higher node status (37.5% with N2 status; 18 of 48) than those with stage A–like tumors (13.8% with N2 status; 4 of 29; P = 0.037; Fisher's exact test), consistent with the anticipated classification by metastatic potential. The 13 stage B and 16 stage C patients who could not be confidently classified had clinical features similar to those patients who could be classified with confidence.
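The node-status comparison above is a 2 × 2 table (stage D–like: 18 N2 and 30 N1; stage A–like: 4 N2 and 25 N1), so Fisher's exact test can be reproduced with a short sketch. The two-sided P value is computed here as the sum of all table probabilities no larger than that of the observed table, which is a common definition but an assumption about the software used; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Counts taken from the text: N2 versus N1 by predicted class in stage C.
    print(round(fisher_exact_two_sided(18, 30, 4, 25), 3))
```

With the counts from the text, this sketch gives a P value of about 0.037, in line with the reported result.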
Metastasis-associated changes predict poor prognosis
Probabilities of disease-free survival were independently calculated for the 82 stage B and 77 stage C patients with “confident” class predictions (Supplementary Fig. S3). As anticipated, individuals with stage D–like cancers showed a poorer prognosis than individuals with stage A–like cancers in both cases. The estimated hazard ratio for recurrence was 10.6 [95% confidence interval (95% CI), 1.3-82.0; P = 0.024; Wald test] for stage B, and 2.8 (95% CI, 1.1-7.5; P = 0.035; Wald test) for stage C patients over a 6-year follow-up period. Similar results were obtained when the analysis was adjusted for adjuvant treatment (stage B hazard ratio, 10.3; 95% CI, 1.3-80.0; P = 0.011; stage C hazard ratio, 2.9; 95% CI, 1.1-7.6; P = 0.016).
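The relation between a hazard ratio, its confidence interval, and the Wald test can be made explicit. For a Cox coefficient β with standard error SE, the hazard ratio is exp(β), the 95% CI is exp(β ± 1.96 × SE), and the Wald statistic is z = β/SE. The sketch below back-calculates SE, z, and a two-sided P value from a published hazard ratio and CI; it assumes a symmetric Wald interval on the log scale and works from rounded inputs, so it only approximates the reported values. The function name is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE, Wald z, and two-sided P from a hazard ratio and its 95% CI."""
    # The CI spans 2 * z_crit standard errors on the log-hazard scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


if __name__ == "__main__":
    for label, args in [("stage B", (10.6, 1.3, 82.0)), ("stage C", (2.8, 1.1, 7.5))]:
        se, z, p = wald_from_ci(*args)
        print(label, round(se, 3), round(z, 2), round(p, 3))
```

Applied to the unadjusted estimates above, this gives P values close to the reported Wald results of 0.024 and 0.035; small differences are expected because the inputs are rounded.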
Comparison of the expression classifier and pathologic staging
To assess the prognostic value of our 128-gene classifier, we compared it against pathologic staging in stage B and C patients. For this comparison, expression-based classification was done using the same prior recurrence probability of 30% for all patients. Individuals showed similar differences in outcomes when classified based on pathologic staging or the expression classifier (Fig. 2A-B). The estimated hazard ratio for recurrence was 2.8 for stage C patients as compared with stage B patients (95% CI, 1.5-5.4; P = 0.002; Wald test) and 4.0 for patients with stage D–like cancers as compared with patients with stage A–like cancers (95% CI, 1.7-8.9; P = 0.001; Wald test).
Combining independent pathologic staging and expression-based classification improved prediction of recurrence risk with broad separation into three groups of patients with different outcomes (Fig. 2C): (a) a good prognosis group consisting of stage B patients with stage A–like cancers showing a 6-year disease-free survival probability of 96.5% (95% CI, 90.1-100.0%); (b) an intermediate prognosis group comprising stage B patients with stage D–like cancers and stage C patients with stage A–like cancers showing probabilities of 73.0% (95% CI, 60.4-88.2%) and 77.1% (95% CI, 62.2-95.7%), respectively; and (c) a poor prognosis group of stage C patients with stage D–like cancers showing a probability of 47.9% (95% CI, 34.7-66.1%).
Univariate and multivariate analyses
The prognostic value of our classifier was compared with clinical variables, including patient age at diagnosis, the number of lymph nodes examined, N stage, and adjuvant treatment using univariate Cox proportional hazards regression analysis. T stage was not included because most stage B (78 of 82) and stage C (65 of 77) cancers were of stage T3 (Table 2). For both stage B and C patients with confident class predictions (n = 82 and n = 77, respectively), our 128-gene classifier was the strongest predictor of outcome (Table 3). In stage B patients, adjuvant treatment was the only other clinical variable reaching statistical significance (P = 0.042; Wald test). Stage B patients receiving adjuvant treatment showed a higher risk for 6-year recurrence as compared with those who did not (hazard ratio, 3.23; 95% CI, 1.04-10.00), consistent with such therapy being offered specifically to selected high-risk individuals. In stage C patients, only N stage reached statistical significance besides the classifier (P = 0.044; Wald test), with N2 patients showing an increased risk for 6-year recurrence as compared to N1 patients (hazard ratio, 2.18; 95% CI, 1.02-4.66).
Assessment of whether the classifier was an independent factor predicting colorectal cancer prognosis was done against all clinical variables (Table 3). The classifier was an independent predictor of 6-year disease-free survival for stage B patients (P = 0.043; Wald test) and showed a corresponding trend for stage C patients (P = 0.080; Wald test). The decrease in the prognostic value of our classifier in the multivariate analysis for stage C patients was probably largely due to the observed positive association between class prediction and node status (Table 2). Accordingly, when analysis of stage C patients was limited to individuals with N1 disease, our classifier was an independent predictor of outcome (P = 0.047; Wald test).
Classifier validation on an external data set
We identified an independent Danish colon cancer data set comprising 33 Dukes stage B and 66 stage C patients. Because these data were produced on HG-U133A rather than HG-U133Plus2.0 GeneChip arrays (Affymetrix), our classifier was reduced from 163 to 113 available probe sets. Using this restricted gene signature, unsupervised clustering was found to divide these patients into the two expected groups showing median gene expression differences corresponding to those between early-stage and metastatic cancers (Fig. 3); again, >99% of changes were consistent (P < 0.001; χ2 test; details not shown). Single-sample Prediction Analysis of Microarray classification against our reference set of primary stage A and D cancers successfully divided patients into stage A–like/good prognosis and stage D–like/poor prognosis types based on overall survival (P = 0.041; Wald test). When analyzed by stage, the 113-gene classifier subdivided Dukes stage B and C patients into good and poor prognosis groups.
Assessment of prognostic value for individual classifier genes
To assess whether specific classifier genes were of particular prognostic value in our stage B and C patients, we did Cox proportional hazards regression analysis for individual probe sets adjusted for adjuvant treatment (Supplementary Table S4). As anticipated in stage B and C patients, hazard ratios for probe sets upregulated in metastatic cancers tended to be >1 [81 of 89 (91.0%) and 82 of 89 (92.1%), respectively], whereas hazard ratios for probe sets downregulated in metastatic cancers tended to be <1 [68 of 74 (91.9%) and 60 of 74 (81.1%), respectively]. However, individual hazard ratios were statistically significant at an unadjusted P of <0.05 for only a small proportion of probe sets in either stage B (28.2%; 46 of 163) or stage C (14.7%; 24 of 163) patients; only 10 probe sets, representing the VAT1, AKAP12, DCBLD2, WWTR1, ZNF532, IGJ, CTA-246H3.1, L06101, IGL@, and IGLJ3 genes, were significant for both stages. For consistent genes, hazard ratios ranged from 0.59 to 0.84 for downregulated and 1.53 to 2.66 for upregulated probe sets, lower than for the combined 128-gene classifier. When P values were adjusted for multiple testing, expression of only one probe set, representing DCBLD2, remained significantly associated with outcome in stage B patients.
Functional clusters for classifier genes
For our 128-gene classifier, functional category enrichment analysis identified three significant Gene Ontology annotation clusters: immune response, extracellular matrix interaction, and developmental process (Supplementary Table S5). When the signature was separated into genes showing upregulation or downregulation in metastatic cancers as compared with early-stage cancers, the extracellular matrix interaction and developmental process clusters were found to specifically represent upregulated genes. The extracellular matrix signature was further evident in a separate analysis of Kyoto Encyclopedia of Genes and Genomes pathways (22), showing significant overrepresentation of genes for the extracellular matrix–receptor interaction (hsa04512) and focal adhesion (hsa04510) pathways. In contrast, the immune response cluster specifically represented downregulated genes.
Validation of the immune response signature
To validate the observed association between downregulation of putative immune response genes and poor colorectal cancer prognosis, we assessed whether tumor infiltration with mononuclear chronic inflammatory cells predicted outcomes in 155 stage B and 166 stage C patients enrolled in the VICTOR clinical trial (14). Scores of average inflammatory cell density were concordant between two independent observers for 77% of cancers (κ statistic, 0.53; 95% CI, 0.33-0.63; ref. 23). Excluding samples with discordant scores, low density of mononuclear chronic inflammatory cells was significantly associated with poor recurrence-free survival (hazard ratio, 2.00; 95% CI, 1.17-3.41; P = 0.011; Wald test) over a 6-year follow-up period when adjusted for patient age at diagnosis, tumor stage, adjuvant therapy, and rofecoxib treatment.
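Interobserver agreement of the kind reported above is commonly summarized with Cohen's κ, which corrects the observed concordance for the agreement expected by chance. The sketch below is a generic unweighted implementation on made-up scores; it assumes two raters and categorical labels (here "low" versus "moderate/high"), and the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater1)
    # Observed agreement: fraction of cases with identical scores.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal frequencies per category.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_chance = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Illustrative scores only; these are not data from the study.
    r1 = ["low"] * 25 + ["moderate/high"] * 25
    r2 = ["low"] * 20 + ["moderate/high"] * 5 + ["low"] * 10 + ["moderate/high"] * 15
    print(cohen_kappa(r1, r2))
```

In this made-up example the raters agree on 70% of cases, but because half of that agreement is expected by chance, κ is lower than the raw concordance, which is the same pattern as the 77% concordance and κ of 0.53 reported here.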
Molecular markers that predict colorectal cancer recurrence are required to improve the selection of therapies for individual patients. We hypothesized that gene expression differences between early-stage and metastatic cancers might predict recurrence for patients with intermediate stages of disease. Using three cohorts of early-stage and metastatic colorectal cancers from multiple sites, we identified 128 genes reproducibly associated with metastatic spread. The feasibility of using this signature for prediction of metastatic potential in stage B and C cancers was shown using unsupervised clustering of five independent cohorts; all separated into two groups showing expression profiles corresponding to those observed for early-stage and metastatic lesions. An algorithm for single-sample classification was developed, which permitted scoring of individual test cases against a defined reference set of primary stage A and D cancers. As anticipated, intermediate-stage patients with stage D–like cancers showed a significantly worse prognosis than those with stage A–like cancers.
Controversy exists about the benefit and use of adjuvant chemotherapy in stage B patients (24, 25). Our 128-gene classifier seemed to be a strong independent predictor of outcome in these patients. The difference in prognosis observed for expression-based classification in our patients was clinically significant, with an adjusted hazard ratio for recurrence in individuals with stage D–like cancers of 8.5 (95% CI, 1.1-68.6) for a 6-year follow-up period. These results would justify a modification in the approach to adjuvant therapy. Low-risk patients could be reassured and not offered adjuvant treatment, whereas the most effective adjuvant therapy should be considered for high-risk patients.
Stage C patients are routinely offered adjuvant chemotherapy, but despite treatment, ∼40% of individuals relapse (3). Our classifier again identified subgroups with different outcomes: Firstly, it broadly distinguished between patients with different node status, with ∼37% of stage D–like and ∼14% of stage A–like tumors presenting with N2 disease. Secondly, for patients with N1 disease, our classifier was found to be an independent prognostic factor in multivariate analysis with an adjusted hazard ratio for recurrence in individuals with stage D–like cancers of 3.6 (95% CI, 1.02-13.2). Similar to N2 patients, N1 patients with stage D–like cancers showed particularly poor outcomes, indicating a need for treatment with more aggressive regimes or with newly emerging targeted therapies.
Subsets of our 128 classifier genes seemed to represent three putative biological functions, as indicated by functional category enrichment analysis: immune response, extracellular matrix interaction, and cell signaling. Notably, genes suggested to belong to the same functional category showed consistent changes in gene expression between early-stage and metastatic lesions. Putative immune response genes, composed of multiple immunoglobulin (IGHA1, IGHG1, IGHM, IGH@, IGJ, IGKC, IGK@, IGL@, IGLJ3), chemokine (CCL20, CCL28, CXCL13), and proteasome genes (PSMB10, PSMB8, PSMB9), were downregulated in metastatic/poor prognosis cancers, suggesting a role of the immune response in modulating colorectal cancer outcome. This potential association was supported by our systematic assessment of tumor infiltration with mononuclear chronic inflammatory cells in a large independent cohort of stage B and C patients enrolled in the VICTOR clinical trial. Consistent with our data, general enrichment of immune response genes has been reported for gene expression classifiers constructed by two previous microarray studies (5, 9), and poor survival from colorectal cancer has been associated with reduced numbers of tumor-infiltrating lymphocytes (26–30).
In contrast, genes upregulated in metastatic cancers seemed to represent two broad functional categories, extracellular matrix interaction and cell signaling. Evidence for the former group was particularly strong, with multiple members identified from the extracellular matrix–receptor interaction Kyoto Encyclopedia of Genes and Genomes pathways, including integrins (ITGB1, ITGB5), collagen (COL5A1), fibronectin 1 (FN1), and secreted phosphoprotein 1 (SPP1). Notably, upregulation of SPP1 has been noted and confirmed by previous microarray studies and shown to be associated with tumor progression, invasion, and metastasis in multiple solid cancers, including colorectal cancer (31–33). Upregulated cell signaling genes seemed to represent a number of pathways believed to drive cancer progression and metastasis, including the TGF-β pathway through TGFB3 and latent TGF-β binding protein 3 (LTBP3), the VEGF pathway through neuropilin 2 (NRP2) and fms-like tyrosine kinase 1 (FLT1), and the Wnt pathway through dapper homolog 1 (DACT1). Further validation and study of these metastasis-associated genes should inform our understanding of disease progression.
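For readers unfamiliar with functional category enrichment analysis, the general idea can be illustrated with a one-sided hypergeometric test, which asks whether a functional category contains more signature genes than expected by chance. This is only a minimal sketch of one common approach; the enrichment method actually used in this study is not described in this section. The function name `hypergeom_enrichment_p` and all counts other than the 128 classifier genes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, category_total, selected, category_hits):
    """Upper-tail hypergeometric probability P(X >= category_hits).

    population     -- number of genes on the array (background)
    category_total -- background genes annotated to the category
    selected       -- number of genes in the signature
    category_hits  -- signature genes annotated to the category
    """
    max_hits = min(category_total, selected)
    # Number of ways to draw `selected` genes from the background.
    total = comb(population, selected)
    # Sum the probability mass for category_hits, category_hits + 1, ..., max_hits.
    tail = sum(
        comb(category_total, k) * comb(population - category_total, selected - k)
        for k in range(category_hits, max_hits + 1)
    )
    return tail / total


# Hypothetical illustration: 128 signature genes drawn from a 20,000-gene
# background, of which 500 carry an "immune response" annotation and 15 of
# those appear in the signature. Only the 128 comes from the text; the other
# counts are invented for this example.
p_value = hypergeom_enrichment_p(20000, 500, 128, 15)
print(f"Enrichment P value: {p_value:.3g}")
```

In practice such a test would be repeated for every category considered, and the resulting P values would need correction for multiple testing.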
Previous studies have identified gene expression signatures for colorectal cancer prognosis by analyzing patients selected for good and poor outcomes, followed by signature validation in additional cases (5–9). Our approach was markedly different from this strategy in that gene expression differences between early-stage and metastatic colorectal cancers were evaluated as prognostic markers for patients with intermediate stages of disease. A number of previous studies had limited sample sizes (5, 6, 8) and solely focused on stage B or stage C patients (5, 6, 8). The analyses by Eschrich et al. (7) and Lin et al. (9) did comprise various stages of colorectal cancer but did not adjust for adjuvant treatment, an important modifier of outcome. Importantly, several studies did not formally assess the performance of a single defined classifier in independent test samples but rather assessed the validity of a set of candidate prognostic genes using cross-validation procedures (6, 8, 9). Our analysis of microarray data on 553 colorectal cancers represents the largest multisite study to date in which a single defined prognostic classifier was developed and subsequently evaluated in independent sets of stage B and stage C patients. Furthermore, classifier validation was formally carried out using a prediction algorithm designed for single-sample classification.
Our classifier showed limited direct overlap with previously reported prognosis signatures (5–9). Overlapping genes included an ADAM metallopeptidase (ADAMTS12; ref. 5), Kruppel-like factor 4 (KLF4; ref. 6), SPP1 (7), discoidin (DCBLD2; ref. 7), DACT1 (7), chloride intracellular channel 4 (CLIC4; ref. 7), and PDZ-binding kinase (PBK; ref. 9). The limited overlap may be due to multiple potential interstudy differences, including sample processing, microarray platforms, patient selection, and the analytic tools used for signature discovery. Prospective classifier validation and, ultimately, clinical application will require adherence to standardized analysis protocols.
In summary, our results show that metastasis-associated gene expression changes can be used to refine traditional outcome prediction, providing a rational approach for tailoring treatments to subsets of patients. The gene expression changes accompanying the acquisition of metastatic potential by the primary tumor seem to reflect both changes in endogenous transcription and changes in the tumor microenvironment such as immune cells. Genes overexpressed in high-risk cancers are potential targets for the development of new anticancer drugs to prevent the development of metastatic disease.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
We thank the Victorian Cancer BioBank and Biogrid Australia for the provision of specimens and clinical data.
Grant support: National Cancer Institute grant R01-CA112215-01A2 (T.J. Yeatman); the Jeannik M. Littlefield-AACR Grant in Metastatic Colon Cancer Research, the Commonwealth Scientific and Industrial Research Organisation Preventative Health Flagship, and the Hilton Ludwig Cancer Metastasis Initiative (L. Lipton, P. Gibbs, and O.M. Sieber); and the Victorian Government through a Victorian Cancer Agency Clinical Researcher Fellowship (L. Lipton).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
- Received June 4, 2009.
- Revision received September 3, 2009.
- Accepted September 3, 2009.