Abstract
Purpose: Patients presenting with locally advanced rectal cancer currently receive preoperative radiotherapy with or without chemotherapy. Although pathologic complete response is achieved for approximately 10% to 30% of patients, a proportion of patients derive no benefit from this therapy while being exposed to toxic side effects of treatment. Therefore, there is a strong need to identify patients who are unlikely to benefit from neoadjuvant therapy to help direct them toward alternate and ultimately more successful treatment options.
Experimental Design: In this study, we obtained expression profiles from pretreatment biopsies for 51 rectal cancer patients. All patients underwent preoperative chemoradiotherapy, followed by resection of the tumor 6 to 8 weeks posttreatment. Gene expression and response to treatment were correlated, and a supervised learning algorithm was used to generate an original predictive classifier and validate previously published classifiers.
Results: Novel predictive classifiers based on Mandard's tumor regression grade, metabolic response, TNM (tumor node metastasis) downstaging, and normal tissue expression profiles were generated. Because there were only 7 patients who had minimal treatment response (>80% residual tumor), expression profiles were used to predict good tumor response and outcome. These classifiers peaked at 82% sensitivity and 89% specificity; however, classifiers with the highest sensitivity had poor specificity, and vice versa. Validation of predictive classifiers from previously published reports was attempted using this cohort; however, sensitivity and specificity ranged from 21% to 70%.
Conclusions: These results show that the clinical utility of microarrays in predictive medicine is not yet within reach for rectal cancer and alternatives to microarrays should be considered for predictive studies in rectal adenocarcinoma. Clin Cancer Res; 17(9); 3039–47. ©2011 AACR.
Translational Relevance
Patients presenting with locally advanced rectal cancer currently receive a “one size fits all” approach to treatment. This comprises preoperative radiotherapy given with or without chemotherapy prior to surgical resection of the tumor. However, a broad spectrum of response to therapy is evident. This highlights the need to identify patients in terms of their predicted response so that their treatment can be tailored accordingly. Here we used microarray profiling of 51 pretreatment patient biopsies to build predictive models. We also tested 3 previously published predictive gene sets. Predictive gene lists were generated for this cohort; however, these had limited sensitivity and specificity. In addition, none of the 3 previously published predictive gene lists retained its predictive power in our cohort. Our data indicate that current microarray predictors are not robust enough for clinical utility in rectal cancer patients at this stage and alternative approaches for establishing personalized medicine in rectal cancer should be considered.
Introduction
Neoadjuvant chemoradiotherapy (CRT) has become the standard of care for patients presenting with locally advanced rectal cancer in many centers. This multimodality therapy has been shown to improve disease-free survival (1). Preoperative CRT shrinks tumor bulk to improve resectability and maintain local control. Neoadjuvant CRT also has the potential to combat micrometastatic disease. Reports indicate that pathologic complete response (pCR) is a good predictor of improved long-term outcome (2, 3). Unfortunately, pCR is achieved only for 10% to 30% of rectal cancer patients (4, 5), meaning that although many patients respond well to CRT, a similar proportion fail to respond or experience disease progression. These patients derive no survival benefit yet are exposed to toxic side effects of treatment. Therefore, there is a strong need to identify patients unlikely to benefit from preoperative CRT to help direct them toward alternate and possibly more successful treatment options.
Although single-marker approaches may be suited to targeted therapies, for example, KRAS mutational screening for response to anti-EGFR (epidermal growth factor receptor) therapy (6, 7), tumor response to CRT is complex and unlikely to be attributable to 1 factor alone. Numerous markers have been proposed as predictors of response to CRT in rectal adenocarcinomas (ADC), including clinicopathologic features (8), p53 status (9), and thymidylate synthase mutational status (10–12). However, none of these has proven clinically useful, and studies of them have generated conflicting results (13–15).
Transcriptional profiling of tumors is promising in terms of predictive medicine. In fact, 2 commercially available predictive platforms, MammaPrint and OncoType DX, developed from microarray profiling, are now used in breast cancer prognostics (16, 17). This has encouraged research in predictive genomics for other cancer types, with investigations predicting response to therapy in rectal cancer patients generating classifiers capable of 71% to 87% correct prediction (18–20). In this study, we profiled pretreatment biopsies from 51 rectal cancer patients. Gene expression and response to treatment were correlated using a variety of supervised learning algorithms in an effort to generate an original predictive classifier and validate previously identified response classifiers.
Methods
Patients, samples, and treatment
Patients with histologically proven invasive rectal cancer who had primary, locally advanced tumors without distant metastases (T2N1+M0, T3NxM0, or T4NxM0) and who were recommended for preoperative CRT between 2005 and 2009 were entered into this study following the provision of informed written consent as approved by the Peter MacCallum Cancer Centre, the Royal Melbourne Hospital, and the Austin Hospital Ethics Committees. Research biopsies (2–3 mm3) were collected during the initial diagnostic endoscopy and stored at −20°C in RNAlater solution (Ambion Inc.). Biopsies were divided in half, with one piece undergoing independent histopathologic review and the other prepared for RNA extraction.
All patients received treatment considered the standard of care for patients with locally advanced rectal cancer, comprising a total radiation dose of 50 Gy applied in 25 fractions over 5 weeks, concurrent with daily administration of 225 mg/m2 5-fluorouracil (5-FU) continuously for 5 weeks. This was followed by en bloc resection of tumor, with its associated vascular and lymphatic drainage 6 to 8 weeks after completion of CRT.
Assessment and classification of response
Approximately 6 to 8 weeks post-CRT, all patients underwent surgery. Response to CRT was assessed by histologic examination of the resected specimen and scored according to Mandard's tumor regression grade (TRG; ref. 21) as previously adapted for colorectal tumors (22). The percent residual tumor was also estimated. There was 90% concordance in TRG classification by 2 independent pathologists. Patients with less than 10% residual tumor were classified as responders and those with more than 10% residual tumor as nonresponders.
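The residual-tumor rule above can be sketched as a small function. This is an illustrative sketch only, not the authors' code: the function name, the parameter names, and the choice to return `None` for a value that falls in neither class are assumptions. The text does not say how a tumor with exactly 10% residual tumor was handled, so the sketch leaves that case unclassified rather than guessing.

```python
from typing import Optional


def classify_by_residual_tumor(
    residual_percent: float,
    responder_below: float = 10.0,
    nonresponder_above: float = 10.0,
) -> Optional[str]:
    """Label a patient from the estimated percent residual tumor.

    Returns None when the value falls in neither class, for example exactly
    10% with the default cutoffs (a case the text does not specify).
    """
    if not 0.0 <= residual_percent <= 100.0:
        raise ValueError("residual_percent must lie between 0 and 100")
    if residual_percent < responder_below:
        return "responder"
    if residual_percent > nonresponder_above:
        return "nonresponder"
    return None
```

Keeping the two cutoffs as separate parameters means the same function can express stricter definitions of nonresponse by raising `nonresponder_above`, in which case intermediate patients are returned as `None` and can be excluded.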
Metabolic response was assessed using positron emission tomographic (PET) scanning pre- and post-CRT. Patients were staged on a dedicated PET/CT scanner (Discovery; GE Healthcare) 1 hour after injection of 300 to 400 MBq of 2-deoxy-2-[18F]fluoro-d-glucose (18F-FDG). Bladders were routinely catheterized, and images were acquired from the neck to upper thigh. Qualitative analysis of PET metabolic response was determined from side-by-side visual inspection of PET images from the pre- and posttreatment scans. The changes in the 18F-FDG pattern of a tumor were scored as follows: complete metabolic response, no identifiable activity in all previously defined sites of 18F-FDG activity or where 18F-FDG uptake was indistinguishable from or less than any diffuse bowel activity immediately adjacent to the original site of uptake and within the radiation treatment volume; partial metabolic response, intensity of 18F-FDG uptake was reduced compared with pretreatment scan but residual uptake was still of higher intensity than adjacent bowel; stable (or progressive) metabolic disease, intensity of 18F-FDG uptake was unchanged (or increased) after treatment. In addition, all patients underwent staging investigations, which included CT scan of abdomen/pelvis, whole-body PET/CT scan, MRI of pelvis, and/or transrectal ultrasonography, before and after CRT. Those patients whose TNM (tumor node metastasis) level downstaged following CRT were classified as responders, whereas those whose TNM level remained stable or increased were classified as nonresponders.
RNA extraction
RNA was extracted from biopsies containing more than 75% tumor by phenol/chloroform extraction (TRIzol; Invitrogen) prior to further purification by column chromatography (RNeasy Mini kit; Qiagen). RNA integrity was then assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies).
Microarrays
Gene expression analysis was done using the Affymetrix GeneChip Human Genome U133 Plus 2.0 Array Platform containing probes representing 39,000 genes. Preparation of labeled and fragmented aRNA targets, hybridization, and scanning were carried out according to the manufacturer's protocol (Affymetrix). Briefly, 100 ng of total RNA for each sample was processed using the GeneChip 3′ IVT Express Kit. RNA was reverse transcribed and then converted to double-stranded cDNA prior to biotin labeling during in vitro transcription. Fifteen micrograms of labeled aRNA was then fragmented, and quality control was carried out using the Agilent Bioanalyzer. Fragmented aRNA was then hybridized on GeneChip Human Genome U133 Plus 2.0 Arrays for 16 hours at 45°C. Arrays were then washed and stained using the GeneChip Hybridization, Wash, and Stain Kit on the GeneChip Fluidics Station 450. Chips were then scanned using the Affymetrix GeneChip Scanner 3000. Of the 54 samples processed, all arrays passed quality control with the exception of 3, which were excluded from the analysis.
Class prediction analysis
The package Affy (23) from Bioconductor (24) was used to load the data for each experiment into the statistical computer program R (25). The chips were then assessed for their quality by using the affyPLM (26–28) package. This package fits a probe level model to the data and can help identify spatial artifacts and abnormal intensity distributions.
The data were normalized and background corrected using the robust multiarray average (RMA) expression measure (29). The Affy and Limma (30) packages were then used to model the data and generate gene signatures. The model used to calculate the differential expression was constructed with a batch adjustment. After calculating the differential expression between the responders and the nonresponders, gene signatures were generated by selecting the genes that met a set of selection criteria, such as fold change and P value cutoffs. The selection criteria tested were the top-ranked n genes (n varied from 10 to 1,000) and genes that had a fold change greater than 2 and a P value less than 0.05. Because no genes remained significant (adjusted P < 0.05) after applying a false discovery rate adjustment (31), the P values used for selection were not adjusted for multiple testing.
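To make the selection criteria concrete, the following sketch shows a false discovery rate adjustment and the fold change plus P value filter. Several points are assumptions rather than details from the text: the adjustment is implemented as the Benjamini-Hochberg procedure (the text cites a false discovery rate method without naming it), fold change is taken on the log2 scale in either direction, and the function names are hypothetical. The study itself used R and Limma, so this only illustrates the logic.

```python
import math
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(
    log2_fold_changes: Sequence[float],
    p_values: Sequence[float],
    min_fold_change: float = 2.0,
    max_p: float = 0.05,
) -> List[int]:
    """Indices of genes with fold change > min_fold_change (either direction)
    and P value < max_p. Assumes fold changes are supplied on the log2 scale."""
    min_log2 = math.log2(min_fold_change)
    return [
        i
        for i, (lfc, p) in enumerate(zip(log2_fold_changes, p_values))
        if abs(lfc) > min_log2 and p < max_p
    ]
```

Passing unadjusted P values to `select_genes` mirrors the choice described above; passing the output of `benjamini_hochberg` instead would apply the stricter criterion under which no genes were retained.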
To assess the performance of each signature, leave-one-out cross-validation (LOOCV) was done. LOOCV involves iteratively leaving out 1 sample (the test sample) and generating a gene signature by using the remaining samples (the training set). This gene signature is then used to predict the test sample, and the prediction is compared with the actual classification.
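The LOOCV loop described above can be sketched as follows. The key point is that gene selection is repeated inside every fold using the training samples only, so the held-out sample never influences the signature. Two components are deliberate stand-ins and are not the methods used in the study: genes are ranked by the absolute difference in class means (instead of a Limma model), and prediction uses a nearest-centroid rule (instead of SVM or DLDA). All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows are samples, columns are genes


def class_means(rows: Matrix, genes: Sequence[int]) -> List[float]:
    """Mean expression of each listed gene across the given samples."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def top_genes(x: Matrix, y: Sequence[int], n: int) -> List[int]:
    """Rank genes by absolute difference in class means (stand-in ranking)."""
    all_genes = range(len(x[0]))
    pos = [r for r, label in zip(x, y) if label == 1]
    neg = [r for r, label in zip(x, y) if label == 0]
    diff = [
        abs(a - b)
        for a, b in zip(class_means(pos, all_genes), class_means(neg, all_genes))
    ]
    return sorted(all_genes, key=lambda g: diff[g], reverse=True)[:n]


def nearest_centroid_predict(
    x_train: Matrix, y_train: Sequence[int], sample: Sequence[float],
    genes: Sequence[int],
) -> int:
    """Assign the sample to the class with the closer centroid (stand-in model)."""
    pos = class_means([r for r, l in zip(x_train, y_train) if l == 1], genes)
    neg = class_means([r for r, l in zip(x_train, y_train) if l == 0], genes)

    def dist(centroid: Sequence[float]) -> float:
        return sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))

    return 1 if dist(pos) < dist(neg) else 0


def loocv(x: Matrix, y: Sequence[int], n_genes: int) -> List[int]:
    """Leave-one-out predictions; requires both classes in every training set."""
    predictions = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = list(y[:i]) + list(y[i + 1:])
        genes = top_genes(x_train, y_train, n_genes)  # training data only
        predictions.append(nearest_centroid_predict(x_train, y_train, x[i], genes))
    return predictions
```

Selecting genes once on the full data set and then cross-validating only the classifier would leak information from the test sample and inflate the apparent accuracy, which is why the selection step sits inside the loop.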
To predict the classification of the test set, 2 methods were compared: support vector machines (SVM) from the e1071 package (32) and diagonal linear discriminant analysis (DLDA) from the Supclust package (33). The kernel for the SVM was set to linear and the cost was tuned using values in the range of 0.1 to 10,000.
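For readers unfamiliar with DLDA, the sketch below shows the general idea: each class is summarized by its per-gene mean, a single pooled variance is estimated per gene, covariances between genes are ignored, and a sample is assigned to the class with the smallest variance-scaled distance. This is a minimal illustration assuming equal class priors. It is not the Supclust implementation, whose details may differ, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[int, List[float]], List[float]]  # (class means, pooled variances)


def fit_dlda(x: Sequence[Sequence[float]], y: Sequence[int]) -> Model:
    """Estimate per-class gene means and pooled per-gene variances."""
    labels = sorted(set(y))
    n_genes = len(x[0])
    dof = len(x) - len(labels)
    if dof <= 0:
        raise ValueError("need more samples than classes")
    means: Dict[int, List[float]] = {}
    pooled_ss = [0.0] * n_genes
    for k in labels:
        rows = [r for r, label in zip(x, y) if label == k]
        means[k] = [sum(r[g] for r in rows) / len(rows) for g in range(n_genes)]
        for r in rows:
            for g in range(n_genes):
                pooled_ss[g] += (r[g] - means[k][g]) ** 2
    # Floor the variance to avoid division by zero for constant genes.
    variances = [max(ss / dof, 1e-12) for ss in pooled_ss]
    return means, variances


def predict_dlda(model: Model, sample: Sequence[float]) -> int:
    """Return the class whose mean is closest under the diagonal covariance."""
    means, variances = model

    def score(k: int) -> float:
        return sum(
            (v - m) ** 2 / s for v, m, s in zip(sample, means[k], variances)
        )

    return min(means, key=score)
```

Because only means and variances are estimated, DLDA has far fewer parameters than a full discriminant model, which is one reason it is often used when genes greatly outnumber samples.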
To validate the previously published gene signatures (18–20), analyses were done using the same algorithm as in each publication, either the k-nearest neighbors algorithm (19) or DLDA (18, 20). If the authors had reported Probeset IDs along with the gene symbols, these were collated; otherwise, the gene symbols were converted to Probeset IDs by using Affymetrix NetAffx (https://www.affymetrix.com/analysis/netaffx/index.affx). For each training set in the LOOCV analysis, the data were modeled using the log2 expression of these probesets and then the test sample was predicted.
Results
Patient response and survival
A total of 54 patients met all criteria for inclusion in this study. Three had to be excluded, as their microarrays failed to pass our strict quality control standards. Clinical data for the remaining 51 patients are summarized in Table 1. In this cohort, 14 of 51 patients (27%) had good tumor response (<10% residual tumor) whereas 7 patients (14%) achieved pCR according to Mandard's TRG (Table 1). Using TRG classification, 7 of the 51 patients had minimal response to treatment (>80% residual tumor). According to TNM staging pre- and post-CRT, 24 patients did not respond or progressed during treatment. Fourteen patients (27%) had complete metabolic response post-CRT. Although there was partial concordance, tumor response rate to CRT varied with different modes of classification. With a median follow-up of 33 months, only 4 of 51 patients had recurrent disease and 5 patients died from unrelated causes. With such a low number of cancer events, recurrence/disease-free survival could not be used as a measure of response. Examination of the clinicopathologic features recorded for this study did not reveal any associations with response outcome (Table 2).
Table 1. Summary of patient clinical data
Table 2. Breakdown of clinical data
Generating a new predictive classifier
To generate a predictive classifier from this cohort, we first carried out an analysis by using TRG and percentage residual tumor to classify patients, with responders defined as those patients whose resected tumor had less than 10% residual tumor and nonresponders as those with more than 10% residual tumor. This classifier, using LOOCV, peaked in predictive accuracy with 50% sensitivity and 59% specificity (Table 3). The sensitivity of a test is the proportion of true responders that it correctly identifies, whereas specificity is the proportion of true nonresponders that it correctly identifies. Because this classifier was not a robust predictor of response, attempts were also made at separating patients into the extremes of response. For this analysis, responders were patients with less than 10% residual tumor whereas nonresponders had more than 50% residual tumor. Although this classifier could correctly predict responders 82% of the time, the specificity of this test was 30%. Both DLDA and SVM analyses provided similar results; however, for simplicity, here we present the results from SVM analysis.
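As an illustration of how sensitivity and specificity are obtained from cross-validated predictions, the sketch below counts true and false calls with responders treated as the positive class. The function name and label strings are assumptions for this example, and the data in the usage note are invented for illustration rather than taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    actual: Sequence[str],
    predicted: Sequence[str],
    positive: str = "responder",
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) with `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # responder correctly predicted
            else:
                fn += 1  # responder missed
        else:
            if p == positive:
                fp += 1  # nonresponder called a responder
            else:
                tn += 1  # nonresponder correctly predicted
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

For example, with 4 actual responders of whom 2 are predicted correctly and 6 actual nonresponders of whom 4 are predicted correctly, the function returns a sensitivity of 0.5 and a specificity of 4/6.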
Table 3. Predictive performance for the various analyses run aiming to predict response to neoadjuvant CRT
Response to CRT is complex and can be measured in a number of ways. In our next analysis, metabolic response was used to stratify patients as responders and nonresponders. The most robust classifier had 89% specificity; however, it was incapable of detecting responders. TNM downstaging was then used to group patients as responders or nonresponders. The predictive power from this analysis peaked with sensitivity and specificity at 72% and 52%, respectively (Table 3).
Expression profiling in normal tissue has previously been shown to predict prognosis (34, 35); hence, gene expression patterns in normal tissue were also used to generate a predictive classifier for response to CRT. TRG and percentage residual tumor were used to define patients as responders or nonresponders, but the most robust classifier from this analysis was capable of only 58% sensitivity and 63% specificity.
Validation of existing predictive classifiers
The 3 previously published predictors (18–20) were tested on the array data generated for the 51 patients in this cohort. The gene list identified by Ghadimi and colleagues (18) was only capable of correctly classifying responders at a rate of 21% and the specificity for the analysis for correct classification of a nonresponder was 37%. The classifier identified by Kim and colleagues (19) was capable of 50% sensitivity and 70% specificity in our cohort. The gene list from the most recent study (20), which used the same array platform as the one in the present study, was also evaluated; however, this classifier using DLDA could predict responders correctly only 33% of the time whereas specificity was 30%.
These 3 studies built their classifiers from the expression profiles of 23 (18), 31 (19), and 43 (20) rectal biopsies. Although the studies from Kim and colleagues (19) and Rimkus and colleagues (20) had a similar ratio of nonresponders versus responders (2:1), using TRG to define response, the present study and the original study from Ghadimi and colleagues (18) had close to equal numbers of responders and nonresponders. It should be noted that although each study used TRG to define response to treatment, there were slight variations in the precise classification of patients. Ghadimi and colleagues (18) classified responders according to a different pathologic grading system (36, 37) along with T-level downsizing. Patients with complete or almost complete regression (grade 3 or 4) were defined as responders and those with all other grades were classified as nonresponders. Kim and colleagues (19) used the TRG system from Dworak and colleagues (36), with responders being patients who achieved pCR (no residual tumor cells) and all other patients classified as partial responders. Rimkus and colleagues (20) classified responders according to TRG classification of Becker and colleagues (38). Patients whose samples contained less than 10% residual tumor were classified as responders; the remaining patients were classified as partial/nonresponders.
Pathway analysis
The previously published predictive classifiers could not be validated and, on closer inspection, it became evident that there were no shared genes among the 3 gene lists. To determine whether the predictive classifiers from the other rectal cancer studies involved a common molecular pathway, each gene list was entered into Ingenuity Pathway Analysis Software. Numerous pathways, including MYC, ERK, retinoic acid, and NF-κB signaling, were highlighted as significant for each gene list; however, the only molecular networks common to all 3 gene lists were the TNF signaling pathway and the β-estradiol signaling network. The TNF signaling network identified using the Kim and colleagues (19) gene set is shown in Figure 1. This pathway was also identified using the predictive genes from Ghadimi and colleagues (18) and Rimkus and colleagues (20).
Figure 1. Ingenuity Pathway Analysis of a predictive gene set derived from response data for rectal cancer patients highlights a role for the TNF signaling network.
Discussion
Individualized treatment planning requires improved tumor staging and more accurate assessment and prediction of response to therapy. Preoperative CRT has been shown to improve outcome for patients with locally advanced rectal cancer, especially if pCR has been achieved. Prediction of good tumor response can provide useful prognostic information but would not alter treatment selection. Identification of tumors resistant to planned therapy would have a much greater clinical impact on patient management. For treatment to be altered, tumors must be shown to have negligible response. In this study, we believe more than 80% residual tumor is a reasonable definition of a nonresponder. However, the majority of rectal cancer patients responded well to CRT, with only 7 of 51 patients deemed nonresponders. Constructing a predictive gene classifier from such a highly skewed distribution of nonresponders and responders was not feasible. Hence, our effort was redirected to the establishment of a predictive algorithm that can prognosticate based on tumor response to CRT. This focus on patients with a very good response to therapy and separating those patients from patients with an intermediate or unfavorable response is a limitation, as stratification to this degree is not yet current clinical practice. However, although it is acknowledged that a pCR does not necessarily reduce the potential value of surgery, there are clinical reports of patients who have a complete response being managed without surgery and thus identification of patients who have had a pCR has a high potential value.
This study could not generate a strong predictor of response to CRT in rectal cancer patients. The use of TNM downstaging, TRG, and metabolic response to classify patients as responders or nonresponders generated predictors with a broad range of sensitivity and specificity (Table 3). Sensitivity peaked for the classifier generated from the extremes of response. This classifier was capable of correctly predicting responders 82% of the time. In contrast, the specificity for this classifier was only 30%. This means that the gene list was not a good predictor, and while it screened for and detected the majority of responders, it misclassified many nonresponders as being responders (high false-positive rate). The classifier generated from the metabolic response analysis had the greatest specificity at 89%; however, while the test was seemingly accurate in terms of predicting nonresponders, it had such low sensitivity that it failed to predict any patients as being responders. For a test to be truly predictive, both sensitivity and specificity must be high.
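The trade-off described above can be made explicit with a short worked calculation. The definitions are standard. The summary index J (Youden's index) is not used in this study and is introduced here only as an illustration. The sensitivity of the metabolic response classifier is taken as 0 because it predicted no patients as responders.

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}, \qquad
\text{false-positive rate} = 1 - \text{specificity}
\]
\[
J = \text{sensitivity} + \text{specificity} - 1
\]
\[
J_{\text{extremes}} = 0.82 + 0.30 - 1 = 0.12, \qquad
J_{\text{metabolic}} = 0 + 0.89 - 1 = -0.11
\]
```

A perfect test has J = 1 and a test no better than chance has J near 0. For the extremes-of-response classifier, the false-positive rate is 1 − 0.30 = 0.70, so both classifiers sit close to chance on this index despite one high headline figure each.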
There are several potential reasons to explain why microarray profiling could not predict response in this cohort. The main problem in studies such as this one is the classification of response. Categorizing patients as responders or nonresponders has a significant impact on the genes which are identified as predictive. We found that while most response measures were in agreement, there were multiple cases in which TNM downstaging, TRG, and metabolic response were conflicting (Supplementary Table S1). To address this issue, we separated out each of the response measures and ran the analyses separately; however, this approach did not yield a robust predictor.
We were unable to show that pretherapeutic expression profiling can be used with confidence to correctly calculate response to CRT in our cohort of rectal cancer patients. Testing the previously published gene lists (18–20) on our cohort showed a similarly low level of predictive power. This suggests that gene expression profiling is not a reliable indicator of patient response in rectal ADC.
Perhaps changes in transcription for this cohort of rectal ADC patients are either extremely minor or highly variable between patients, making it difficult to detect a discrete difference between responders and nonresponders and to extract predictive classifiers by using current bioinformatic techniques. In addition, RNA levels do not necessarily correlate with protein levels and biological activity. Another factor to consider is that while the tumor is the target of the therapy, other tissues or factors could be involved in certain aspects of response. Sensitivity to CRT may not be due to transcription in the tumor itself but due to another physiologic factor such as drug uptake and metabolism by the liver, diet (39), overall fitness, or immunosurveillance (40).
We attempted to validate 3 previously published predictive gene lists (18–20). All genes from the published classifiers were represented on the Affymetrix U133 Plus 2.0 Arrays used in the present study. In our cohort, these lists had limited sensitivity and specificity. Although the classifier from the study by Kim and colleagues (19) had the highest specificity at 70%, it achieved only 50% sensitivity. This raises the important issue of reproducibility on independent data sets. To be clinically useful, a predictive classifier must be able to accurately predict response in independent patient cohorts. This finding highlights that gene lists identified from predictive microarray studies are unstable and may not retain predictive power in an independent set of samples. The results from microarray studies are often poorly reproducible and gene lists must be rigorously validated.
Slight variations in study design may also alter the predictive power of these classifiers. For example, the percentage of tumor in the profiled samples differs between studies. Although samples with more than 75% tumor were profiled here, the 3 previous studies used biopsies with a range of different tumor cell content (18–20), which may impact reproducibility. In addition, the predictive study from Kim and colleagues (19) generated a predictor for multiple CRT regimens, using combinations of 5-FU and leucovorin, capecitabine with irinotecan, or capecitabine alone.
Predictive microarray studies are often overly optimistic with their results and conclude that the classifier may have clinical utility and requires further validation. Here we show that validation of 3 previously published gene lists is not possible in our set of patients. This finding is not isolated; in fact, with the exception of the MammaPrint and OncoType DX classifiers, very few groups have successfully validated predictors generated from microarray studies (41, 42). This is concerning given the overwhelming number of predictive classifiers in the literature. Although some literature is available that shows an inability to cross-validate published predictive classifiers generated from microarray studies (43), it is possible that publication bias has excluded negative results and that other attempts at validating classifiers have remained unpublished.
The predictive capabilities of microarray studies have been questioned in the past. These studies are prone to high false discovery rates and the molecular predictors are often highly unstable with predictive classifiers being highly dependent on patient selection (44). Michiels and colleagues (44), using a multiple random training validation strategy on publicly available array data, showed that 5 of 7 studies from high-impact journals could not classify patients according to prognosis better than chance alone. In addition, it should also be noted that it is unclear how well the MammaPrint and OncoType DX platforms are performing in terms of their clinical utility (45).
What does this mean in terms of the future of predictive array studies? Alternative approaches, such as searching for relevant pathways involved in response or resistance, should be considered. Comparison of the gene lists identified in the 3 previously published reports (18–20) shows that while there are no shared genes, pathway analysis reveals that the TNF pathway is common to all 3. In fact, a similar analysis on 3 publications aiming to predict response to CRT in esophageal cancer (46–48) highlighted the NF-κB pathway as common to all predictive gene sets (49). The NF-κB transcription factor is a downstream target of TNF, suggesting that the TNF/NF-κB molecular pathway may play a significant role in determining sensitivity to CRT. The TNF/NF-κB pathway may prove to be a potential target for novel therapeutic interventions aimed at increasing sensitivity to CRT. This not only provides much needed insight into the mechanism of response but also serves as a potential predictive marker. Microarrays may still have a future in predictive medicine; however, much larger studies will be essential to generate more reproducible classifiers in addition to further refinement of current bioinformatic approaches.
These findings show that care must be taken when interpreting predictive studies. The predictive power of a classifier must be reproducible in independent data sets before a clinically useable platform can help tailor patient treatment. Although a number of previous studies were capable of predicting response to CRT in rectal cancer with relatively high accuracy (18–20), predictors are highly dependent on the sample set from which they are derived. This study could not validate these previously published predictors and highlights that alternatives to microarrays should be considered for predictive studies in rectal ADC. This research reveals that current microarray predictors are not robust enough for clinical utility in rectal cancer patients at this stage.
Disclosure of Potential Conflicts of Interest
The authors declare no potential conflicts of interest.
Grant Support
This work was funded by the NHMRC project grant 509004.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Acknowledgments
The authors thank Adam Kowalczyk for advice on bioinformatics and Rachel Greaney and Tina Thorpe for assisting with the collection of clinical data. The authors also thank Peter Gibbs, Niall Tebbutt, and Andrew Scott for their support and involvement in the project.
Footnotes
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
† Deceased.
- Received November 2, 2010.
- Revision received December 15, 2010.
- Accepted December 29, 2010.
- ©2011 American Association for Cancer Research.