
Imaging, Diagnosis, Prognosis |
Authors' Affiliations: 1 GeneNews Corporation, Toronto, Canada; 2 Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Hong Kong, China; 3 Lam Wah Ee Hospital, Penang, Malaysia; 4 Department of Digestive Medicine, 2nd Affiliated Hospital of Medical College of Zhejiang University; 5 Department of Colorectal Surgery, 1st Affiliated Hospital of Medical College of Zhejiang University, Hangzhou, China; 6 Department of Surgery, 1st Affiliated Hospital, Anhui Medical University, Hefei, China; and 7 Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
Requests for reprints: Choong Chin Liew, GeneNews Corporation, 2 East Beaver Creek Road, Building 2, Richmond Hill, Ontario, Canada L4B 2N3. Phone: 617-834-4371; E-mail: cliew{at}genenews.com.
| Abstract |
|---|
Experimental Design: Total RNA was isolated from 211 blood samples (110 non-CRC, 101 CRC). Microarray and quantitative real-time PCR were used for biomarker screening and validation, respectively.
Results: From a set of 31 RNA samples (16 CRC, 15 controls), we selected 37 genes from analyzed microarray data that differed significantly between CRC samples and controls (P < 0.05). We tested these genes with a second set of 115 samples (58 CRC, 57 controls) using quantitative real-time PCR, validating 17 genes as differentially expressed. Five of these genes were selected for logistic regression analysis, of which two were the most up-regulated (CDA and MGC20553) and three were the most down-regulated (BANK1, BCNP1, and MS4A1) in CRC patients. Logit (P) of the five-gene panel had an area under the curve of 0.88 (95% confidence interval, 0.81-0.94). At a cutoff of logit (P) >+0.5 as disease (high risk), <–0.5 as control (low risk), and in between as an intermediate zone, the five-gene biomarker combination yielded a sensitivity of 94% (47 of 50) and a specificity of 77% (33 of 43). The intermediate zone contained 22 samples. We validated the predictive power of these five genes with a novel third set of 92 samples, correctly identifying 88% (30 of 34) of CRC samples and 64% (27 of 42) of non-CRC samples. The intermediate zone contained 16 samples.
Conclusion: Our results indicate that the five-gene biomarker panel can be used as a novel blood-based test for CRC.
Conventional CRC screening tests include fecal occult blood testing, flexible sigmoidoscopy, double-contrast barium enema X-ray, and colonoscopy (2). Each test has advantages and limitations (3–5). Fecal occult blood testing is noninvasive and relatively inexpensive and is recommended by the American Cancer Society for annual screening (5). Properly done, screening fecal occult blood testing allows CRC to be detected before the onset of symptoms in some patients, thereby reducing mortality by 15% to 33% (6–8). Recent data suggest that current community fecal occult blood screening does not reduce CRC mortality to the extent predicted by randomized, controlled trials (9). A contributing cause seems to be lack of patient compliance; only 20% of U.S. adults, 50 years and older, have annual fecal occult blood testing (2).
Colonoscopy is the "gold standard" for CRC diagnosis and screening and is recommended once every 10 years for average risk persons (2). Colonoscopy is expensive, invasive, frequently not readily available, and occasionally has serious complications. Additionally, many are unwilling to undergo screening colonoscopy: only 29% of average risk persons reported having a colonoscopy in the last decade (10). Importantly, <10% of average risk persons for CRC have advanced adenomas or CRC. Thus, colonoscopy is arguably "overkill" as a primary screening tool. Fecal DNA testing (11) and computed tomography examinations of the colon (virtual colonoscopy; ref. 12) are being investigated; however, their clinical utility is still unclear.
A sensitive, specific, blood-based, noninvasive test for early-stage CRC that does not depend on stool sampling has the potential for greater patient compliance, with associated public health benefits and decreased health care costs.
There is evidence that unique gene expression patterns in peripheral blood reflect static (inherited) and dynamic (environmental) changes that occur within the cells and tissues of the body (Sentinel Principle; ref. 13). Our laboratory has applied the Sentinel Principle across a broad range of diseases, including schizophrenia, cardiovascular disease, osteoarthritis, and bladder cancer (14–17). A preliminary report indicates that a blood biomarker panel was able to detect human CRC (18). Other laboratories have independently shown that blood gene expression can differentiate disease samples from controls (19–22).
In this study, we profiled peripheral blood samples from CRC patients and non-CRC controls. We validated a group of CRC transcript biomarkers using quantitative real-time PCR. We derived the best logit (P) equation from five selected genes to distinguish CRC from non-CRC and tested the predictive performance of this five-gene biomarker panel against an independent set of test samples.
| Materials and Methods |
|---|
A 31-sample set (15 controls, 16 CRC) was used for gene profiling on the Affymetrix U133Plus 2.0 chip (Affymetrix). A second set of 115 samples (data set AL8a1; 88 additional samples plus 27 samples from the Affymetrix set), composed of 57 controls and 58 CRC samples, served as the training set for quantitative real-time PCR validation. A third, independent set of 92 samples (data set AL8a2; 49 controls and 43 CRC) served as a blind test set.
Blood collection and RNA isolation. Samples of peripheral whole blood (10 mL) were collected in EDTA Vacutainer tubes (Becton Dickinson) and stored at 4°C until processing (within 6 h). Plasma was removed after centrifugation and a hypotonic buffer (1.6 mmol/L EDTA, 10 mmol/L KHCO3, 153 mmol/L NH4Cl, pH 7.4) was added at a 3:1 volume ratio to lyse the RBC. The mixture was centrifuged to yield a pellet containing WBC, and the pellet was resuspended into 1.0 mL of TRIzol reagent (Invitrogen Corp.) and 0.2 mL of chloroform. RNA quality was assessed on an Agilent 2100 Bioanalyzer RNA 6000 Nano Chip. RNA quantity was determined by absorbance at 260 nm in a Beckman-Coulter DU640 Spectrophotometer.
Microarray hybridization. Five micrograms of purified total RNA were labeled and hybridized against Affymetrix U133Plus 2.0 GeneChip oligonucleotide arrays (Affymetrix). Hybridization signals were adjusted in the Affymetrix GCOS software (version 1.1.1) using a scaling factor that adjusted the global trimmed mean signal intensity value to 500 for each array and imported into GeneSpring version 6.2 (Silicon Genetics). Signal intensities were centered to the 50th percentile of each chip, and for each individual probe set, to the median intensity of each specific subset first, to minimize possible technical bias, then to the whole sample set. Only genes identified by the GCOS software as "present" or "marginal" in all samples were analyzed.
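The two signal adjustments described above (scaling each array to a common trimmed mean, then centering to the array median) can be sketched as follows. This is a minimal illustration, not the GCOS or GeneSpring implementation; the function names are hypothetical, and the 2% trim on each tail is an assumption, since the text does not state the trim fraction.

```python
from statistics import mean, median


def trimmed_mean(values, trim_fraction=0.02):
    """Mean after dropping trim_fraction of the values from each tail.

    The 2% default is an assumed setting, not one stated in the text.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    return mean(ordered[k:len(ordered) - k])


def scale_to_target(signals, target=500.0, trim_fraction=0.02):
    """Multiply every signal by one factor so the trimmed mean equals target."""
    factor = target / trimmed_mean(signals, trim_fraction)
    return [s * factor for s in signals]


def center_to_median(signals):
    """Divide every signal by the array median (the 50th percentile)."""
    m = median(signals)
    return [s / m for s in signals]
```

Both steps are single multiplicative factors per array, so they change the overall intensity level of an array without changing the rank order of its probe sets.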
Quantitative real-time PCR. Two micrograms of RNA were reverse transcribed into single-stranded cDNA using the High-Capacity cDNA Archive Kit (Applied Biosystems) in a 100-µL reaction. For training set samples, 2-ng cDNA was mixed with SYBR Green master mix (SYBR Green PCR Kit, Qiagen) and primers in a 20-µL reaction volume. PCR amplification was done using the DNA Engine Opticon (Bio-Rad; formerly MJ Research). For blind test samples, 5-ng cDNA was mixed with the master mix and primers in a 25-µL reaction volume. PCR amplification was done on the ABI 7500 Real-time PCR System (Applied Biosystems). Dissociation curves generated at the end of each run were examined to verify specific PCR amplification and absence of primer-dimer formation. Forward and reverse primers for target genes were designed using PrimerQuest8 (Integrated DNA Technologies). The sequences of the six primer sets are listed (Table 1). Amplification efficiency for each primer pair was determined using a serial dilution of reference cDNA generated from a normal blood RNA pool to ensure that values were within the linear range and that amplification efficiency was approximately equal for each gene tested. Amplification specificity was confirmed by agarose gel electrophoresis of the PCR products.
For quantitative real-time PCR results, we used the comparative Ct equation (User Bulletin #2, Applied Biosystems, 2001) to calculate relative fold changes (CRC versus controls). Welch t test was used to evaluate the differences in mRNA levels between controls and CRC patients. Differences were considered significant when P < 0.05.
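As a rough illustration of the two calculations named above, the sketch below computes a relative fold change with the comparative Ct formula, 2^(–ΔΔCt), and the Welch t statistic with its Welch-Satterthwaite degrees of freedom. It assumes each input is already a ΔCt value (target Ct minus reference-gene Ct) and that amplification efficiency is close to 100%; the function names are hypothetical, and converting t to a P value (which needs the t distribution) is left out.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(dct_case, dct_control):
    """Relative expression of cases vs. controls: 2 ** -(mean dCt difference)."""
    ddct = mean(dct_case) - mean(dct_control)
    return 2.0 ** (-ddct)


def welch_t(a, b):
    """Welch t statistic and degrees of freedom for two unequal-variance groups."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A lower ΔCt means more transcript, so a gene whose mean ΔCt is one cycle lower in CRC samples than in controls has a fold change of 2.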
We used logistic regression to analyze the dependence of the binary diagnostic variable Y (control, 0; disease, 1) on the ΔCt values from the training data set. If P is the probability that a patient sample is diagnosed as "diseased," then a function X = logit (P) can be defined as follows:

$$X = \operatorname{logit}(P) = \ln\!\left(\frac{P}{1-P}\right) = b_0 + \sum_{i=1}^{n} b_i\,\Delta Ct_i \qquad \text{(A)}$$

The maximum-likelihood fitting method was used to obtain the (empirical) coefficients {bi} that define the relationship between X and the experimental measurements {ΔCti}. The {bi} values were obtained using the MedCalc software program (MedCalc Software). Receiver operating characteristic (ROC) curve analysis was then used to evaluate the discriminatory power of the combinations (23). Classification power was determined by area under the curve (AUC), sensitivity, and specificity at the defined cutoff.
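For readers unfamiliar with the AUC, one way to compute it is as the fraction of (disease, control) sample pairs in which the disease sample has the higher score, with ties counted as half. The sketch below uses that pairwise definition; it is not the MedCalc procedure, and the function name is hypothetical.

```python
def roc_auc(scores_disease, scores_control):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen disease sample scores
    higher than a randomly chosen control (ties count as one half).
    """
    wins = 0.0
    for d in scores_disease:
        for c in scores_control:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(scores_disease) * len(scores_control))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.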
For cross-validation, data (ΔCt values) were analyzed using the "simple logistic regression" function of WEKA9 under "Experimenter" mode. We applied 5-fold cross-validation (4/5 of the samples as the training set and the remaining 1/5 as the test set), repeated for 1,000 iterations. The output file was further analyzed using Excel to calculate the average accuracy over all iterations.
For the prediction test, a blind test set was examined against the five genes. The ΔCt values were used to calculate the logit function Xi using the coefficients defined from the training set (Eq. A).
| Results |
|---|
ΔCt values of five differentially expressed genes. The best combination equation can be formulated as follows: logit (P) = –5.963 + 1.206 × BANK1 ΔCt + 0.879 × BCNP1 ΔCt – 0.881 × CDA ΔCt – 0.375 × MGC20553 ΔCt – 0.405 × MS4A1 ΔCt. The AUC for the five-gene biomarker panel was 0.88 ± 0.03 (95% confidence interval, 0.81-0.94; P < 0.001; Fig. 2). Cross-validation (5-fold) of this training set showed an average accuracy of 79% (SD, 7.5%), only 3 percentage points lower than for the training set itself.
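The combination equation can be evaluated directly for a sample. The sketch below uses the intercept and coefficients quoted above; it assumes each gene's input is its Ct-derived value (written ΔCt here, on the assumption that Ct is normalized to a reference gene as in the comparative Ct method), and the function names are hypothetical.

```python
from math import exp

# Intercept and coefficients as quoted in the combination equation.
INTERCEPT = -5.963
COEFFICIENTS = {
    "BANK1": 1.206,
    "BCNP1": 0.879,
    "CDA": -0.881,
    "MGC20553": -0.375,
    "MS4A1": -0.405,
}


def logit_p(delta_ct):
    """logit (P) for one sample, given a dict of gene name -> delta-Ct value."""
    return INTERCEPT + sum(COEFFICIENTS[g] * delta_ct[g] for g in COEFFICIENTS)


def probability(x):
    """Invert the logit: P = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + exp(-x))
```

A logit (P) of 0 corresponds to P = 0.5; positive values push the sample toward the disease class and negative values toward the control class.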
0.3 cycle on average for the genes in the study.10 The value of 0.3 cycle was propagated through the logistic regression equation and gave an estimated "gray zone" extending from logit (P) = –0.5 to +0.5. Samples with a logit (P) value above the upper threshold of +0.5 were classified as "high risk for cancer," whereas samples with a logit (P) value below the lower threshold of –0.5 were classified as "low risk for cancer." Samples with logit (P) values between –0.5 and +0.5 were classified as "intermediate risk" (Fig. 3).
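The three-zone rule can be written as a small function. This is a minimal sketch with a hypothetical function name; assigning values exactly equal to –0.5 or +0.5 to the intermediate zone is an assumption, because the text does not say how boundary values were handled.

```python
def risk_zone(logit_value, lower=-0.5, upper=0.5):
    """Map a logit (P) value to one of the three risk zones.

    Values exactly on a threshold fall in the intermediate zone
    (an assumption; the text does not specify boundary handling).
    """
    if logit_value > upper:
        return "high risk"
    if logit_value < lower:
        return "low risk"
    return "intermediate risk"
```

For example, `risk_zone(1.2)` returns `"high risk"`, `risk_zone(-0.9)` returns `"low risk"`, and `risk_zone(0.1)` returns `"intermediate risk"`.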
| Discussion |
|---|
The uniqueness of our approach is our use of combinations of biomarkers assayed in whole blood (13, 17). By combining biomarkers, we can obtain higher levels of discrimination and reproducibility than possible with any single biomarker. The AUC from ROC analysis ranges from 0.62 to 0.86 (SD, 0.04-0.05) for each of the five biomarkers individually; the biomarker combination improves diagnostic capability to an AUC of 0.88 (SD, 0.030).
Once the AUC is established for a biomarker panel, the relative trade-off in performance between sensitivity and specificity is determined by the choice of cutoff point. For example, in our training set, if a single cutoff point is used, we can achieve a sensitivity of 95% with a specificity of 58% at a cutoff threshold of –0.5; a sensitivity of 81% with a specificity of 83% at a cutoff threshold of +0.5; and a sensitivity of 90% with a specificity of 79% at a cutoff threshold of 0.0 (Fig. 3).
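The trade-off described here can be reproduced on any set of scores by counting true positives and true negatives at a chosen cutoff. The sketch below uses made-up logit values purely for illustration (they are not study data), and it assumes a sample is called positive when its score is strictly above the cutoff.

```python
def sensitivity_specificity(scores_disease, scores_control, cutoff):
    """Sensitivity and specificity when score > cutoff is called positive."""
    true_pos = sum(s > cutoff for s in scores_disease)
    true_neg = sum(s <= cutoff for s in scores_control)
    return true_pos / len(scores_disease), true_neg / len(scores_control)


# Made-up scores for illustration only.
disease = [-1.0, 0.2, 0.8, 1.5]
control = [-2.0, -0.7, 0.1, 0.6]
low_cut = sensitivity_specificity(disease, control, -0.5)   # favors sensitivity
high_cut = sensitivity_specificity(disease, control, 0.5)   # favors specificity
```

Raising the cutoff can only lower (or leave unchanged) the sensitivity and raise (or leave unchanged) the specificity, which is the same movement along the ROC curve that the paragraph describes.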
In this report, we apply the concept of including a gray or intermediate zone to the interpretation of biomarker set results. Due to the technical limitations of quantitative real-time PCR as well as biological variability, an area of overlap occurs in the distribution between the high-risk and low-risk populations. Segregating this intermediate zone from the high-risk and low-risk zones improves the predictive performance of the test for the samples that fall into the high-risk or low-risk category (80% in total; Table 3; Fig. 3).
This biomarker combination compares favorably in accuracy with fecal occult blood testing and with the fecal DNA test (11). Stool-based tests have relatively low sensitivity (5-25%) and relatively high specificity (80-95%; refs. 11, 25); our biomarker panel has similar specificity but much higher sensitivity. Furthermore, because we use a peripheral blood sample obtained by routine venipuncture, our approach has the advantage over fecal tests of being much more acceptable to patients.
Another advantage of our test is that it has a continuous-valued output: the logit (P). This makes it possible to define a movable threshold, which can be set to achieve a combination of sensitivity and specificity values (from the ROC curve) that best fits the intended use of the test. For example, if the test is intended to be applied in an average-risk population to identify patients who would likely benefit from colonoscopy examination, then the threshold is set for high sensitivity (true positive fraction). By contrast, tests such as fecal occult blood testing, fecal DNA test, and detection of circulating cancer cells in peripheral blood (26) have discrete (yes/no) outputs only.
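One way to set such a movable threshold is to pick the highest cutoff that still reaches a required sensitivity on a set of known disease samples. The sketch below is one possible rule, not the authors' procedure; the function name is hypothetical, and it assumes a sample is called positive when its score is at or above the cutoff.

```python
from math import ceil


def cutoff_for_sensitivity(scores_disease, target):
    """Highest observed score usable as a cutoff with sensitivity >= target.

    A sample counts as positive when its score is >= the returned cutoff.
    """
    ordered = sorted(scores_disease, reverse=True)
    # Number of disease samples that must be called positive; the small
    # epsilon guards against floating-point error in target * n.
    needed = max(1, ceil(target * len(ordered) - 1e-9))
    return ordered[needed - 1]
```

For a screening use that must catch most cancers, a high target pushes the cutoff down; the specificity that results would then be read from the control scores at that same cutoff.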
In this study, we observed increased transcript levels of the gene cytidine deaminase (CDA; localized to 1p35-36.2) in blood from CRC patients. CDA is a salvage pathway enzyme that converts cytosine arabinoside (Ara-C) to Ara-U, thereby decreasing the formation of cytosine arabinoside triphosphate (Ara-CTP; 27, 28). Ara-C, a deoxycytidine analogue, is phosphorylated into its active form, Ara-CTP, which competes with dCTP for incorporation into DNA. Incorporated Ara-C blocks DNA synthesis and the cell undergoes programmed cell death. Studies of acute myeloid leukemia in children with and without Down syndrome indicate that elevated CDA transcript levels correlate with poor outcome in Ara-C–based chemotherapy (29, 30). CDA gene expression/activity and outcome of gemcitabine-based treatment also correlate in neuroblastoma cell lines (31) and pancreatic cancer (32). Ara-C and gemcitabine have been used in CRC treatment (33–36), but the correlation between CDA and treatment effectiveness has not been studied. That CDA was overexpressed in the present study suggests potential effects of CDA in Ara-C– or gemcitabine-based CRC treatment and warrants further investigation.
MGC20553, also up-regulated in CRC, was initially identified as a novel gene on chromosome 9q22.2-31.1. MGC20553 is a multifunctional protein essential for maintaining erythrocyte shape and membrane mechanical properties (37). The exact function of this gene in blood cells has yet to be determined. MGC20553 was studied in acute myeloid leukemia and no change in its expression was observed (38). Our study showed that MGC20553 helps discriminate between CRC and non-CRC, indicating that MGC20553 is a CRC response gene.
Three genes down-regulated in CRC blood samples, BCNP1, BANK1, and MS4A1, are expressed in B cells. BCNP1 protein, initially identified in chronic lymphocytic leukemia and in B-cell malignancies (39), has three predicted transmembrane domains and no known function. BANK1 is a novel substrate of tyrosine kinases. It is tyrosine phosphorylated on B-cell antigen receptor stimulation, which is mediated predominantly by tyrosine kinase Syk. Overexpression of BANK1 in B cells enhances B-cell antigen receptor–induced calcium mobilization and may be specific to antigen-induced immune responses (40). Gene MS4A1, a member of the membrane-spanning 4A gene family, encodes a B-cell surface molecule that functions in the differentiation of B cells into plasma cells (41). Our findings indicate that these three genes might be functionally involved in CRC.
We have identified a peripheral blood biomarker panel able to discriminate CRC from non-CRC samples. This blood-based test will be valuable for screening populations for CRC. The currently recommended fecal occult blood test has relatively high specificity but low sensitivity. One recent study by Imperiale et al. (11) showed a sensitivity of 12.9% and a specificity of 94.4% for fecal occult blood testing in an average-risk screening population. Our blood CRC biomarkers showed a much higher sensitivity than fecal occult blood testing. More importantly, a blood-based test will have much better patient compliance than a fecal-based screening test. The increased rates of compliance expected for a blood test relative to other CRC screening modalities will potentially result in earlier cancer detection, with decreased morbidity and mortality and more effective utilization of health care resources. Further work is ongoing to identify additional specific markers informative for detecting CRC; to refine the algorithm for choosing optimal combinations of markers to be incorporated into a CRC biomarker panel; to examine the CRC biomarker panel using samples from patients with cancers other than CRC; and to examine CRC biomarker panel performance across a larger population.
| Acknowledgments |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Current address for M. Han: Gene Diagnostic Inc., Hangzhou, China.
8 http://biotools.idtdna.com/primerquest
9 http://www.cs.waikato.ac.nz/ml/weka/
Received 7/20/07; revised 9/11/07; accepted 10/19/07.
| References |
|---|