Purpose: Esophageal squamous cell carcinoma (ESCC) is an aggressive tumor with poor prognosis. Understanding molecular changes in ESCC will enable identification of molecular subtypes and provide potential targets for early detection and therapy.
Experimental Design: We followed up a previous array study with additional discovery and confirmatory studies in new ESCC cases by using alternative methods. We profiled global gene expression for discovery and confirmation, and validated selected dysregulated genes with additional RNA and protein studies.
Results: A total of 159 genes showed differences with extreme statistical significance (P < 1E-15) and 2-fold differences or more in magnitude (tumor/normal RNA expression ratio, N = 53 cases), including 116 upregulated and 43 downregulated genes. Of 41 genes dysregulated in our prior array study, all but one showed the same fold change directional pattern in new array studies, including 29 with 2-fold changes or more. Alternative RNA expression methods validated array results: at least two thirds of 51 new cases examined by real-time PCR (RT-PCR) showed 2-fold differences or more for all seven genes assessed. Immunohistochemical protein expression results in 275 cases were concordant with RNA for five of six genes.
Conclusion: We identified an expanded panel of genes dysregulated in ESCC and confirmed previously identified differentially expressed genes. Microarray-based gene expression results were confirmed by RT-PCR and protein expression studies. These dysregulated genes will facilitate molecular categorization of tumor subtypes and identification of their risk factors, and serve as potential targets for early detection, outcome prediction, and therapy. Clin Cancer Res; 17(9); 2955–66. ©2011 AACR.
More than 400,000 persons die from esophageal cancer worldwide each year, and 80% of these cancers are histologically esophageal squamous cell carcinomas (ESCC). Reducing mortality from ESCC will require primary prevention through the amelioration of etiologic risk factors, and secondary prevention via early detection coupled with effective therapy. Molecular alterations in the esophagus are potential targets for early detection and therapy. The current study represents the most comprehensive profiling of global gene expression in ESCC to date and identified an expanded list of 642 dysregulated genes, including 159 genes with marked dysregulation. Additional RNA and protein studies confirmed the profiling results. The dysregulated genes identified here will facilitate molecular categorization of tumor subtypes and identification of their risk factors, as well as serve as potential targets for early detection, outcome prediction, and therapy.
Esophageal cancer is the sixth most common fatal human cancer in the world (1) and the fourth most common new cancer in China (2). Shanxi Province, a region in north central China, has among the highest esophageal cancer rates in China, and nearly all of these cases are esophageal squamous cell carcinoma (ESCC). ESCC is an aggressive tumor that is typically diagnosed only after the onset of symptoms, when prognosis is very poor. The 19% 5-year survival rate is the fourth worst among all cancers in the USA (3). One promising strategy to reduce ESCC mortality is early detection, and a better understanding of the molecular mechanisms underlying esophageal carcinogenesis and its molecular pathology will facilitate the development of biomarkers for early detection.
The application of microarray analysis is a promising method for finding clinical biomarkers in various cancers and has been successful in identifying subsets of tumors (including ESCC) that correlate with clinical parameters such as survival, histological grade, invasive status, and response to therapy (4–12). Gene expression changes that distinguish patient outcomes are often subtle or variable, and individual genes are unlikely to predict clinical behavior reliably. Taken together, however, gene expression profiles can be used to generate accurate predictors and can improve our understanding of the molecular alterations that occur during carcinogenesis.
In earlier studies, we documented genomic changes in ESCC, including widespread allelic loss and frequent mutations in certain putative tumor suppressor genes (13–17). Using a cDNA microarray with 7,689 human cDNA clones, we previously tested expression in 19 ESCC patients and found 41 significantly differentially expressed genes. Patients with and without a positive family history of upper gastrointestinal (UGI) tract cancers were also distinguishable by their gene expression patterns (18). To confirm these original results, we expanded our expression studies of ESCC to include more cases and alternative methods, examining more genes and validating/replicating selected dysregulated genes in additional patients with different methods. This confirmatory study was primarily based on global gene expression profiling of 53 ESCC patients with the Affymetrix Human Genome U133 Set (U133A and U133B). To further validate/replicate our findings, we compared these data with the following: (i) RNA expression in 51 additional ESCC cases for 7 differentially expressed genes by quantitative real-time PCR (RT-PCR); (ii) RNA expression from microdissected tumor and normal tissues in 17 ESCC cases for 41 dysregulated genes using the Affymetrix Human Genome U133A v2.0; and (iii) protein expression of 6 genes by immunohistochemistry (IHC) in 275 ESCCs on a tumor tissue microarray (TMA).
Four different groups of patients with ESCC were evaluated in this study, and all were enrolled in our UGI cancer genetic studies project, a single-institution study using a common research protocol. Patients enrolled in the project included consecutive cases of ESCC who presented to the Thoracic Surgery Department of the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, between 1998 and 2001, who had no prior therapy for their cancer, and who underwent surgical resection of their tumor at the time of their hospitalization. Selection of patients for RNA studies was based solely on the availability of appropriate tissues for RNA testing (i.e., consecutive testing of cases with available frozen tissue, tumor samples that were predominantly (>50%) tumor, and tissue RNA quality/quantity adequate for testing); patients without frozen tissues were included in the protein studies. After obtaining informed consent, patients were interviewed to obtain information on demographic and lifestyle cancer risk factors, and clinical data were collected. Selected demographic and clinical-pathologic features of the 4 ESCC patient groups studied are shown in Table 1. In total, 396 different ESCC cases were evaluated. All cases were histologically confirmed as ESCC by pathologists at both the Shanxi Cancer Hospital and the National Cancer Institute (NCI). This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital and the NCI.
Paired esophageal cancer and normal tissue distant from the tumor were collected during surgery. Tissues for RNA analyses were snap frozen in liquid nitrogen and stored at −130°C until used, whereas tissues for IHC analyses were fixed in 70% alcohol and embedded in paraffin.
Total RNA preparation
RNA was extracted by 2 methods. For the confirmatory analysis of ESCC cases with the Affymetrix U133A/B chip set and the validation/replication in cases using RT-PCR, total RNA was extracted by the Trizol method following the manufacturer's protocol. Only tumor samples with high purity (≥50% tumor cells) were selected for this extraction and subsequent analyses. A second RNA extraction method was used for the microdissected tissue samples. For these samples, 5 to 10 consecutive 8-μm sections were cut from frozen tumor tissues and the normal counterpart tissues, and tumor and/or normal cells were manually microdissected under light microscopy. RNA from tumor and matched normal tissue was extracted using the PureLink RNA Mini Kit protocol (catalogue no. 12183-018A; Invitrogen). For both extraction methods, the quality and quantity of total RNA were determined on the RNA 6000 LabChip/Agilent 2100 Bioanalyzer (Agilent Technologies, Inc.).
Probe preparation and hybridization
Each microarray experiment was carried out using 8 μg of total RNA obtained from either the Trizol or the PureLink extraction method. Probes were prepared according to the manufacturer's protocol (19). Procedures included first-strand synthesis, second-strand synthesis, double-stranded cDNA cleanup, in vitro transcription, cRNA purification, and fragmentation. Twenty micrograms of biotinylated cRNA were then applied to each hybridization array, either the Affymetrix GeneChip Human Genome U133 Set (HG_U133A and HG_U133B, Affymetrix) or the Affymetrix GeneChip Human Genome U133A 2.0. After hybridization at 45°C overnight, arrays were stained with phycoerythrin-conjugated streptavidin on a fluidics station (GeneChip Fluidics Station 450) and scanned (GeneChip Scanner 3000) to obtain quantitative gene expression levels. Paired tumor and normal tissue specimens from each patient were processed simultaneously during the RNA extractions and hybridizations.
Confirmation by RT-PCR analysis for 7 genes was carried out on an ABI 7000 Sequence Detection System using paired tumor/normal ESCC samples from 51 cases as previously described (20). Briefly, 1 to 5 μg of total RNA were first converted to cDNA using Superscript II (Invitrogen Corporation) in the presence of an oligo (dT)12–18 primer, and 100 ng of cDNA was applied for the subsequent PCR reaction (94°C × 10 minutes; 95°C × 15 seconds, 60°C × 1 minute; 40 cycles). RT-PCR results are presented as CT values, where CT is defined as the threshold PCR cycle number at which an amplified product is first detected. The average CT was calculated for each evaluated gene and for GAPDH, and ΔCT was determined as the mean of the triplicate CT values for the evaluated gene minus the mean of the triplicate CT values for GAPDH. ΔΔCT represents the difference between the paired tissue samples, calculated as ΔΔCT = ΔCT(tumor) − ΔCT(normal). The N-fold differential expression of the evaluated gene in a tumor sample compared with its normal epithelial counterpart was expressed as $2^{-\Delta\Delta C_T}$, which represents the fold change in target gene expression in tumor, normalized to an internal control gene (GAPDH) and relative to the normal control.
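The ΔΔCT calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes triplicate CT values per gene and per tissue and roughly 100% PCR amplification efficiency (the assumption behind the 2^−ΔΔCT formula). The function names and CT values are hypothetical and are not study data.

```python
from statistics import mean

def delta_ct(target_cts, gapdh_cts):
    """Mean target-gene CT minus mean GAPDH CT for one tissue sample."""
    return mean(target_cts) - mean(gapdh_cts)

def fold_change(tumor_target, tumor_gapdh, normal_target, normal_gapdh):
    """Tumor/normal fold change as 2^(-ddCT), assuming ~100% PCR efficiency."""
    ddct = delta_ct(tumor_target, tumor_gapdh) - delta_ct(normal_target, normal_gapdh)
    return 2 ** (-ddct)

# Hypothetical triplicate CT values (not study data)
fc = fold_change([24.0, 24.1, 23.9], [18.0, 18.0, 18.0],
                 [26.0, 26.2, 25.8], [18.0, 18.1, 17.9])
print(round(fc, 2))  # tumor dCT = 6.0, normal dCT = 8.0, ddCT = -2.0 -> 4.0
```

A lower ΔCT in tumor means the target is detected earlier relative to GAPDH, so a negative ΔΔCT yields a fold change above 1 (upregulation).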
IHC analysis of ESCC TMA
The details of patient selection and TMA construction were previously described (20). Six genes that were significantly over- or underexpressed on our previous 8K cDNA array were selected for IHC evaluation on the ESCC tumor TMA. These included CDC25B, LAMC2, FADD, KRT14, FSCN1 (all overexpressed), and KRT4 (underexpressed).
Slides were stained according to the manufacturer's protocol for each of the 6 proteins (for details, see Supplementary Table S1). In brief, 5-μm deparaffinized sections were pretreated with 3% H2O2 in methanol for 10 minutes to block endogenous peroxidase activity. Antigen retrieval was performed in a pressure cooker for 5 or 25 minutes, and nonspecific binding was blocked with 10% normal goat serum for 1 hour, followed by incubation with primary antibodies at an appropriate dilution (1:40 or 1:50) overnight at 4°C. The next day, the slides were incubated with the secondary antibody (anti-mouse IgG (H+L), Vector Laboratories, 1:500 dilution) for 1 hour at room temperature, followed by the ABC (Vector Laboratories) solution for 1 hour at room temperature. Slides were developed with 0.02% 3,3′-diaminobenzidine solution (DAB; Sigma), counterstained with hematoxylin, dehydrated in ethanol, and cleared in xylene. These procedures were carried out for all antibodies studied.
For assessment of protein expression, 2 scores were assigned to each core: (i) the cytoplasmic staining intensity [categorized as 0 (absent), 1 (weak), 2 (moderate), or 3 (strong)]; and (ii) the percentage of positively stained epithelial cells [scored as 0 (0%), 1 (1%–25%), 2 (26%–50%), 3 (51%–75%), or 4 (>75%)]. An overall protein expression score was calculated by multiplying the intensity and positivity scores (overall score range, 0–12). This overall score for each patient was further dichotomized as negative (overall score ≤3) or positive (score ≥4). Stains were reviewed by 2 pathologists (M.K. and S.M.H.) and discussed to determine an appropriate scoring approach. Once the criteria were established, all cores on both arrays were read by a single pathologist (M.T.) using the described criteria.
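The IHC scoring rule above can be written as a small function. This is a sketch assuming an integer intensity grade (0–3) and a percentage of positive epithelial cells between 0 and 100; the function names are hypothetical. Because the only possible products are 0, 1, 2, 3, 4, 6, 8, 9, and 12, the ≤3 versus ≥4 cutoff assigns every core to exactly one class.

```python
def positivity_score(percent_positive: float) -> int:
    """Bin the percentage of positively stained epithelial cells (0-100)."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive <= 25:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 75:
        return 3
    return 4

def overall_score(intensity: int, percent_positive: float) -> int:
    """Intensity (0-3) multiplied by positivity score (0-4); range 0-12."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    return intensity * positivity_score(percent_positive)

def is_positive(intensity: int, percent_positive: float) -> bool:
    """Dichotomize: overall score <= 3 is negative, >= 4 is positive."""
    return overall_score(intensity, percent_positive) >= 4

print(overall_score(2, 60), is_positive(2, 60))  # 6 True
print(overall_score(3, 20), is_positive(3, 20))  # 3 False
```

The second example shows why the multiplicative score matters: strong staining confined to a small fraction of cells still falls on the negative side of the cutoff.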
Formal statistical analyses were applied only to cases studied with the Affymetrix U133A/B set; data from the other 3 groups studied here were limited to descriptive statistics. For all Affymetrix U133A/B array data, raw data (CEL files for all samples) were normalized after scanning using robust multi-array average (RMA), implemented in Bioconductor in R (21). These array data are available from GEO under accession number GSE23400. Hierarchical clustering was carried out to characterize RNA expression patterns and distinguish tumor from normal samples. Paired t-tests, all carried out in R, were used to identify differences in expression between matched tumor and normal samples.
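The paired t-tests described above were run in R. As an illustration only, the sketch below implements a paired t-test for a single probeset in plain Python (standard library only); it is not the authors' code. It assumes RMA-normalized log2 inputs, so the mean paired difference is a log2 tumor/normal ratio, and it reports 2 raised to that mean as the fold change; whether the study defined fold change exactly this way is an assumption. The two-sided P value is obtained from the regularized incomplete beta function, evaluated with a standard continued-fraction method. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        odd = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def paired_t_test(tumor_log2, normal_log2):
    """Paired t-test for one probeset on matched log2 expression values.

    Returns (t statistic, two-sided P value, tumor/normal fold change).
    """
    if len(tumor_log2) != len(normal_log2) or len(tumor_log2) < 2:
        raise ValueError("need at least 2 matched tumor/normal pairs")
    diffs = [t - n for t, n in zip(tumor_log2, normal_log2)]
    df = len(diffs) - 1
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # zero if all differences are equal
    t_stat = d_bar / se
    # Two-sided P from the t distribution: I_{df/(df+t^2)}(df/2, 1/2)
    p_value = reg_inc_beta(df / 2.0, 0.5, df / (df + t_stat ** 2))
    fold_change = 2.0 ** d_bar  # mean log2 difference back-transformed to a ratio
    return t_stat, p_value, fold_change

t, p, fc = paired_t_test([5.0, 6.0, 7.0, 8.0], [4.0, 4.0, 5.0, 7.0])
print(round(t, 3), round(fc, 3))  # 5.196 2.828
```

Pairing each tumor with its own normal tissue removes between-patient variation from the comparison, which is why the test operates on per-patient differences rather than on two independent groups.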
Characteristics of the 396 total patients evaluated here are shown in Table 1. The 4 separate study groups included cases evaluated by the Affymetrix U133A/B chip set (n = 53), RT-PCR (n = 51), the Affymetrix U133A v2.0 chip (n = 17), and the tumor TMA (n = 275). The median age for cases in the 4 study groups ranged from 53 to 58 years, and males predominated in all but 1 of the groups. Tobacco use (range across groups, 13% to 60%) and alcohol use (12% to 53%) were common, as was a family history of UGI cancer (26% to 47%). The vast majority of the tumors were grade 2, over three-fourths were stage III, and metastatic disease was evident in nearly half the cases.
Affymetrix U133A/B experimental quality control
In the present study, we used the Affymetrix Human U133 set (Chip A and Chip B), which contains 39,000 transcripts and variants, including approximately 33,000 well-substantiated human genes, in more than 45,000 probesets. We assayed hybridization quality using the Affymetrix GCOS software. The average MAS5 Present call of the 106 HG_U133A chips from the 53 ESCC patients was 50% (range 41%–59%), average scale factor was 3.0 (range 1.7–8.7), average background was 58.5 (range 36.5–81.2), average noise was 2.5 (range 1.3–3.56), and the ratio of 3′/5′ signal of the housekeeping gene GAPDH was 0.9 (range 0.7–1.3). Averages for the 102 HG_U133B chips from 51 ESCC patients (2 cases had no total RNA left) were: present call 34% (range 22%–42%), scale factor 7.7 (range 4.4–15.6), background 65.5 (range 36.8–145.3), noise 2.8 (range 1.5–5.9), and ratio of 3′/5′ signal of the housekeeping gene GAPDH 1.0 (range 0.8–1.6). Other sample quality control parameters built into the chips by Affymetrix were also consistent with high-quality data. Expression signals for all probesets were used for the analysis.
Hierarchical clustering analysis of gene expression data
We used hierarchical clustering to characterize gene expression for all tumor/normal tissue pairs that had both U133A and U133B array data (n = 51 pairs). First, we selected the 10% of probesets (n = 4,498) with the highest variation across all 102 samples examined (variance > 0.31). An unsupervised 2-way hierarchical clustering analysis with these 4,498 probesets clearly separated tumors from normal samples (Supplementary Fig. S1). Only 2 normal samples and 3 tumors were misclassified based on this 2-cluster structure. Tumors were further separated into several subclusters, although none of the clinical variables examined (grade, stage, and metastasis) was associated with these subclusters.
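The probeset filtering step that precedes the clustering, keeping the 10% of probesets with the highest variance across all samples, might look like the following Python sketch. It assumes a dictionary of log2 expression values per probeset with the same number of samples for every probeset; the names and toy data are hypothetical, and the 2-way hierarchical clustering itself is not shown.

```python
import math
from statistics import variance

def top_variable_probesets(expr, fraction=0.10):
    """Return the IDs of the most variable probesets, highest variance first.

    expr maps a probeset ID to its log2 expression values across all samples
    (tumors and normals pooled). Assuming every probeset has the same number
    of samples, sample and population variance give the same ranking.
    """
    n_keep = max(1, math.floor(len(expr) * fraction))
    ranked = sorted(expr, key=lambda pid: variance(expr[pid]), reverse=True)
    return ranked[:n_keep]

# Hypothetical toy data: variance grows with the probeset index
toy = {f"probe_{i}": [5.0, 5.0 + 0.1 * i, 5.0, 5.0 + 0.1 * i] for i in range(10)}
print(top_variable_probesets(toy))  # ['probe_9']
```

Filtering on variance before clustering discards probesets that are nearly constant across samples and therefore carry little information for separating tumors from normals.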
Identification of genes differentially expressed between tumors and normal samples
To identify genes whose expression levels were altered in tumors, we carried out paired t-tests for the 53 cases with Affymetrix U133A/B chip data. We found 642 genes (854 probesets) with significant differences in expression between tumor and normal tissues; these genes showed 2-fold or greater changes and were statistically significant after Bonferroni correction (i.e., P < 1.12E-6; Supplementary Table S2). To highlight a shorter list of target genes, we also applied a more extreme P value criterion (P < 1E-15) in conjunction with at least a 2-fold change, which identified 159 genes: 116 upregulated (Table 2) and 43 downregulated (Table 3).
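The two selection tiers described above (a Bonferroni-corrected cutoff and the stricter P < 1E-15 cutoff, each combined with the 2-fold rule) can be sketched as follows. Assumptions: a family-wise α of 0.05, which is not stated in the text, and "2-fold" read as ≥2 for upregulation or ≤0.5 for downregulation. Under α = 0.05, a cutoff of 1.12E-6 corresponds to roughly 0.05 / 1.12E-6 ≈ 44,600 tests; the test count below is therefore hypothetical, as are the gene names and values.

```python
def bonferroni_cutoff(n_tests, alpha=0.05):
    """Per-test P value cutoff controlling the family-wise error rate at alpha."""
    return alpha / n_tests

def classify(fold_change, p_value, p_cutoff):
    """Return 'up', 'down', or None under the 2-fold rule and a P value cutoff."""
    if p_value >= p_cutoff:
        return None
    if fold_change >= 2.0:
        return "up"
    if fold_change <= 0.5:
        return "down"
    return None

# Hypothetical number of tests: 0.05 / 44,643 is about 1.12E-6
bonferroni = bonferroni_cutoff(44_643)
# Hypothetical (fold change, P value) pairs, not study results
results = {"geneA": (6.6, 1e-20), "geneB": (0.3, 1e-8), "geneC": (1.5, 1e-30)}
print({g: classify(fc, p, bonferroni) for g, (fc, p) in results.items()})
# {'geneA': 'up', 'geneB': 'down', 'geneC': None}
print({g: classify(fc, p, 1e-15) for g, (fc, p) in results.items()})
# {'geneA': 'up', 'geneB': None, 'geneC': None}
```

The example shows why both criteria are needed: geneC is highly significant but changes by less than 2-fold, and geneB passes the Bonferroni tier but not the extreme one.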
Affymetrix U133 v2.0 microdissected tissue validation
In our initial RNA expression study (i.e., 8K cDNA study; ref. 18), we identified 41 differentially expressed genes. As part of our validation efforts here, we also compared RNA expression for these 41 dysregulated genes by alternative methods, including different microarray platforms as well as different methods for RNA extraction and tissue procurement. These comparisons included results from 3 sets of analyses involving independent samples, consisting of the 8K cDNA study (N = 19), the Affymetrix U133A/B chip set (N = 53), and the Affymetrix U133 V2 chip (N = 17). Both the 8K cDNA and the Affymetrix U133A/B chip studies used total RNA extracted with the Trizol method but without microdissection. The Affymetrix U133 V2 chip study employed microdissected tissues from which RNA was extracted with the PureLink protocol.
A cross-platform comparison between the 8K cDNA and Human U133A/B set showed that, of the 41 dysregulated genes from our previous study, 40 were evaluable on both platforms (1 gene was not found in the Affymetrix probeset). Of these 40 genes, all but 1 (CD3EAP) showed the same gene expression pattern on both platforms (i.e., both up or both downregulated; Table 4). In addition to the directionality of the changes, the magnitude of the changes was also very similar: changes were 2-fold or greater for 10 of 13 (77%) upregulated genes on both platforms, whereas 19 of 28 (68%) downregulated genes showed fold changes of 0.50 or less on both platforms.
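The cross-platform comparison above checks two things per gene: whether the direction of change agrees, and whether both platforms meet the magnitude rule (≥2-fold up or ≤0.5-fold down). A minimal Python sketch follows, assuming per-gene fold changes stored in one dictionary per platform. The gene names and values are hypothetical, not taken from Table 4.

```python
def direction(fc):
    """Classify a tumor/normal fold change as 'up', 'down', or 'none'."""
    if fc > 1.0:
        return "up"
    if fc < 1.0:
        return "down"
    return "none"

def is_marked(fc):
    """Magnitude rule used in the text: >= 2-fold up or <= 0.5-fold down."""
    return fc >= 2.0 or fc <= 0.5

def concordance(fc_a, fc_b):
    """Compare per-gene fold changes from two platforms.

    Returns (genes evaluable on both, genes with the same direction on both,
    same-direction genes that meet the magnitude rule on both).
    """
    shared = sorted(set(fc_a) & set(fc_b))
    same_dir = [g for g in shared if direction(fc_a[g]) == direction(fc_b[g])]
    marked = [g for g in same_dir if is_marked(fc_a[g]) and is_marked(fc_b[g])]
    return shared, same_dir, marked

# Hypothetical fold changes (not values from Table 4)
cdna = {"G1": 2.1, "G2": 0.4, "G3": 1.3, "G4": 0.45}
affy = {"G1": 6.6, "G2": 0.6, "G3": 0.8}  # G4 has no matching probeset
shared, same_dir, marked = concordance(cdna, affy)
print(len(shared), len(same_dir), marked)  # 3 2 ['G1']
```

Genes missing from one platform drop out of the denominator, mirroring how 1 of the 41 genes was not evaluable on the Affymetrix set.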
A comparison of different RNA extraction methods applied to the Affymetrix platforms showed that, of 38 genes examined on both Affymetrix platforms, only 1 had a different expression pattern (Table 4). EGR1 was downregulated on the Affymetrix U133A/B chip set (fold change 0.56), but upregulated on the Affymetrix U133 v2 chip (fold change 1.19).
It is also apparent from Table 4 that the magnitude of the fold changes among upregulated genes tends to be highest in the microdissected tissue samples (i.e., Affymetrix U133A v2.0). For example, among the 13 upregulated genes, none tested on the 8K cDNA array showed a fold change of 3 or more, whereas 6 exceeded 3-fold on the Affymetrix U133A/B chip set and 8 exceeded 3-fold on the Affymetrix U133A v2.0 chip, including 5 genes with changes greater than 5-fold. Although less consistent, the magnitude of the fold changes among downregulated genes also seemed to be more extreme in the microdissected tissue samples.
Quantitative RT-PCR validation
Seven genes were selected for validation (Table 5) in a new group of 51 ESCC cases as illustrative examples of the genes that showed the most prominent differences in either the current Affymetrix or the prior cDNA array evaluations. Among the 7 selected genes, 4 were upregulated (COL1A2, COL3A1, MET, and KRT14) and 3 were downregulated (SPINK7/ECG2, HPGD, and SASH1). In at least two thirds of the 51 patients, each of the 4 upregulated genes showed increased mRNA expression (≥2-fold in tumor vs. normal) and each of the 3 downregulated genes showed decreased mRNA expression (≤0.5-fold in tumor vs. normal). Specifically, KRT14 was increased in 67% of cases, COL1A2 in 67%, COL3A1 in 84%, and MET in 72%. Likewise, ECG2 was decreased in 84% of cases, HPGD in 80%, and SASH1 in 67%.
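The per-gene percentages above can be computed from case-level fold changes as follows. Assuming all 51 cases were evaluable for a gene and percentages were rounded to the nearest whole number, 67% corresponds to 34 cases (34/51 = 66.7%), exactly two thirds; the next possible value, 35/51, would round to 69%. The function name and the fold-change list are hypothetical.

```python
def percent_meeting(fold_changes, direction):
    """Percentage of cases reaching >= 2-fold (up) or <= 0.5-fold (down)."""
    if direction == "up":
        hits = sum(fc >= 2.0 for fc in fold_changes)
    elif direction == "down":
        hits = sum(fc <= 0.5 for fc in fold_changes)
    else:
        raise ValueError("direction must be 'up' or 'down'")
    return 100.0 * hits / len(fold_changes)

# Hypothetical per-case fold changes: 34 of 51 cases reach 2-fold
fcs = [2.5] * 34 + [1.2] * 17
print(round(percent_meeting(fcs, "up"), 1))  # 66.7
```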
Protein expression validation
Tumor tissue samples from 313 ESCC cases were arrayed on the tumor TMA. After exclusion of cores with inadequate tissue following sectioning and tissue transfer, a total of 275 ESCC cases had IHC-based protein expression data available for at least 1 of the 6 markers evaluated as part of our validation here (Table 1). Protein expression positivity (number of evaluable ESCC cases) was CDC25B 59% (N = 275), LAMC2 82% (275), FADD 15% (248), KRT14 33% (249), FSCN1 56% (231), and KRT4 84% (171).
In the present study, we compared genome-wide gene expression in tumors from 53 ESCC cases with that in their matched normal tissue samples. Tumors and normal tissues had intrinsically different expression patterns and were readily separated into 2 clusters by unsupervised 2-way hierarchical clustering analysis. We identified 642 genes whose expression differed between tumors and normal tissues using standard criteria (at least a 2-fold change and P < 1.12E-6).
Several recent studies have profiled gene expression in ESCC (22–25). One study also used Affymetrix chips but studied only 15 ESCC cases (24), whereas other studies applied cDNA microarrays with Cy3 or Cy5 labeling that examined more limited numbers of genes (22, 23). The largest number of ESCC cases studied was in a Japanese report of 54 cases examined with the Affymetrix Human U133A chip; however, pair-matched normal tissues were not used in that study (26). Thus, among genome-wide expression studies employing the optimal design (pairwise matched tumor/normal tissue comparisons), the present study is the largest (53 cases) and most comprehensive (33,000 genes) ESCC evaluation to date.
With extreme statistical criteria (P < 1E-15 and at least a 2-fold change), the number of dysregulated genes was reduced to 159 (Tables 2 and 3). The functions of these 159 genes most prominently relate to biochemical enzymes (26 genes), protein transport or binding (23 genes), DNA replication (20 genes), cell cycle regulation (19 genes), cell membrane proteins (16 genes), extracellular matrix (13 genes), and cell growth (11 genes). Some of these genes (e.g., MMP, collagen families, keratins, CDC25B, calcium-binding S100 proteins, and Annexin families) have previously been shown to function in squamous cell differentiation, invasion, or proliferation (27–29).
Examination of mRNA expression by RT-PCR for 7 array-dysregulated genes in an independent series of ESCC cases gave results highly comparable with both our current Affymetrix U133A/B chip data and the findings from our earlier cDNA microarray study (18). Taken together, these results indicate that gene expression profiles in ESCC are consistent across different platforms and that expression profiling is a reproducible tool for biomarker discovery.
We previously found 41 differentially expressed genes in ESCC cases using an 8K cDNA microarray (18). All 41 of these genes were evaluated in the current Affymetrix-based study, and all showed the same tumor/normal expression ratio directionality, save for 1 gene (FOSL2). Both the cDNA and the Affymetrix arrays used total RNA extracted by the Trizol method. To minimize contamination of our tumor samples by normal tissue, we further evaluated these 41 genes using microdissected RNA procured from another set of 17 ESCC cases and tested with Affymetrix U133A v2.0 chips. The tumor/normal expression ratio directionality was the same for most of the genes (88%); however, the magnitude of the fold changes was markedly higher in the microdissected than in the nonmicrodissected samples (8K cDNA and U133A/B set chip studies). We presume that this reflects reduced heterogeneity of the tissue samples when microdissection was employed and a consequently increased signal-to-noise ratio. For example, COL1A2 just reached the 2-fold change threshold in the 8K cDNA array study, was increased 6.6-fold on the Affymetrix U133A/B set, and was increased nearly 12-fold when microdissected RNA was used with the Affymetrix U133A v2.0 array (Table 4). Although COL1A2 was the most extreme and clear-cut example of this increased signal-to-noise ratio, among the 38 genes (of the 41) evaluated here, 28 showed their most extreme fold changes (either increased or decreased) in microdissected samples. Our comparison studies show that microdissection is a powerful approach. These results agree with other observations that microdissection provides relatively pure cell populations that are particularly useful for interrogating specific targets of interest (30).
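Deciding which platform gave the "most extreme" fold change requires comparing increases and decreases on a common scale. One natural choice, assumed here rather than stated in the text, is the absolute log2 fold change, on which a 0.25-fold decrease and a 4-fold increase are equally extreme. The sketch below counts genes whose most extreme value came from a given platform; all names and values are hypothetical, not data from Table 4.

```python
import math

def most_extreme_platform(fold_changes):
    """Platform whose tumor/normal fold change lies farthest from 1 on a log2 scale.

    On this scale a 0.25-fold decrease and a 4-fold increase are equally
    extreme (|log2| = 2), so up- and downregulation are compared fairly.
    """
    return max(fold_changes, key=lambda platform: abs(math.log2(fold_changes[platform])))

def count_extreme_on(table, platform):
    """Count genes whose most extreme fold change comes from the given platform."""
    return sum(most_extreme_platform(fcs) == platform for fcs in table.values())

# Hypothetical fold changes per gene and platform (not values from Table 4)
table = {
    "GENE_A": {"8K cDNA": 2.0, "U133A/B": 4.0, "U133A v2.0": 9.0},
    "GENE_B": {"8K cDNA": 0.5, "U133A/B": 0.4, "U133A v2.0": 0.2},
    "GENE_C": {"8K cDNA": 0.3, "U133A/B": 0.6, "U133A v2.0": 0.7},
}
print(count_extreme_on(table, "U133A v2.0"))  # 2
```

Comparing raw ratios instead would make downregulated genes look artificially mild, since a strong decrease can never fall below 0 while an increase is unbounded.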
Results from all 3 experiments reported here show broadly uniform findings for the expression patterns of the 41 genes emphasized, with the largest fold changes predominantly from the array that used microdissected RNA. To our knowledge, this is the first report confirming differential gene expression carried out by using 2 different microarray platforms and 2 different RNA extraction methods.
We chose 6 genes to evaluate at the protein level by applying IHC techniques to our ESCC tumor TMA. Three of the upregulated genes (CDC25B 59%, LAMC2 82%, and FSCN1 56%) showed positive protein expression in the majority of ESCC cases studied, results that were highly concordant with RNA expression results. The other 2 upregulated genes showed positive protein expression, but in fewer than half of the ESCC cases studied (KRT14 33% and FADD 15%). The downregulated gene, KRT4, was positive for protein expression in 84% of ESCCs, which did not agree well with the RNA results.
CDC25B has been shown to be a potential early biomarker, as its protein expression increased with morphologic progression along the continuum from normal to dysplasia to invasive ESCC in our previous study (20). Patterns of LAMC2 protein expression showed a strong relation to survival, suggesting a potential role in prognosis (20). The expression of FSCN1 protein in epithelial neoplasms has been described (31–34), but its expression in ESCC has not been well characterized. In the present study, FSCN1 protein expression was observed in most ESCC tissue cores (68%), in accord with the RNA expression findings. Although KRT14 protein expression was detected in ESCCs in the current study, dysplastic and normal esophageal tissues were not evaluated. Xue and colleagues (35) did evaluate normal, dysplastic, and invasive ESCCs within esophagectomies from the same cases and observed that protein expression positivity increased across this morphologic progression from 13% to 41% to 62%, respectively, suggesting some discrimination between clinically and diagnostically important categories.
KRT4 protein expression has previously been reported in several tumors of the upper digestive tract, including esophageal adenocarcinoma (36). We found KRT4 mRNA downregulated in both our 8K and Affymetrix microarray studies, yet protein expression was positive in 84% of ESCCs in our tumor TMA. Chung and colleagues (37) reported that KRT4 protein expression decreased in the transition from normal to dysplasia to invasive tumor in a study of 6 carefully characterized ESCC cases. Of additional interest, ESCC cases with higher KRT4 mRNA in the present study had longer survival.
In summary, we identified an expanded list of 642 dysregulated genes in ESCC. These genes provide potential new targets for early detection and treatment.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
This research was supported by National Cancer Institute contract (N02-SC-66211) with the Shanxi Cancer Hospital and Institute and by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics, and Center for Cancer Research.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
- Received October 10, 2010.
- Revision received February 19, 2011.
- Accepted February 22, 2011.
- ©2011 American Association for Cancer Research.