Clinical Cancer Research Vol. 9, 4695-4704, October 15, 2003
© 2003 American Association for Cancer Research


Molecular Oncology, Markers, Clinical Correlates

A Training-Testing Approach to the Molecular Classification of Resected Non-Small Cell Lung Cancer

Noboru Yamagata, Yu Shyr, Kiyoshi Yanagisawa, Mary Edgerton, Thao P. Dang, Adriana Gonzalez, Sorena Nadaf, Paul Larsen, John R. Roberts, Jonathan C. Nesbitt, Roy Jensen, Shawn Levy, Jason H. Moore, John D. Minna and David P. Carbone1

Vanderbilt-Ingram Cancer Center and Department of Medicine [N. Y., K. Y., T. P. D., S. N., D. P. C.], Department of Preventive Medicine [P. L., Y. S.], Department of Pathology [M. E., A. G., R. J.], Department of Cardiac and Thoracic Surgery [J. R. R.], and Department of Molecular Physiology and Biophysics [S. L., J. H. M.], Vanderbilt University School of Medicine, Nashville, Tennessee 37232-6838; Cardiovascular Surgical Associates, Saint Thomas Hospital, Nashville, Tennessee 37205 [J. C. N.]; and Hamon Center for Therapeutic Oncology Research, University of Texas Southwestern Medical Center, Dallas, Texas 75235 [J. D. M.]

ABSTRACT

Purpose: RNA expression patterns associated with non-small cell lung cancer subclassification have been reported, but there are substantial differences in the key genes and clinical features of these subsets, casting doubt on their biological significance.

Experimental Design: In this study, we used a training-testing approach to test the reliability of cDNA microarray-based classifications of resected human non-small cell lung cancers (NSCLCs).

Results: Groups of genes were identified that were able to differentiate primary tumors from normal lung and lung metastases, as well as identify known histological subgroups of NSCLCs. Groups of genes were identified to discriminate sample clusters. A blinded confirmatory set of tumors was correctly classified by using these patterns. Some histologically diagnosed large cell tumors were clearly classified by expression profile analysis as being either adenocarcinoma or squamous cell carcinoma, indicating that this group of tumors may not be genetically homogeneous. High α-actinin-4 expression was identified as highly correlated with poor prognosis.

Conclusions: These results demonstrate that gene expression profiling can identify molecular classes of resected NSCLCs that correctly classify a blinded test cohort and that correlate with and supplement standard histological evaluation.

INTRODUCTION

Lung cancer represents a challenging clinical problem in most of the developed countries. The number of deaths from lung cancer in the United States is more than that from the next four most common cancers combined. Despite the best current treatment, the overall 5-year survival after diagnosis is only 10–15%. Improvements in prevention, early detection, prognosis, and therapy have been difficult to achieve. Clinically, lung cancers display a broad range of behaviors: they range from slowly progressing to rapidly fatal, they can be highly metastatic or only locally invasive, and they may display responsiveness or resistance to therapy (1); the molecular basis of these variations in behavior is completely unknown.

The classification of lung cancers has traditionally been based primarily on light microscopic morphological findings. According to the current histological lung cancer classification proposed by the WHO in 1981, lung cancers can be divided into two broad groups, small cell lung cancer, accounting for 20–25% of bronchogenic carcinomas, and NSCLC,2 accounting for almost all of the remaining cases. NSCLC has three major subgroups: adenocarcinoma, squamous cell carcinoma, and large cell carcinoma (2). Even within NSCLC there is a great degree of heterogeneity in behavior, yet despite decades of research the histological subclassifications of NSCLC have no predictive use, and all subtypes are treated identically.

It is clear that each tumor has unique genetic differences, and it is hypothesized that these differences determine its biological behavior. A large effort has been made by many laboratories to study many individual candidate genetic abnormalities in an attempt to develop molecular markers for lung cancer classification and prognosis, but after hundreds of such studies, none of these single markers are of any real clinical utility. Even today, all NSCLCs are usually treated identically, stage for stage, and no molecular marker is used for routine therapeutic decisions. Thus, it is becoming clear that complex biological behaviors of tumors will only be explainable by complex patterns of multiple markers.

Microarray technology has enabled expression analysis of thousands of genes at one time, allowing insight into complex gene expression patterns and perturbations (3) . To date, microarray technology has been successfully applied to a wide variety of malignant diseases, such as leukemia, lymphoma, colon cancer, melanoma, ovarian cancer, breast cancer, hepatocellular carcinoma, and prostate cancer (4, 5, 6, 7, 8, 9, 10) . These studies have succeeded in identifying dozens of crucial genes that are up- or down-regulated in certain types of malignant cells or tissues. In lung cancer, several groups have reported microarray-based subclassifications of lung adenocarcinomas, but these studies differ from each other in significant ways, and none of these studies have tested their patterns by using blinded sets of tumors (11, 12, 13) .

To begin to explore lung cancer molecular profiles that predict biological behavior, e.g., histological subtype or survival of lung cancer patients, we applied cDNA microarray technology to the study of a set of freshly resected human lung cancers and used multiple statistical methods to correlate differentially expressed genes with histology and clinical outcome. We then successfully tested this classification pattern with an independent cohort of tumors.

MATERIALS AND METHODS

Tissues and Cell Lines.
Lung cancer, normal lung, and nonlung tumor tissues in excess of what was necessary for diagnostic purposes were obtained <15 min after removal from the patient and placed in RNAlater (Ambion, Austin, TX) before being snap frozen in liquid nitrogen. Before and independent of the molecular analysis, the lung cancer patients were assigned a Tumor-Node-Metastasis postsurgical stage score according to the current international lung cancer staging system, and then classified histologically into adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and small cell carcinoma using standard WHO criteria (2) . The cancers were additionally classified as well, moderately, or poorly differentiated. A pool of RNAs isolated from six different lung cancer cell lines was prepared as a common reference to represent the common histological subtypes of lung tumor. This RNA pool provided us with a large amount of renewable and consistent RNA likely to contain the vast majority of genes expressed in lung cancer and enabled us to compare gene expression patterns across tumor samples. Six lung cancer cell lines of different histological subtypes, H23, H69, H157, H441, H596, and H727, were selected and purchased from American Type Culture Collection. These cell lines were all tested to be negative for Mycoplasma contamination. The cell lines were cultured in RPMI 1640 supplemented with 10% fetal bovine serum (Hyclone, Logan, UT) under exponential growth conditions at 37°C in 5% CO2 until 70–80% confluent in 150 cm2 flasks. Refer to the supplementary information on our website for more information.3

RNA Preparation.
Cell lines were washed once with PBS and immediately processed for RNA extraction. Cell lines and pulverized snap-frozen tumor tissues were lysed in TRIzol (Life Technologies, Inc., Rockville, MD), and total RNAs were extracted. The RNA was purified by RNeasy (Qiagen, Valencia, CA) according to the manufacturer’s instructions and stored at -80°C. The quantity and the quality of RNA were assessed by UV spectrometer and electrophoresis on a 1% formaldehyde agarose gel.

Microarray Preparation.
5184 cDNA inserts were PCR amplified from the sequence verified human clones purchased from Research Genetics (Rockville, MD) to represent 4827 unique genes. Aliquots of all of the PCR products were examined by agarose gel electrophoresis. The 681 products that did not amplify or contained multiple bands were not used for data analysis. For arraying, the PCR amplified cDNA inserts were resuspended in 3x SSC and arrayed onto the poly-L lysine coated glass slides by Stanford type microarrayer robot at the Vanderbilt Microarray Shared Resource. Printed slides, VMSR human 5k cDNA microarrays, were postprocessed by UV exposure. The complete gene list and chip production information is available on our web site.4 Just before hybridization, slides were prehybridized in 5x SSC, 0.1% SDS, and 1% BSA (Sigma, St. Louis, MO) for 45 min at 42°C. After washing five times by dipping in MilliQ purified water at room temperature, slides were dipped into isopropanol and air-dried.

Microarray Analysis.
Fluorescently labeled cDNAs were made from 50 to 100 µg of sample and reference total RNA through an anchored oligodeoxythymidylic acid [5'-T(20)VN-3' (V = any nucleotide except T, N = any nucleotide)] primed reverse transcriptase reaction. The labeling reactions were done in the presence of 100 ng/µl anchored primer; 200 µM each of dATP, dGTP, and dTTP; 120 µM dCTP; 10 mM DTT; 200 units of SuperScript II (Life Technologies, Inc.); and 120 µM Cy3-dCTP or Cy5-dCTP (Amersham Pharmacia, Piscataway, NJ) in a 30-µl solution. After hydrolyzing the RNA template with NaOH, the labeled single-stranded cDNA was purified by QIAquick PCR purification kit (Qiagen) according to the manufacturer’s instructions. The purified cDNA was dried in a SpeedVac and resuspended in hybridization solution [3x SSC/0.2% SDS/1 µg/µl yeast tRNA/1 µg/µl poly(dA)], heat denatured, applied to the slide, and sealed under a coverslip. The slide was placed in a humidified chamber at 65°C for 14–16 h. After hybridization, the slide was washed in 2x SSC/0.1% at 55°C for 5 min, 1x SSC at room temperature for 5 min, and 0.1x SSC at room temperature for 5 min. The slide was dried and scanned by a GenePix 4000B scanner (Axon Instruments, Inc., Foster City, CA). The resulting image was analyzed using GenePix Pro software (Axon Instruments, Inc.). To assay for artifacts caused by specific dye combinations, we performed two reciprocal hybridizations on different arrays for each sample (switching the dyes between test sample and reference cDNAs). When extracting the data from the original images captured by the scanner, spots with obvious blemishes or with diameters <60 µm were flagged at the initial step and excluded from additional analysis. Also, 96 spots that had not been assigned to any UniGene cluster were excluded from this analysis.
Normalization was performed based on the premise that the arithmetic mean of the ratios from every spot that performed well should be equal to 1 (14). In brief, nonflagged spots that had a ratio between 0.1 and 10 were selected. The log value for the expression ratio of each selected spot was determined. The average of all log values (AvgLog) was calculated. The normalized ratio for each spot was calculated as $\text{normalized ratio} = \text{original ratio} / 10^{\mathrm{AvgLog}}$. The data from spots of which the fluorescence intensity in each channel was <1.4 times the local background were excluded from subsequent analysis. We also excluded data that were only available from one of the reciprocal experiments or where data from reciprocal experiments differed by >2-fold. After this data filtering, we took the average of the data from the reciprocal experiments in the logarithmic field. Finally, data from spots that had >70% of data present across the samples in the training cohort were used for additional statistical analysis. Thus, in the final data table, we had data from 3811 spots that represented 3647 genes. We used every gene on this table for the search for differentially expressed genes among the training sample subtypes. The original data tables and more information regarding our analysis are available as the supplementary information on our website.3
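The normalization and reciprocal-experiment filtering described above can be sketched as follows. This is a minimal illustration, not the authors' software: the function names, the list-based input layout, and the use of base-10 logarithms throughout are assumptions made for the example.

```python
import math

def normalize_ratios(ratios, flagged, lo=0.1, hi=10.0):
    """Divide every ratio by 10**AvgLog, where AvgLog is the mean log10
    ratio of the nonflagged spots whose ratio lies between lo and hi.

    ratios  : list of expression ratios for one array (hypothetical layout)
    flagged : list of bools, True for spots flagged at image analysis
    """
    selected = [r for r, f in zip(ratios, flagged) if not f and lo <= r <= hi]
    avg_log = sum(math.log10(r) for r in selected) / len(selected)
    # normalized ratio = original ratio / 10**AvgLog
    return [r / 10 ** avg_log for r in ratios]

def combine_reciprocal(log_a, log_b, max_fold=2.0):
    """Average two dye-swap measurements given as log10 ratios in the same
    orientation. Returns None when a value is missing or when the two
    measurements differ by more than max_fold."""
    if log_a is None or log_b is None:
        return None
    if abs(log_a - log_b) > math.log10(max_fold):
        return None
    return (log_a + log_b) / 2.0
```

After normalization the selected spots have a mean log ratio of zero, so the averaging of reciprocal experiments in `combine_reciprocal` operates on comparable scales.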

Statistical Data Analysis.
The statistical analyses for the microarray data were focused on the following steps: (a) selecting the important genes that were differentially expressed among the histological groups; (b) using the class prediction model based on the weighted flexible compound covariate method (WFCCM; 15, 16) to verify whether the genes selected in step (a) have statistically significant prediction power on the training samples; (c) applying the prediction model generated in step (b) to a set of blinded samples to examine its prediction power on those samples; and (d) using the agglomerative hierarchical clustering algorithm (17) to investigate the pattern among the statistically significant discriminator genes as well as the biological status.

The selection of important genes was based on Significance Analysis of Microarrays (SAM; 18), Weighted Gene Analysis (10), and the t test, and the cutoff points for each method were 3.7, 3.0, and P < 0.0001, respectively. The cutoff points were determined based on the significance as well as the prediction power of each method. A gene was placed on the final list if it met at least one of these three selection criteria.
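The "at least one of three criteria" rule can be illustrated with a short sketch. It assumes the three statistics have already been computed for each gene; the dictionary keys, the use of absolute values, and the inclusive comparisons are assumptions for the example and are not stated in the text.

```python
def select_genes(stats, sam_cut=3.7, wga_cut=3.0, p_cut=1e-4):
    """Return {gene: (I_sam, I_wga, I_t)} for genes meeting >= 1 criterion.

    stats maps a gene identifier to a dict with hypothetical keys
    "sam" and "wga" (test statistics) and "p" (t-test P value).
    """
    selected = {}
    for gene, s in stats.items():
        indicators = (
            int(abs(s["sam"]) >= sam_cut),  # assumption: two-sided, inclusive
            int(abs(s["wga"]) >= wga_cut),  # assumption: two-sided, inclusive
            int(s["p"] < p_cut),
        )
        if any(indicators):
            selected[gene] = indicators
    return selected
```

Keeping the per-method indicators, rather than only the gene names, is convenient because the same 0/1 values are what a weighted compound covariate needs later.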

The WFCCM (15, 16) was used in the class-prediction model based on the selected genes. This method was designed to combine the most significant genes associated with the biological status from each analysis method, e.g., SAM, Weighted Gene Analysis, Info Score (20), and t test. In other words, the WFCCM is an extension of the compound covariate method, which allows for the consideration of more than one statistical analysis method into the compound covariate, and it reduces the dimensionality of the problem using a new covariate obtained as a weighted sum of the most important predictors. The WFCCM for tumor sample $i$ is defined as $\mathrm{WFCCM}(i) = \sum_j \left[\sum_k ST_{jk}\right] W_j \, x_{ij}$, where $x_{ij}$ is the log-ratio measured in tissue sample $i$ for gene $j$. $ST_{jk}$ is the standardized statistic, e.g., t-statistic, of gene $j$ for statistical analysis method $k$. $W_j$ is the weight of gene $j$, which is defined as $W_j = \left(\sum_k I_{jk} / K\right)\left(1 - \mathrm{InfoScore}_j\right)$, where $K$ is the number of analysis methods, $I_{jk} = 1$ if gene $j$ was statistically significant in method $k$, and $I_{jk} = 0$ if gene $j$ was not statistically significant in method $k$.
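As a worked illustration of the definition above, the compound covariate for one sample can be computed as below. This is a sketch under stated assumptions: the function names and list-of-lists layout are hypothetical, and skipping missing log-ratios is a choice made for the example rather than something described in the text.

```python
def gene_weight(indicators, info_score):
    """W_j = (sum_k I_jk / K) * (1 - InfoScore_j)."""
    k_methods = len(indicators)
    return (sum(indicators) / k_methods) * (1.0 - info_score)

def wfccm_score(x_i, st, indicators, info_scores):
    """WFCCM(i) = sum_j [sum_k ST_jk] * W_j * x_ij.

    x_i         : log-ratios of sample i, one per selected gene (None if missing)
    st          : st[j][k], standardized statistic of gene j in method k
    indicators  : indicators[j][k], 1 if gene j is significant in method k
    info_scores : Info Score of each gene j
    """
    total = 0.0
    for j, x_ij in enumerate(x_i):
        if x_ij is None:  # assumption: genes with missing data are skipped
            continue
        w_j = gene_weight(indicators[j], info_scores[j])
        total += sum(st[j]) * w_j * x_ij
    return total
```

For example, with two genes and two methods, `wfccm_score([1.0, -2.0], [[2.0, 3.0], [-1.0, -4.0]], [[1, 1], [1, 0]], [0.0, 0.5])` gives weights of 1.0 and 0.25 and a score of 5.0 + 2.5 = 7.5.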

The class-prediction model was applied to determine whether the patterns of gene expression could be used to classify tissue samples into two classes according to the chosen parameter, e.g., normal tissue versus tumor tissue. We estimated the misclassification rate using leave-one-out cross-validated class prediction method based on the WFCCM. This leave-one-out cross-validated method was processed in four steps. First, WFCCM was applied to calculate the single compound covariate for each tissue sample based on the significant genes. Second, one tissue sample was selected and removed from the data set, and the distance between two tissue classes for the remaining tissue samples was calculated. Third, the removed tissue sample was classified based on the closeness of the distance of two tissue classes. Fourth, steps 2 and 3 were repeated for each tissue sample. To determine whether the accuracy for predicting membership of tissue samples into the given classes (as measured by the number of correct classifications) was better than the accuracy that could be attained for predicting membership into random grouping of the tissue samples, we created 5000 random data sets by permuting class labels among the tissue samples. The cross-validated class prediction was performed on the resulting data sets, and the percentage of permutations that resulted in as few or fewer misclassifications as for the original labeling of samples was reported. If <0.05 of the permutations resulted in as few or fewer misclassifications, the accuracy of prediction into the given classes was considered significant.
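The leave-one-out procedure and the label-permutation test can be sketched as follows. The sketch assumes each sample has already been reduced to a single compound covariate score and that "closeness" means distance to the class mean of the remaining samples; both the nearest-mean rule and the function names are assumptions for illustration. As a simplification, the scores are held fixed while labels are permuted.

```python
import random

def loo_misclassifications(scores, labels):
    """Count leave-one-out errors of a nearest-class-mean rule on 1-D scores."""
    errors = 0
    for i in range(len(scores)):
        rest = [(s, y) for k, (s, y) in enumerate(zip(scores, labels)) if k != i]
        means = {}
        for c in sorted(set(y for _, y in rest)):
            vals = [s for s, y in rest if y == c]
            means[c] = sum(vals) / len(vals)
        # Assign the held-out sample to the class with the closest mean.
        predicted = min(means, key=lambda c: abs(scores[i] - means[c]))
        if predicted != labels[i]:
            errors += 1
    return errors

def permutation_p_value(scores, labels, n_perm=5000, seed=0):
    """Fraction of label permutations with as few or fewer errors than observed."""
    rng = random.Random(seed)
    observed = loo_misclassifications(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_misclassifications(scores, shuffled) <= observed:
            hits += 1
    return hits / n_perm
```

A returned value below 0.05 would correspond to the significance criterion stated above. A stricter variant would repeat gene selection and score computation inside each permutation, which this sketch does not do.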

The prediction of the blinded samples was completed using the method described above. The blinded sample was classified based on the closeness of the distance of two tissue classes, which was determined using the WFCCM.

The agglomerative hierarchical clustering algorithm (17) was applied to investigate the pattern among the statistically significant discriminator genes as well as the biological status using the software of Eisen et al. (21) .

Survival was estimated with the Kaplan-Meier method, and differences between groups were compared with the log-rank test.
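For readers unfamiliar with these two methods, the textbook forms of the Kaplan-Meier estimator and the two-group log-rank test are sketched below. This is not the authors' code; the function names and input layout (follow-up times plus 1/0 event indicators, with 0 meaning censored) are assumptions for the example.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

With very small groups, as in the survival comparison reported in this study, the chi-square approximation used by the log-rank test is only approximate.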

Sequence Verification of cDNAs.
For the differentially expressed genes, the cDNA inserts were verified by DNA sequencing using vector-specific modified M13 primers, forward primer (5'-GTTTTCCCAGTCACGACGTTG-3') or reverse primer (5'-TGAGCGGATAACAATTTCACACAG-3'). Cycle sequencing reactions were performed with fluorescent-labeled nucleotides at the DNA sequencing shared resource in Vanderbilt-Ingram Cancer Center. Sequence database searches were performed with Basic Local Alignment Search Tool (BLAST) sequence comparison programs at National Center for Biotechnology Information.5

RESULTS

Initial Data Analysis and Sample Re-Evaluation.
We first sought to establish whether molecular profiling of our tumor set could identify genes of which the expressions correlated with known light microscopic histological subgroups of lung cancer. Differentiation of NSCLC from small cell lung cancer, or of a lung primary from a metastasis to the lung from another organ, is of significant clinical interest and sometimes problematic in practice. The ability of this technique to identify patterns of genes associated with these known histological subgroups would also serve as a useful proof of principle. For this purpose, we analyzed 26 resected primary lung cancers, 3 normal lung tissues, and 2 metastatic lung tumors as our training cohort (Table 1, Supplementary Table 1).3 We compared the gene expression profiles of the following sample group pairs: normal lung tissues to tumor tissues, normal lung to primary lung tumors, and normal lung to metastatic lung tumors and NSCLCs. Furthermore, within the NSCLC group, we compared adenocarcinomas to nonadenocarcinomas, squamous cell carcinomas to nonsquamous cell carcinomas, and large cell carcinomas to non-large cell carcinomas. According to our statistical criteria (P ≤ 0.0001 or absolute value of SAM ≥3.75), we were able to identify groups of genes of which the expression level best segregated samples into these groups except for the large cell carcinoma category (Supplementary Fig. 1, A–D, right gene lists).3


Table 1 Classification of 31 samples in training cohort according to gene expression pattern

 


Fig. 1. Dendrogram from initial hierarchical clustering analysis. Each dendrogram shows the samples examined and similarity in gene expression patterns across the samples. A indicates that the sample had pretreatment. B indicates that the sample histology was revised after initial analysis, see the text.

 
To examine how these gene expression patterns differentiated the sample groups, we used hierarchical clustering analysis and class-prediction models. Hierarchical clustering analysis classified the samples into subgroups, almost exactly as we expected (Fig. 1, A–D; Supplementary Fig. 1, A–D).3 However, there were several misclustered samples. For example, sample 32, thought to be a lung adenocarcinoma, clustered with normal lung as is shown in Fig. 1, A and B. Sample 5, of which the pathology report stated that it was a squamous cell carcinoma, clustered with adenocarcinoma as is shown in Fig. 1, C and D. Sample 22, stated to be a squamous cell carcinoma, clustered with adenocarcinoma as is shown in Fig. 1D. Next, we built a class prediction model system according to gene expression profile. Using our class prediction model system, most of the samples in our training cohort could be classified appropriately according to their gene expression pattern (Table 1). However, again, several samples (samples 32, 4, 5, 19, and 23) could not be classified into their stated histological groups.

Slides of the samples (samples 32, 5, 22, 4, 19, and 23) that had discrepancies between histological classification and gene expressional classification were re-examined by a pathologist. Several light microscopic misclassifications were identified. According to histological re-evaluation, in sample 32, there was only ~15% of tumor tissue, and the other 85% consisted of fibrous tissue, explaining its clustering with the normal samples. Sample 5 showed nests with central necrosis suggesting cells with squamous differentiation. However, most of the tumor in this sample was very poorly differentiated lung cancer without intracellular bridges and without cytoplasmic keratinization, such that this sample might be better classified as large cell lung cancer. We also found that sample 22 was better classified as large cell carcinoma on review.

Through this initial analysis, we recognized the need for careful reverification of standard clinical histology and information on our samples, and all of the samples were rereviewed without finding any more major histological changes. In our review of clinical information, we found that sample 34 and sample 43 had received chemotherapy before surgery. For additional analysis, these two squamous cell carcinoma samples were analyzed in the test cohort separately from other nonpretreated samples (Supplementary Table 1).3

Identification of Genes Expressed in Sample Groups.
We reanalyzed the entire data set using the revised sample information (Supplementary Table 1).3 Again, we sought genes of which the expression levels best segregated samples into histological groups using the same statistical methods. Through this analysis, in most of the histological group comparisons, we identified a larger number of significant genes (Fig. 2, A–E; Supplementary Table 6).3 When we compared large cell lung carcinoma to non-large cell carcinoma in NSCLCs, we identified 2 statistically significant genes. In addition to the comparisons used in the initial analysis, we compared 3 more sample combinations within NSCLCs: adenocarcinomas to squamous cell carcinomas, adenocarcinomas to large cell carcinomas, and squamous cell carcinomas to large cell carcinomas. Although we could identify 27 genes of which expression levels differentiate adenocarcinoma from squamous cell carcinoma (Fig. 2F), we could not identify any gene of which the expression level differentiated large cell carcinomas from adenocarcinoma or large cell carcinomas from squamous cell carcinoma. The expression differences of several of these genes were also confirmed by Northern blotting (Supplementary Fig. 2, A and B).3



Fig. 2. Hierarchical clustering analysis according to the gene expression profiles. The genes highly correlated with each class were used for this analysis. The GenBank accession numbers and gene names are shown at the right. Each row represents an individual gene and each column an individual sample. The dendrogram at the top shows the similarity in gene expression profiles of the samples. Each square on the matrix represents the expression level of a single gene in each sample, with red and green indicating transcript levels above and below that gene expression in reference RNA. Gray squares indicate missing or excluded data. Refer to the supplementary information on our website4 for original data and more information.

 


The proteins encoded by these differentially expressed genes in various lung cancer groups included transcription factors, proteins related to organic transport, and various metabolic cycles including the activated methyl cycle-related enzymes or structural components. Among the genes within these groupings were some of which the expression was already known to correlate with certain lung cancer subtypes. For example, the folate receptor is known to be expressed predominantly in adenocarcinoma of lung, whereas Mucin 1 is also known to be highly expressed in some lung adenocarcinomas and Keratins are known squamous differentiation markers (22, 23, 24) .

Hierarchical cluster analysis showed clearer clustering for each histological group compared with the initial analysis (Fig. 2, A–D). However, large cell carcinomas were again difficult to cluster into one group. When we clustered NSCLC samples according to the expression levels of genes of which the expression patterns correlated with adenocarcinoma, large cell carcinoma samples 4 and 5 clustered closely with adenocarcinomas (Fig. 2C). When we clustered NSCLC samples according to the expression levels of genes of which the expression patterns correlate with squamous cell carcinoma, large cell carcinoma samples 6, 14, and 19 clustered with squamous cell carcinomas (Fig. 2D). Even when we used the expression patterns of 2 genes identified to correlate significantly with large cell carcinoma clustering, the 2 large cell carcinoma samples 5 and 19 clustered with non-large cell carcinomas (Fig. 2E). Overall, our class prediction model system could classify almost all of the samples in the training cohort correctly except sample 5 (a large cell carcinoma), sample 23 (a squamous cell carcinoma), and sample 19 (a large cell carcinoma; Table 2).


Table 2 Classification of 29 samples in revised training cohort according to gene expression pattern

 
Class Prediction of Blinded Samples in Test Cohort.
To confirm the relationship between the expression patterns of identified groups of genes and histological groups, we attempted to predict the histology of unknown samples in a test cohort using our class prediction model. For this purpose, we analyzed the additional data from 15 new samples in a blinded test cohort. In this analysis, we treated the 2 samples from patients who had received chemotherapy before surgery separately from the other samples (Supplementary Table 1).3 Class prediction of these blinded samples was performed based on the expression patterns of genes identified as correlated with known histological groups through the training cohort analysis (Table 3). We successfully distinguished all of the tumor samples from normal lung. We could also predict all of the primary lung tumors and metastatic lung tumors except sample 48. Sample 48, a metastatic lung tumor from a colon adenocarcinoma, was predicted to be a primary lung tumor and additionally predicted as a lung adenocarcinoma. Interestingly, this patient had multiple primary tumors, and this tumor was initially thought to be a primary lung tumor clinically, but ultimately judged to be a colon metastasis on clinical grounds. In the NSCLC group, most of the samples could be predicted correctly. However, it was again difficult to classify large cell carcinomas.


Table 3 The class prediction of 13 samples in test cohort according to gene expression pattern

 
We also applied our classification system to the 2 pretreated squamous cell carcinoma samples, 34 and 43. Our prediction model could not identify these samples as squamous cell lung carcinoma correctly. Sample 34 was predicted as a primary NSCLC but was not assigned into any of the subtypes. Sample 43 was predicted not to be a primary lung cancer but more like normal lung or metastatic lung tumor. Interestingly, tumor 34 showed little response to therapy, whereas sample 43 had a good partial response clinically.

Identification of Genes That Relate to Clinical Behavior.
Finally, we identified genes that correlated with the biological behavior of lung cancers. For this analysis, we used two kinds of clinical information, postoperative nodal metastatic status and overall survival (Supplementary Table 7).3 In the first analysis, we looked for genes that were differentially expressed between the nodal metastasis negative (N0, n = 17) and positive groups (N1 or N2, n = 14) in NSCLC samples (n = 31; data from sample 47 were removed from this analysis because of duplication with sample 6; Supplementary Table 8).3 No gene was identified as differentially expressed between these groups in this dataset. Next, we tested for genes of which the expression level correlated with overall survival. When we compared the group with overall survival ≥1 year (n = 19) to the group who died within 1 year (n = 4; Supplementary Table 8),3 we identified one gene, ACTN4 (H50993), as highly expressed in the poor survival group. Then we compared the survival of the ACTN4 low-expression group to that of the ACTN4 high-expression group (Supplementary Table 9).3 As is shown in Fig. 3, the expression level of the ACTN4 gene was a significant prognostic predictor of overall survival.



Fig. 3. Kaplan-Meier survival curves for the ACTN4 low-expression group (ACTN4 gene expression ratio <2.0 compared with reference RNA; n = 20) and the ACTN4 high-expression group (ACTN4 gene expression ratio ≥2.0 compared with reference RNA; n = 4). P = 0.0002 according to the log-rank test.

 
Sequence Verification of cDNAs.
We reverified the sequences of cDNAs on our microarrays for the genes we identified as differentially expressed between sample groups through this study. Of 172 clones sequenced, 164 clones (95.3%) were verified to have the correct insert. Two of the 8 clones that had the wrong sequence, R18039 and N63941, had genomic sequences different from their accession IDs. Another 3 of those 8 clones, AA456400, AA495846, and H63865, had unknown sequences different from their accession IDs. The last 3 clones, H54093, AA422058, and AA211448, had cDNA sequences different from their accession IDs (Fig. 2; Supplementary Fig. 1 and Supplementary Table 6).3

DISCUSSION

We applied a gene expression profiling approach to lung cancer classification. In this study, we used supervised methods to identify genes that correlated with certain biological features of a training cohort of tumors. Then, we examined the relationship between identified gene expression patterns and sample histologies by two statistical methods, hierarchical clustering analysis and a class prediction model system. A blinded confirmatory set of samples in a test cohort was used to confirm our findings.

Three recent studies have analyzed lung cancer gene expression profiles by microarray technology (11, 12, 13). In two of these studies (11, 12), the investigators selected for analysis genes whose expression was most similar within duplicate experiments yet varied widely among the other tumor samples. Then, using the expression profiles of the selected genes across their tumor set, they clustered the samples by an unsupervised method to identify potentially novel classes of lung cancer. Each group discovered different candidate lung cancer subtypes within adenocarcinoma. Although the statistical approaches we used were different, several genes we identified as discriminators of histological groups were contained within the sets of discriminant genes reported in those two papers. For example, four and a half LIM domain 1 was highly expressed in normal lung, and keratin 5 and bullous pemphigoid antigen 1 were highly expressed in squamous cell carcinoma in our study, as was reported by Bhattacharjee et al. (12). Folate receptor, KIAA 1319 protein, and mucin 1 were highly expressed in adenocarcinoma compared with squamous cell carcinoma in our study, as was also reported by Garber et al. (11). Besides these previously identified genes, we identified many genes that had not been reported previously to be differentially expressed in lung cancers. The expression level of these genes or of the proteins they encode may be useful as novel biomarkers.

In our analyses, the large cell carcinoma group was extremely difficult to cluster into one group by its gene expression profile. When we attempted to identify genes that correlated with large cell carcinomas, few or no genes were identified as candidates. Also, when we examined the relationships between gene expression patterns and sample histologies by hierarchical clustering analysis and a class prediction model, large cell carcinomas were outliers. For example, large cell carcinoma samples 5 and 19 had gene expression profiles quite similar to adenocarcinoma and squamous cell carcinoma, respectively. These data suggest that tumors that appear poorly differentiated by light microscopic evaluation may be more closely related to tumors from either of these two groups than to each other. Garber et al. (11) included four pure large cell carcinomas in their clustering analysis across 73 lung samples. Of those four, three clustered with adenocarcinomas to make a large cell carcinoma cluster, and the remaining one clustered with adenocarcinomas in one of the adenocarcinoma subgroups, the adeno group 3 cluster. The number of large cell lung cancers analyzed by microarray technology is still too small to draw clear conclusions as to whether large cell carcinoma is a genetically distinct group, although our data suggest that it is not. Greater numbers of large cell carcinomas with complete clinical information need to be analyzed to answer this question.

Our study is unique in that we used a blinded test cohort to confirm our prediction model system. This test cohort included 3 kinds of metastatic tumors that were not in our training cohort: a colon adenocarcinoma (sample 48), an adrenal tumor (sample 75), and a hepatocellular carcinoma (sample 76). Our prediction model accurately predicted 2 of these, the adrenal tumor and the hepatocellular carcinoma, as nonprimary lung tumors. However, it predicted the colon adenocarcinoma as a primary lung adenocarcinoma. Interestingly, this tumor arose in the lung of a patient cured a decade earlier of lymphoma, and was initially diagnosed as a lung cancer. After its resection, however, the patient developed widespread metastatic disease, including a lesion in the colon, and the lung tumor was clinically felt to represent a metastatic colon cancer, although this is not clinically indisputable.

We also had 2 tumors in our test cohort from patients who had received preoperative therapy. Both patients had radiation and chemotherapy before surgery; sample 34 did not respond significantly to this therapy, whereas sample 43 responded well to treatment. Our prediction model classified sample 34 as a primary lung carcinoma, but sample 43 was classified as a non-lung primary metastasis. This might reflect expression changes associated with response to treatment and deserves additional investigation. Much more refinement of the model, with a variety of lung and non-lung tumors and with larger arrays, will be required to obtain a more accurate histological prediction model.

Although histological classification is an interesting proof of principle, the potential clinical benefit of this technology will lie in its ability to predict the biological and clinical behavior of tumors. To identify such patterns, a supervised statistical method is required. In our study, we found that the expression level of a novel marker, ACTN4, was a significant prognostic predictor in these lung cancer patients. The ACTN4 gene product was originally identified through immunoscreening of monoclonal antibodies reactive with proteins up-regulated upon enhanced cell movement (25). Recently, Beer et al. (13) identified sets of genes whose expression profiles were correlated with survival of stage I patients with lung adenocarcinomas. They chose these genes by leave-one-out and training-testing cross-validation methods, and used their model to predict survival. Unfortunately, the gene we identified as most correlated with survival, ACTN4, was not included in their list of 4966 analyzed genes. Therefore, we cannot determine the significance of ACTN4 in their data set. Although our result was statistically significant, our sample size was small and the follow-up interval was short, so the significance of ACTN4 expression in lung cancer survival needs to be reconfirmed in a larger cohort with a longer follow-up interval.

There has been concern about potential errors in the cDNA clones used for microarray production (19). To increase confidence in our findings, we sequenced every cDNA clone identified as important for each of the classes in our study. Eight clones of 172 (4.7%) showed incorrect sequences. This error rate is much lower than those reported in previous studies and suggests that sequence verification is indispensable for verifying microarray data. Unfortunately, the data generated by the Beer et al. (13) and Bhattacharjee et al. (12) studies cannot be confirmed in this way because of limitations in proprietary oligo array technology.

Our results thus confirm that gene expression profiling may be an efficient tool for classifying lung tumors into biologically important and/or prognostic groups, and for identifying genes associated with those distinctions. We demonstrate that these classifications can successfully classify blinded test cohorts of tumors. Using a small set of tumors and only limited clinical information, we successfully identified sets of genes distinguishing cancer from normal lung tissue, lung primary from lung metastasis, and the known light microscopic histological groups from one another. Our data also provide evidence that large cell carcinomas may not be a genetically distinct group, but may often represent tumors more closely related to adenocarcinomas or squamous cell carcinomas than to each other. In a few cases in our study, genetic predictions did not agree with the findings of traditional light microscopic evaluation and identified misclassifications in the original pathology reports. An unsuspected association of ACTN4 expression with survival was also identified and should be investigated further in larger numbers of patients. It is hoped that in the future, genetic classification will indicate other novel features of these tumors that were previously undetectable by standard procedures. The ability of our classification to correctly identify blinded tumor samples suggests that the patterns we and others are observing may have real biological significance.

ACKNOWLEDGMENTS

We thank Mark McQuain, Melanie Robinson, Vicky Amann, and Dr. William Grady at the Vanderbilt University Medical Center, and Drs. Adi F. Gazdar, Shinichi Toyooka, and Kiyomi O. Toyooka at University of Texas Southwestern Medical Center for helpful support and thoughtful suggestions.

FOOTNOTES

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Supported by Lung Cancer Special Program of Research Excellence P50CA90949, P50CA70907, Mathers Foundation, and the Robert A. and Helen C. Kleberg Foundation.

1 To whom requests for reprints should be addressed, at Division of Hematology and Oncology, Vanderbilt-Ingram Cancer Center, 685 Preston Research Building, Nashville, TN 37232-6838. Phone: (615) 936-3321; Fax: (615) 936-3322; E-mail: d.carbone@vanderbilt.edu

2 The abbreviations used are: NSCLC, non-small cell lung cancer; WFCCM, Weighted Flexible Compound Covariate Method; SAM, Significance Analysis of Microarrays; ACTN4, α-actinin-4.

3 Supplemental figures and tables are available on our website, http://array.mc.vanderbilt.edu/supplemental and vicc.org/biostatistics/yamagata.ccr./.

4 Internet address: http://array.mc.vanderbilt.edu/.

5 Internet address: http://www.ncbi.nlm.nih.gov/BLAST/.

Received 1/14/03; revised 6/29/03; accepted 7/3/03.

REFERENCES

  1. Schiller J. H. Current standards of care in small-cell and non-small-cell lung cancer. Oncology, 61 (Suppl. 1): 3-13, 2001.
  2. The World Health Organization histological typing of lung tumours. Ed. 2. Am. J. Clin. Pathol., 77: 123-136, 1982.
  3. Schena M., Shalon D., Davis R. W., Brown P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science (Wash. DC), 270: 467-470, 1995.
  4. Golub T. R., Slonim D. K., Tamayo P., Huard C., Gaasenbeek M., Mesirov J. P., Coller H., Loh M. L., Downing J. R., Caligiuri M. A., Bloomfield C. D., Lander E. S. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science (Wash. DC), 286: 531-537, 1999.
  5. Alizadeh A. A., Eisen M. B., Davis R. E., Ma C., Lossos I. S., Rosenwald A., Boldrick J. C., Sabet H., Tran T., Yu X., Powell J. I., Yang L., Marti G. E., Moore T., Hudson J., Jr., Lu L., Lewis D. B., Tibshirani R., Sherlock G., Chan W. C., Greiner T. C., Weisenburger D. D., Armitage J. O., Warnke R., Staudt L. M., et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature (Lond.), 403: 503-511, 2000.
  6. Alon U., Barkai N., Notterman D. A., Gish K., Ybarra S., Mack D., Levine A. J. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 96: 6745-6750, 1999.
  7. Bittner M., Meltzer P., Chen Y., Jiang Y., Seftor E., Hendrix M., Radmacher M., Simon R., Yakhini Z., Ben-Dor A., Sampas N., Dougherty E., Wang E., Marincola F., Gooden C., Lueders J., Glatfelter A., Pollock P., Carpten J., Gillanders E., Leja D., Dietrich K., Beaudry C., Berens M., Alberts D., Sondak V. Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature (Lond.), 406: 536-540, 2000.
  8. Welsh J. B., Zarrinkar P. P., Sapinoso L. M., Kern S. G., Behling C. A., Monk B. J., Lockhart D. J., Burger R. A., Hampton G. M. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc. Natl. Acad. Sci. USA, 98: 1176-1181, 2001.
  9. Perou C. M., Sorlie T., Eisen M. B., van de Rijn M., Jeffrey S. S., Rees C. A., Pollack J. R., Ross D. T., Johnsen H., Akslen L. A., Fluge O., Pergamenschikov A., Williams C., Zhu S. X., Lonning P. E., Borresen-Dale A. L., Brown P. O., Botstein D. Molecular portraits of human breast tumours. Nature (Lond.), 406: 747-752, 2000.
  10. Hedenfalk I., Duggan D., Chen Y., Radmacher M., Bittner M., Simon R., Meltzer P., Gusterson B., Esteller M., Kallioniemi O. P., Wilfond B., Borg A., Trent J. Gene-expression profiles in hereditary breast cancer. N. Engl. J. Med., 344: 539-548, 2001.
  11. Garber M. E., Troyanskaya O. G., Schluens K., Petersen S., Thaesler Z., Pacyna-Gengelbach M., van de Rijn M., Rosen G. D., Perou C. M., Whyte R. I., Altman R. B., Brown P. O., Botstein D., Petersen I. Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl. Acad. Sci. USA, 98: 13784-13789, 2001.
  12. Bhattacharjee A., Richards W. G., Staunton J., Li C., Monti S., Vasa P., Ladd C., Beheshti J., Bueno R., Gillette M., Loda M., Weber G., Mark E. J., Lander E. S., Wong W., Johnson B. E., Golub T. R., Sugarbaker D. J., Meyerson M. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl. Acad. Sci. USA, 98: 13790-13795, 2001.
  13. Beer D. G., Kardia S. L., Huang C. C., Giordano T. J., Levin A. M., Misek D. E., Lin L., Chen G., Gharib T. G., Thomas D. G., Lizyness M. L., Kuick R., Hayasaka S., Taylor J. M., Iannettoni M. D., Orringer M. B., Hanash S. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med., 8: 816-824, 2002.
  14. Hegde P., Qi R., Abernathy K., Gay C., Dharap S., Gaspard R., Hughes J. E., Snesrud E., Lee N., Quackenbush J. A concise guide to cDNA microarray analysis. Biotechniques, 29: 548-562, 2000.
  15. Tukey J. W. Tightening the clinical trial. Control Clin. Trials, 14: 266-285, 1993.
  16. Shyr Y., Kim K. M. Weighted flexible compound covariate method for classifying microarray data. In: Berrar D. (ed.), A Practical Approach to Microarray Data Analysis, pp. 186-200. Norwell, MA: Kluwer Academic Publishers, 2003.
  17. Everitt B. S. Cluster Analysis. New York: Halsted Press, 1993.
  18. Tusher V. G., Tibshirani R., Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA, 98: 5116-5121, 2001.
  19. Knight J. When the chips are down. Nature (Lond.), 410: 860-861, 2001.
  20. Ben-Dor A., Friedman N., Yakhini Z. Scoring Genes for Relevance. Palo Alto, CA: Agilent Labs, Agilent Technologies, 2000.
  21. Eisen M. B., Spellman P. T., Brown P. O., Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA, 95: 14863-14868, 1998.
  22. Franklin W. A., Waintrub M., Edwards D., Christensen K., Prendegrast P., Woods J., Bunn P. A., Kolhouse J. F. New anti-lung-cancer antibody cluster 12 reacts with human folate receptors present on adenocarcinoma. Int. J. Cancer, 8 (Suppl.): 89-95, 1994.
  23. Seregni E., Botti C., Lombardo C., Cantoni A., Bogni A., Cataldo I., Bombardieri E. Pattern of mucin gene expression in normal and neoplastic lung tissues. Anticancer Res., 16: 2209-2213, 1996.
  24. Blobel G. A., Moll R., Franke W. W., Vogt-Moykopf I. Cytokeratins in normal lung and lung carcinomas. I. Adenocarcinomas, squamous cell carcinomas and cultured cell lines. Virchows Arch. B Cell Pathol. Incl. Mol. Pathol., 45: 407-429, 1984.
  25. Honda K., Yamada T., Endo R., Ino Y., Gotoh M., Tsuda H., Yamada Y., Chiba H., Hirohashi S. Actinin-4, a novel actin-bundling protein associated with cell motility and cancer invasion. J. Cell Biol., 140: 1383-1393, 1998.




