Purpose: Genetic variation may influence chemotherapy response and overall survival in cancer patients.
Experimental design: We conducted a genome-wide scan in 535 advanced-stage non–small cell lung cancer (NSCLC) patients from two independent cohorts (307 from Nanjing and 228 from Beijing). Replication was carried out in an independent cohort of 340 patients from Southeastern China, followed by a second validation in 409 patients from the Massachusetts General Hospital (Boston, MA).
Results: Consistent associations with NSCLC survival were identified for five single-nucleotide polymorphisms (SNP) in Chinese populations, with P values ranging from 3.63 × 10⁻⁵ to 4.19 × 10⁻⁷ under the additive genetic model. The minor alleles of three SNPs (rs7629386 at 3p22.1, rs969088 at 5p14.1, and rs3850370 at 14q24.3) were associated with worse NSCLC survival, whereas those of the other two (rs41997 at 7q31.31 and rs12000445 at 9p21.3) were associated with better NSCLC survival. In addition, the associations of rs7629386 at 3p22.1 (CTNNB1) and rs3850370 at 14q24.3 (SNW1-ALKBH1-NRXN3) were replicated in a Caucasian population.
Conclusion: In this three-stage genome-wide association study, we identified five SNPs as markers for survival of advanced-stage NSCLC patients treated with first-line platinum-based chemotherapy in Chinese Han populations. Two of these SNPs, rs7629386 and rs3850370, could also be markers for survival among Caucasian patients. Clin Cancer Res; 18(19); 5507–14. ©2012 AACR.
Translational Relevance
Chemotherapy with platinum-based regimens is the standard of care for advanced-stage non–small cell lung cancer (NSCLC). However, reliable genetic markers for predicting response and overall survival after platinum-based therapy have not yet been established. In this three-stage genome-wide association study, we identified five single-nucleotide polymorphisms (SNP) as markers for survival of advanced-stage NSCLC patients treated with first-line platinum-based chemotherapy in Chinese Han populations. Two of these SNPs, rs7629386 and rs3850370, could also be markers for survival among Caucasian patients.
Introduction
Lung cancer, predominantly non–small cell lung cancer (NSCLC), is the leading cause of cancer-related deaths worldwide; the 5-year survival rate in China is only 15.1% (1). The current standard treatment for patients with early-stage NSCLC is surgical resection, with or without adjuvant chemotherapy (2, 3). However, most patients are diagnosed at an advanced stage, for which platinum-based chemotherapy is the mainstay of treatment. Response to therapy varies considerably between patients (4).
The traditional tumor–node–metastasis staging system is not useful for predicting response to therapy, and molecular biomarkers can add value in this setting (5, 6). It has been suggested that combining molecular biomarkers with traditional clinical prognostic factors could guide individualized treatment and improve outcomes for cancer patients (7). Many studies, including studies from our groups, have explored associations between single-nucleotide polymorphisms (SNP) in candidate genes and overall survival in NSCLC (8–12). Genome-wide association studies (GWAS), which make no assumptions about the genomic location of causal variants, are expected to complement the candidate gene approach.
Here, we conducted a GWAS scan by genotyping 906,703 SNPs for association with survival in 535 Han Chinese patients with advanced NSCLC. Fast-track replication was conducted in an independent NSCLC cohort from Southeastern China, followed by further validation in NSCLC patients from the Massachusetts General Hospital (Boston, MA).
Materials and Methods
The discovery phase of the study included 2 groups of advanced NSCLC patients from Nanjing and Beijing, described previously in our GWAS of lung cancer susceptibility (13). To obtain a relatively homogeneous population with uniform treatment, we restricted the current study to patients with stage III or IV NSCLC who received first-line platinum-based chemotherapy without surgery. Specifically, the discovery set included 307 patients recruited at the Affiliated Cancer Hospital and the First Affiliated Hospital of Nanjing Medical University (Nanjing Study) and 228 patients from the Cancer Hospital, Chinese Academy of Medical Sciences (Beijing Study). The first replication set included 340 NSCLC patients recruited from Nanjing Thoracic Hospital and Shanghai Chest Hospital (Southeastern China; 14), and the second replication set included 409 NSCLC patients from the Massachusetts General Hospital (Boston, MA; Harvard cohort; 15). Subjects in the discovery and first replication phases were all unrelated Han Chinese, whereas those in the second replication set were Caucasian. All patients had histopathologically or cytologically confirmed NSCLC, reviewed by at least 2 local pathologists. Informed consent was obtained from each subject at recruitment, and the study was approved by the Institutional Review Board of each participating institution.
Quality control in GWAS
Before the survival analysis, we performed systematic quality control on the raw genotyping data to remove low-quality samples and SNPs, as described previously (13). SNPs were excluded if they (i) did not map to an autosome; (ii) had a call rate <95%; or (iii) had a minor allele frequency (MAF) <0.05. Samples were removed if they had a low call rate (<95%), ambiguous gender, a familial relationship with another sample (PI_HAT > 0.25), outlying values in the principal component analysis, or an extreme heterozygosity rate (>6 SD from the nearest neighbor). After quality control, 303 patients in the Nanjing study and 225 in the Beijing study remained, with 576,351 SNPs common to both studies.
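The exclusion rules above amount to simple per-SNP and per-sample filters. The following is a minimal Python sketch of those rules, not the pipeline used in the study (which followed ref. 13). It rests on several assumptions. Genotypes are coded as allele counts with None for missing calls. The PCA-outlier and heterozygosity flags are precomputed. The rule for choosing which member of a related pair to drop (the one with the lower call rate) is our own, because the text does not specify it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple

AUTOSOMES = {str(c) for c in range(1, 23)}


def call_rate_and_maf(genotypes: List[Optional[int]]) -> Tuple[float, float]:
    """Genotypes are allele counts (0, 1, 2) or None when the call failed."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    freq = sum(called) / (2 * len(called))
    return call_rate, min(freq, 1.0 - freq)  # fold to the minor allele


def snp_passes_qc(chrom: str, genotypes: List[Optional[int]],
                  min_call_rate: float = 0.95, min_maf: float = 0.05) -> bool:
    """Keep autosomal SNPs with call rate >= 95% and MAF >= 0.05."""
    if chrom not in AUTOSOMES:
        return False
    call_rate, maf = call_rate_and_maf(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf


@dataclass
class SampleQC:
    call_rate: float
    sex_ambiguous: bool = False
    pca_outlier: bool = False
    het_outlier: bool = False  # heterozygosity flag, assumed computed elsewhere


def samples_to_keep(samples: Dict[str, SampleQC],
                    related_pairs: Iterable[Tuple[str, str, float]],
                    min_call_rate: float = 0.95,
                    max_pi_hat: float = 0.25) -> Set[str]:
    """Apply the per-sample filters, then break up related pairs (PI_HAT > 0.25)."""
    keep = {sid for sid, s in samples.items()
            if s.call_rate >= min_call_rate and not s.sex_ambiguous
            and not s.pca_outlier and not s.het_outlier}
    for a, b, pi_hat in related_pairs:
        if pi_hat > max_pi_hat and a in keep and b in keep:
            # Assumption (not stated in the text): drop the relative with
            # the lower call rate.
            drop = a if samples[a].call_rate < samples[b].call_rate else b
            keep.discard(drop)
    return keep
```

The thresholds are keyword arguments, so the same filters can be rerun with the stricter or looser cutoffs that other GWAS pipelines use.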
SNP selection and genotyping
Genome-wide association analysis was conducted in each discovery study using multivariate Cox regression under an additive genetic model, with adjustment for age, gender, smoking status, histology, and stage. A meta-analysis of the Nanjing and Beijing discovery samples was then carried out using fixed- or random-effects models. For fast-track replication, we selected the 33 top SNPs that had (i) P < 1 × 10⁻⁴ in all GWAS samples combined and (ii) a consistent association at P < 0.05 in both the Nanjing study and the Beijing study (Supplementary Table S1). Twelve of these SNPs were genotyped in the first validation set; the other 21 were excluded because they were in high linkage disequilibrium (LD) with the 12 selected SNPs. The 5 SNPs that were associated with overall survival at P < 0.05 (under any genetic model) in the same direction as in the GWAS scan were further genotyped in the second validation set. Genotyping in the validation populations was done using the TaqMan allelic discrimination assay on an ABI 7900 system (Applied Biosystems), except for two SNPs in the Harvard cohort. rs3850370 was genotyped with a SNP chip (Illumina 610k Quad; call rate, 100%), and rs41997 was imputed with MACH using LD information from the 1000 Genomes Project database. The estimated squared correlation (r²) between imputed and true underlying genotypes was 1.00, indicating excellent imputation quality. Information on primers and probes is available upon request. A random 10% of samples were retested with the TaqMan assay, and reproducibility was 100%.
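The two-step selection described above applies significance filters first and then removes SNPs in high LD with a retained SNP. It could be sketched as follows. The greedy, smallest-P-first pruning order and the r² cutoff of 0.8 are hypothetical choices, because the text does not state how the 12 retained SNPs were chosen among correlated candidates or which LD threshold defined "high" LD. `GwasHit` and the r² lookup are illustrative structures, not part of any published tool.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class GwasHit:
    snp: str
    p_meta: float      # meta-analysis P value (Nanjing + Beijing)
    hr_nanjing: float
    p_nanjing: float
    hr_beijing: float
    p_beijing: float


def meets_criteria(h: GwasHit, p_meta_max: float = 1e-4,
                   p_study_max: float = 0.05) -> bool:
    """(i) P_meta < 1e-4; (ii) P < 0.05 in both studies with the same HR direction."""
    same_direction = (h.hr_nanjing > 1) == (h.hr_beijing > 1)
    return (h.p_meta < p_meta_max and h.p_nanjing < p_study_max
            and h.p_beijing < p_study_max and same_direction)


def select_for_replication(hits: List[GwasHit],
                           r2: Dict[FrozenSet[str], float],
                           r2_max: float = 0.8) -> List[GwasHit]:
    """Greedy LD pruning (hypothetical rule): walk candidates from the smallest
    P_meta upward and skip any SNP whose r2 with an already selected SNP exceeds
    r2_max. Pairs missing from `r2` are treated as unlinked (r2 = 0)."""
    candidates = sorted((h for h in hits if meets_criteria(h)),
                        key=lambda h: h.p_meta)
    selected: List[GwasHit] = []
    for h in candidates:
        if all(r2.get(frozenset((h.snp, s.snp)), 0.0) <= r2_max
               for s in selected):
            selected.append(h)
    return selected
```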
Statistical analysis
Overall survival time was calculated from the date of diagnosis to death or last follow-up. We used PLINK 1.07 for general statistical analysis (16). We used the R survival package (as a PLINK plug-in) for the analyses of lung cancer-related death. HRs and their 95% confidence intervals (CI) were calculated by multivariate Cox regression adjusted for age, gender, histology, stage, and smoking status. Heterogeneity was assessed with Cochran's Q test. Pooled HRs were calculated by fixed- or random-effects meta-analysis. If the Q test was not significant, the fixed-effects model was used; otherwise, the pooled HR was estimated with the random-effects model. MACH 1.0 (http://www.sph.umich.edu/csg/abecasis/MACH/index.html) was used to impute untyped SNPs. Imputation used LD information from the HapMap phase II database (CHB+JPT reference set, released July 17, 2006; 17). Regional plots were generated with the online tool LocusZoom 1.1 (http://csg.sph.umich.edu/locuszoom/). Additional analyses were conducted with SAS version 9.1.3 (SAS Institute) or Stata version 9.2 (StataCorp LP).
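The following Python sketch shows the heterogeneity-driven choice between fixed- and random-effects pooling. It is an illustration, not the code used in the study. Three specifics are assumptions, because none is given in the text: inverse-variance weighting of per-cohort log HRs, the DerSimonian–Laird estimator for the random-effects model, and a 0.05 significance level for Cochran's Q. The chi-square tail probability uses exact closed forms for integer degrees of freedom, so the sketch needs only the standard library.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MetaResult:
    model: str      # "fixed" or "random"
    hr: float       # pooled hazard ratio
    ci_low: float   # lower 95% confidence limit
    ci_high: float  # upper 95% confidence limit
    p: float        # two-sided P value for the pooled log HR
    q: float        # Cochran's Q statistic
    p_het: float    # P value of the Q test


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in x^(i-1) / (1*3*...*(2i-1)).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def meta_analysis(hrs: List[float], ses: List[float],
                  alpha_het: float = 0.05) -> MetaResult:
    """Pool per-study HRs (ses = standard errors of log HR). Uses the fixed-effect
    estimate unless Cochran's Q is significant, then DerSimonian-Laird."""
    betas = [math.log(h) for h in hrs]
    w = [1.0 / se ** 2 for se in ses]
    beta_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sum(w)
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    p_het = chi2_sf(q, df) if df > 0 else 1.0

    if p_het >= alpha_het:
        model, beta, var = "fixed", beta_fixed, 1.0 / sum(w)
    else:
        # DerSimonian-Laird between-study variance tau^2.
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w_star = [1.0 / (se ** 2 + tau2) for se in ses]
        beta = sum(wi * bi for wi, bi in zip(w_star, betas)) / sum(w_star)
        model, var = "random", 1.0 / sum(w_star)

    se = math.sqrt(var)
    return MetaResult(
        model=model,
        hr=math.exp(beta),
        ci_low=math.exp(beta - 1.96 * se),
        ci_high=math.exp(beta + 1.96 * se),
        p=math.erfc(abs(beta / se) / math.sqrt(2.0)),
        q=q,
        p_het=p_het,
    )
```

For example, `meta_analysis([1.45, 1.60], [0.12, 0.15])` uses hypothetical HRs and standard errors. It selects the fixed-effects model because the two estimates are close and Q is far from significant.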
Results
The characteristics of the NSCLC patients in the 2 discovery cohorts and the 2 replication cohorts are shown in Table 1. The median survival times (MST) were 14.2, 23.0, 23.8, and 11.9 months in these cohorts, respectively. The numbers of patients who died from lung cancer were 206 (68.0%), 101 (44.9%), 120 (35.3%), and 353 (86.3%), respectively. Only the Harvard cohort included patients with histologic types other than squamous cell carcinoma and adenocarcinoma. The Harvard cohort also had higher proportions of stage IV patients, women, and ever smokers (Table 1).
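The text does not state how the median survival times were estimated. The Kaplan–Meier median is the usual choice for censored follow-up, so the sketch below assumes it. It returns the earliest follow-up time at which the Kaplan–Meier survival estimate falls to 0.5 or below. This is one common convention; statistical packages differ in how they handle a curve that sits exactly at 0.5.

```python
from typing import List, Optional


def km_median(times: List[float], events: List[int]) -> Optional[float]:
    """Kaplan-Meier median survival.

    times: follow-up (e.g., months); events: 1 = death, 0 = censored.
    Returns the first time at which S(t) <= 0.5, or None if never reached.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        # Group all observations (deaths and censorings) tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            if surv <= 0.5:
                return t
        at_risk -= n_at_t  # censored subjects leave the risk set after t
    return None
```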
Meta-analysis P values for the 2 discovery cohorts are presented in the scatter plot, which shows multiple suggestive associations (P < 10⁻⁴; Fig. 1 and Supplementary Table S1). After excluding 21 SNPs in high LD, we selected 12 SNPs for validation in an independent NSCLC cohort from Southeastern China. Of these 12 SNPs, 5 were associated with overall survival at P < 0.05 in the same direction as in the GWAS scan (Table 2). These were rs7629386 at 3p22.1, rs969088 at 5p14.1, rs41997 at 7q31.31, rs12000445 at 9p21.3, and rs3850370 at 14q24.3. rs41997 and rs12000445 reached significance under the additive genetic model, and rs7629386, rs969088, and rs3850370 in the homozygote comparison. In all Chinese samples combined, the 5 SNPs were associated with NSCLC survival, with meta-analysis P values (additive model) ranging from 3.63 × 10⁻⁵ to 4.19 × 10⁻⁷. The minor alleles of 3 SNPs (rs7629386, rs969088, and rs3850370) were associated with worse survival, whereas those of the other 2 (rs41997 and rs12000445) were associated with improved survival (Table 2).
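The genetic models referred to above differ only in how each patient's genotype is coded before it enters the Cox model. In the additive model the HR is per copy of the minor allele, and the homozygote comparison contrasts the two homozygous groups. The sketch below assumes genotypes are stored as minor-allele counts (0, 1, 2). The dominant and recessive codings are standard alternatives included for comparison. Whether they match the "any genetic model" wording in the Methods is our assumption.

```python
from typing import Callable, Dict, List, Optional


def additive(g: int) -> Optional[int]:
    """Per-allele coding: HR is per copy of the minor allele."""
    return g


def dominant(g: int) -> Optional[int]:
    """Carriers of at least one minor allele vs. major-allele homozygotes."""
    return int(g >= 1)


def recessive(g: int) -> Optional[int]:
    """Minor-allele homozygotes vs. everyone else."""
    return int(g == 2)


def homozygote_comparison(g: int) -> Optional[int]:
    """Minor homozygotes (1) vs. major homozygotes (0); heterozygotes excluded."""
    return {0: 0, 2: 1}.get(g)


CODERS: Dict[str, Callable[[int], Optional[int]]] = {
    "additive": additive,
    "dominant": dominant,
    "recessive": recessive,
    "homozygote": homozygote_comparison,
}


def code_genotypes(genotypes: List[int], model: str) -> List[Optional[int]]:
    """Return the Cox-model covariate for each patient (None = excluded)."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be minor-allele counts 0, 1, or 2")
    return [CODERS[model](g) for g in genotypes]
```

Because the homozygote comparison discards heterozygotes, it leaves few analyzable patients when the minor allele is rare. This is relevant to the rs7629386 results discussed later.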
We further validated these 5 SNPs in the Harvard cohort. Four SNPs (rs7629386, rs969088, rs12000445, and rs3850370) showed the same direction of association (Table 3, Supplementary Fig. S1). However, only rs7629386 at 3p22.1 (homozygote comparison) and rs3850370 at 14q24.3 (additive model) were significantly associated with NSCLC survival (Table 3, Supplementary Figs. S3 and S4). For rs969088, the MAF in Caucasians is too low to provide adequate power for replication. The results of stratification analyses of the 5 SNPs in Chinese samples are shown in Supplementary Fig. S2. We observed only slight heterogeneity in the HRs for rs41997 among histologic subtypes (P = 0.050).
Discussion
In this study, we used a 3-stage (4-cohort) approach to better understand the genetic factors modulating survival in advanced NSCLC patients treated with first-line platinum-based chemotherapy. During the preparation of this manuscript, a GWAS in a Caucasian population was published. Its discovery scan included 327 advanced NSCLC patients from The University of Texas MD Anderson Cancer Center (18). The authors replicated the association between rs1878022 and NSCLC survival in a Spanish cohort (420 patients, HR = 1.23, P = 0.05) but not in the Mayo Clinic cohort (315 patients, HR = 1.16, P = 0.15). Conversely, rs10937823 was replicated in the Mayo Clinic cohort (HR = 1.45, P = 0.04) but not in the Spanish cohort (HR = 0.96, P = 0.84) (18). However, neither rs1878022 nor rs10937823 was significantly associated with NSCLC death risk in our GWAS scan samples (rs1878022: HR = 1.10, P = 0.204; rs10937823: HR = 0.89, P = 0.336; Supplementary Fig. S5A and S5B).
The SNP rs3850370 at 14q24.3 was associated with NSCLC survival in both Chinese and Caucasian populations. rs3850370 lies in a region flanked by 3 genes: Ski-interacting protein (SKIIP, also known as SNW1; 307 kb upstream), alkylation repair homolog 1 (ALKBH1; 360 kb downstream), and neurexin 3 (NRXN3; 1,210 kb upstream; Supplementary Fig. S6E). SNW1 is a nuclear matrix-associated coactivator that may couple vitamin D receptor-mediated transcription and RNA splicing (19). Using a library of endoribonuclease-prepared short interfering RNAs (esiRNA), Kittler and colleagues identified SNW1 as one of 37 genes required for cell division (20). More recently, SNW1 was shown to be required for TGF-β1–induced epithelial–mesenchymal transition and invasiveness in transformed cells (21). To date, 8 human homologues of the Escherichia coli DNA repair enzyme AlkB (ALKBH1-8) have been identified; they are involved in the repair of alkylation damage in DNA and RNA (22). Among them, ALKBH3 was reported to be significantly overexpressed in lung adenocarcinoma cells and to contribute to NSCLC cell survival (23). In addition, combined treatment with cisplatin and lentivirus-mediated ALKBH2 downregulation was significantly more potent in inhibiting lung cancer cell growth and inducing apoptosis than chemotherapy alone (24). However, little is known about the role of ALKBH1 in NSCLC survival or therapy response.
In this study, rs7629386 (3p22.1) was also replicated in both Chinese and Caucasian populations in the homozygote comparison. rs7629386 is relatively rare in Chinese (MAF = 0.07) but common in Caucasians (MAF = 0.31). Therefore, the combined P value for all participants under the additive model was only 1.24 × 10⁻². Many researchers have reported that deletion of 3p22.1 is an early and frequent event in lung tumorigenesis (25, 26). rs7629386 lies about 274 kb upstream of CTNNB1, which encodes β-catenin, a key component of the Wnt signaling pathway (Supplementary Fig. S6A). Somatic mutations and elevated levels of β-catenin have been observed in most common forms of human malignancy (27, 28).
rs969088, rs12000445, and rs41997 were not replicated in the Harvard cohort. Several factors may explain this discrepancy. First, marker SNPs may differ between ethnic populations because of differences in allele frequencies and LD structure. Second, the Chinese NSCLC patients differed in demographic and clinical characteristics from those in the Harvard cohort (gender, stage, smoking rate, and histologic types), and all of these factors are related to survival. Finally, the relatively small sample size of each participating study may have provided insufficient power. The true causal variants need to be identified by fine mapping and comparative analyses in both Chinese and Caucasian populations.
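The power argument above can be made concrete with the standard Schoenfeld-type approximation for a Cox model. The genotype is treated as an additive 0/1/2 covariate under Hardy–Weinberg equilibrium, so its variance is 2q(1 − q) for MAF q. The HR and MAF values passed to these functions are hypothetical inputs. The sketch illustrates why rare variants and small cohorts limit replication; it is not a reanalysis of the study data.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def events_needed(hr: float, maf: float, alpha: float = 0.05,
                  power: float = 0.80) -> int:
    """Approximate number of deaths needed to detect a per-allele HR
    (additive model, HWE assumed) with a two-sided Wald test."""
    var_x = 2 * maf * (1 - maf)  # variance of a 0/1/2 genotype under HWE
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = _STD_NORMAL.inv_cdf(power)
    return ceil((z_a + z_b) ** 2 / (var_x * log(hr) ** 2))


def approx_power(hr: float, maf: float, deaths: int,
                 alpha: float = 0.05) -> float:
    """Approximate power for a given number of deaths (ignores the
    negligible opposite-tail rejection probability)."""
    var_x = 2 * maf * (1 - maf)
    z_a = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(log(hr)) * sqrt(deaths * var_x) - z_a)
```

Under this approximation, holding the HR fixed, a variant with MAF 0.05 needs several times as many deaths as one with MAF 0.3. This matches the explanation given for rs969088 in Caucasians.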
The SNP rs969088 at 5p14.1 is located about 490 kb downstream of cadherin 9 (CDH9; Supplementary Fig. S6B). CDH9 is a member of the cadherin family, which is involved in calcium-dependent cell–cell junctions in the nervous system (29). Hypermethylation of two other family members, CDH1 and CDH13, has been reported to be associated with better survival of patients with NSCLC (30, 31). rs12000445 is located at 9p21.3, a region frequently deleted in lung cancer (32), about 273 kb downstream of Hu-antigen B (HuB), a tumor antigen (Supplementary Fig. S6D). rs41997 is located in a region thought to harbor tumor suppressor genes (TSG) and showing frequent loss of heterozygosity in a variety of tumors of epithelial origin (33, 34). Three genes implicated in tumorigenesis lie about 700 to 1,100 kb from rs41997 (Supplementary Fig. S6C). They are cystic fibrosis transmembrane conductance regulator (CFTR); wingless-type mouse mammary tumor virus integration site family, member 2 (WNT2); and suppressor of tumorigenicity 7 (ST7). Promoter hypermethylation of CFTR was reported to be an important prognostic factor in younger patients with NSCLC (35). The Wnt/β-catenin signaling pathway is implicated in both lung cancer development and prognosis (36), and ST7 is a highly conserved TSG with reported mutations in breast and colon carcinomas (37).
To further evaluate the possible influence of the 5 SNPs on gene expression, we searched publicly available cis-expression quantitative trait loci (eQTL) data from lymphoblastoid cell lines (38–40). We found only one association: rs7799229 (in high LD with rs41997, r² = 0.90) was associated with the expression of N(alpha)-acetyltransferase 38 (NAA38; P = 2.7 × 10⁻⁷; 39). NAA38 is a member of the like-Sm protein family, which transiently binds U6 small nuclear RNA and is involved in the general maturation of RNA in the nucleus (41). No information was available on correlations between the genotypes of the 5 SNPs and gene expression in lung cancer tissues or cell lines (42, 43).
Our study has a number of strengths. First, it is the largest GWAS to date of overall survival in advanced NSCLC patients. Second, we used a 3-stage design with 4 independent cohorts. SNPs associated with overall survival across all study populations have a high probability of being true-positive findings, which reduces the need for strict multiple-comparison correction. Third, we identified 5 candidate marker SNPs in Chinese patients that point to potentially important biologic pathways and extend current knowledge of NSCLC survival. Our study also has several limitations. First, given the differences in genetic background and NSCLC treatment between Han Chinese and Caucasian populations, the validation power in replication II may have been limited. Second, the effect estimate for rs7629386 may be unstable because homozygotes for the minor allele are rare. The HR was 3.92 in the Han Chinese population but only 1.50 in the Caucasian population. Finally, the eQTL analysis did not reveal significant findings for most of the SNPs. Therefore, further validation in populations of different ethnic backgrounds, together with resequencing of the marked regions and functional evaluation, is warranted.
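For context, a conventional Bonferroni correction over the 576,351 SNPs that passed quality control would give the threshold below. This is our illustration; the study did not apply it and relied instead on multistage replication. The strongest meta-analysis P value in the Chinese samples, 4.19 × 10⁻⁷, lies above this threshold. Consistency across independent cohorts therefore carries the main evidential weight.

```latex
\alpha_{\text{Bonferroni}} = \frac{0.05}{576{,}351} \approx 8.68 \times 10^{-8},
\qquad
\min P_{\text{meta}} = 4.19 \times 10^{-7} > \alpha_{\text{Bonferroni}}
```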
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: T. Wu, D. Lu, D.C. Christiani, D. Lin, Z. Hu, H. Shen
Development of methodology: R.S. Heist, J. Dai, D.C. Christiani, Z. Hu, H. Shen
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L. Hu, X. Zhao, R.S. Heist, L. Su, B. Han, S. Cao, M. Chu, J. Dong, Y. Shu, L. Xu, Y. Chen, Y. Wang, F. Lu, H. Chen, W. Tan, H. Ma, J. Chen, G. Jin, T. Wu, D. Lu, D.C. Christiani, Z. Hu, H. Shen
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L. Hu, C. Wu, R.S. Heist, Y. Zhao, B. Han, S. Cao, M. Chu, J. Dai, J. Dong, Y. Shu, L. Xu, Y. Chen, Y. Wang, F. Lu, D. Yu, H. Chen, H. Ma, J. Chen, G. Jin, D.C. Christiani, Z. Hu, H. Shen
Writing, review, and/or revision of the manuscript: L. Hu, R.S. Heist, D.C. Christiani, Z. Hu, H. Shen
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): L. Hu, Y. Jiang, D.C. Christiani, Z. Hu, H. Shen
Study supervision: D. Lin, Z. Hu, H. Shen
Grant Support
This work was funded by the National Natural Science Foundation of China (81270044, 30972541, 30901233, 30730080, and 30425001), the China National High-Tech Research and Development Program Grant (2009AA022705), the National Key Basic Research Program Grant (2011CB503805), NIH (CA092824-07, U19-CA148127-01), Jiangsu Key Discipline of Medicine (XK200718), and Priority Academic Program Development of Jiangsu Higher Education Institutions (Public Health and Preventive Medicine).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Received April 12, 2012.
- Revision received June 27, 2012.
- Accepted July 24, 2012.
- ©2012 American Association for Cancer Research.