Abstract
Purpose: Germ line polymorphisms may confer susceptibility to lung cancer in never smokers, but studies in the United States have been limited by the low number of cases seen at single institutions. We hypothesized that we could use the Internet to bolster the accrual of appropriate patients.
Experimental Design: We established an Internet-based protocol to collect blood and information from patients throughout the United States. To illustrate the power of this approach, we used these samples, plus additional cases and age-matched controls from the Memorial Sloan-Kettering Cancer Center (New York, NY) and the Aichi Cancer Center (Nagoya, Japan), to analyze germ line DNA for genetic variants reportedly associated with lung cancer susceptibility. The genotypes for the polymorphisms rs763317 (intron 1) and T790M (exon 20) in the EGFR gene were determined by direct sequencing, and CHRNA3 nicotinic acetylcholine receptor single nucleotide polymorphisms (rs8034191 and rs1051730) were genotyped as part of a pilot genome-wide association study.
Results: We successfully analyzed germ line DNA from 369 cases, including 45 obtained via the Internet, and 342 controls. A germ line EGFR T790M variant was identified in 2 of the 369 cases (0.54%; 95% confidence interval, 0.21-1.29%), and in none of the 292 controls (P = 0.21). No difference was observed in EGFR rs763317 frequency between cases and controls. Similarly, neither CHRNA3 rs8034191 nor rs1051730 was associated with lung cancer risk.
Conclusions: The Internet provides a way to recruit patients throughout the country for minimal risk studies. This approach could be used to facilitate studies of germ line polymorphisms in specific groups of patients with cancer. Clin Cancer Res; 16(2); 755–63
- Non–small cell lung cancer
- EGFR
- never smoker
- genetic susceptibility
- Internet
Translational Relevance
This study examines three genetic variants, reported to be associated with genetic susceptibility, in never smokers with lung cancer from the United States and Japan. We collected germ line DNA from a large cohort of American and Japanese never smokers with or without lung cancer. We used a dedicated Internet-based protocol to bolster accrual from patients throughout the United States. Contrary to previous studies including smokers, no differences were observed in epidermal growth factor receptor rs763317 or CHRNA3 (rs8034191/rs1051730) frequencies between cases and controls. The CHRNA3 loci are more likely to be associated with smoking addiction than with the development of lung cancer. We did identify a germ line EGFR T790M variant in 2 of 369 cases (0.54%). This study validates the epidermal growth factor receptor T790M as a rare lung cancer risk variant in never smokers and illustrates the usefulness of the Internet in recruiting patients throughout the country for minimal risk studies.
Lung cancer is the leading cause of cancer-related death worldwide. Cigarette smoking causes lung cancer in the majority of patients. However, the disease also arises in patients who are lifelong never smokers, i.e., individuals who smoked fewer than 100 cigarettes in their lifetime. In the United States, an estimated 10% of lung cancers occur in never smokers (1). Higher percentages of never smokers, up to 30%, have been observed in East Asian countries (2). Compared with other patients with lung cancer, “never smokers” with lung cancer have a unique clinical course (3, 4): they have a better prognosis (4), their tumors are more likely to harbor somatic mutations in the gene encoding the epidermal growth factor receptor (EGFR), and their tumors are more likely to respond to EGFR tyrosine kinase inhibitors (5, 6).
The etiology of lung cancer in never smokers is poorly understood. Second-hand smoke and radon may account for as many as 50% of cases (7, 8). Outdoor air pollution, cooking oil fumes, coal fumes, and asbestos may also contribute as risk factors (9). Given the lack of direct exposure to known carcinogens, never smokers with lung cancer may represent a subgroup for which predisposing genetic factors might be prominent and distinct from those of smokers.
Several reports, including recent genome-wide association studies, have identified germ line genetic variants associated with an increased risk of lung cancer (10–15). Most variants occur in genes that encode key proteins involved in the metabolism of tobacco-related products, including cytochrome P4501A1 (10), glutathione-S-transferase (11), myeloperoxidase (12), or nicotinic acetylcholine receptor (nAchR) subunits (13–15). The influence of cigarette smoking in these studies has been inconsistent (14), in part, because fewer never smokers were studied. Collection of appropriate samples is limited by the relatively small number of never smokers seen in the United States at single institutions.
In the United States, there are 210,000 new cases per year of lung cancer (1, 9). At an incidence of ∼10%, there are annually ∼21,000 cases of lung cancer in never smokers. We hypothesized that we could facilitate the collection of appropriate samples by using an Institutional Review Board (IRB)–approved protocol to collect clinical information and blood from never smokers with non–small cell lung cancer (NSCLC) across the United States, using the Internet for patient recruitment and questionnaire delivery. To illustrate the power of this approach, we used these samples, plus additional samples from sources at two academic institutions, to assess the frequency in never smokers of three germ line variants reported to be associated with genetic susceptibility to lung cancer in never smokers: the allele that leads to the substitution of methionine for threonine at position 790 (T790M) in EGFR (16); a recently reported single nucleotide polymorphism (SNP) in intron 1 of EGFR (rs763317) implicated in female never smokers with lung cancer (17); and the risk alleles rs8034191 and rs1051730 in the nAchR subunits (13–15).
Materials and Methods
Patients
Patient samples from cases were obtained on IRB-approved protocols from three different sources: (a) an Internet-based protocol that collected samples from patients throughout the United States, (b) the Memorial Sloan-Kettering Cancer Center (MSKCC) in New York, NY, and (c) the Aichi Cancer Center in Nagoya, Japan. DNA specimens from age-matched never-smoker controls without lung cancer (controls) were collected under various IRB-approved protocols at the MSKCC (120 cases and 95 controls) or at the Aichi Cancer Center (125 cases and 247 controls; Table 1).
Table 1. Population characteristics
Internet-based enrollment at MSKCC
A prospective study to collect clinical data and biological specimens from never smokers with lung cancer was approved by the MSKCC IRB and recorded in the ClinicalTrials.gov database (NCT00745160). The study opened in September 2008 (footnote 10).
Eligible patients (a) were >18 years old, (b) had histologically and/or cytologically proven NSCLC (adenocarcinoma, squamous cell carcinoma, or large cell carcinoma; ref. 18), and (c) were never smokers. Patients had to complete a screening questionnaire and give written informed consent for participation. Patients were excluded if they had any previous history of invasive malignancy (other than lung cancer), lived outside of the United States, or were not English speaking (as protocol documents and consents are written only in English).
Patients at MSKCC were allowed to enroll on the same protocol directly in the clinic. A retrospective review of the clinical and pathologic characteristics of these patients, including results of routine genotyping for the most common types of EGFR activating mutations, was done with IRB approval.
Internet-based collection of clinical data and blood
Patients read about the study via an IRB-approved web site (footnote 11; Fig. 1) and then provided contact information either through the web site or by sending an e-mail requesting information (footnote 12; step 1). A screening kit was sent directly to patients by e-mail or by regular mail. This included a detailed information packet, a screening questionnaire, consent forms, and a letter for release of pathology records (step 2). The questionnaire was designed to determine study eligibility and consisted of a detailed smoking survey, based on the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System Survey Questionnaire (19), along with additional questions regarding diagnosis, sex, age, ethnicity, mutational status of the tumor, treatment history, environmental exposure, and history of previous malignancies. A toll-free number was available for questions about the study. Documents were shipped back to study investigators using a prepaid shipping label (step 3).
Fig. 1. Internet recruitment process.
Eligible patients were shipped materials for blood collection, including a letter to a local laboratory testing provider, two sodium heparinized 10-mL tubes, biohazard bags, and a prepaid overnight delivery bubble envelope (step 4). For patients who did not have regularly scheduled blood tests, saliva self-collection kits (Oragene OG-250, DNA Genotek) were sent. To ensure patient confidentiality, collection materials were labeled with unique identifiers before kits were sent to patients. Files linking patients' names and protocol ID codes were accessible to only one person, who was not directly involved in the processing of the samples or subsequently in the data analysis. Blood was drawn as part of a routine venipuncture (step 5), returned to study investigators, and stored at −80°C (step 6).
DNA extraction and genotyping of genetic variants
DNA was extracted from blood and saliva specimens using the Puregene DNA blood kit (Qiagen) or standard methods. EGFR intron 1 and exon 20 were analyzed by direct sequencing, using the following primers: ex1-F-5′-AGGGCTGAGAAAGAGAGACA-3′ and ex1-R-5′-TGGGGAGAAAGTTAAAGCTA-3′, and ex20-F-5′-CTCCACAGCCCCAGTGTC-3′ and ex20-R-5′-GGCCAGTGCTGTCTCTAAGG-3′, respectively.
SNPs in the nAchR were genotyped as part of a pilot genome-wide association study. We genotyped 672 DNA samples using the Illumina 610-Quadv1.0 genotyping system. We first removed random samples used to assess the quality of DNA derived from our protocol (see below). Following evaluation of call rate and presence of monomorphs, we removed samples with a <98% call rate per individual, 50% identity-by-descent to another sample in the study, and/or with a gender discordance (X homozygosity). SNPs with >5% missing genotypes and monomorphs were also removed. This left us with 624 individuals genotyped at 582,871 SNPs each. We used principal components analysis, as implemented in the EIGENSOFT software package (20), separately on samples of Asian and European ancestry to remove population outliers and determine significant principal components to correct for residual population stratification. Among the Asian ancestry samples, all nonoutlier samples came from Japan. Three significant principal components were found in the Asian ancestry samples, whereas five significant principal components were found in the European ancestry samples.
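The sample-level and SNP-level filters described above can be illustrated with a short sketch. It is not the pipeline used in the study: the genotype coding, the function names, and the order of the two passes (samples first, then SNPs) are assumptions made for illustration. Only the thresholds (98% call rate per individual, 5% missing genotypes per SNP, removal of monomorphic SNPs) come from the text, and the identity-by-descent and gender-discordance checks are omitted.

```python
# Illustrative QC sketch. Genotypes are coded as minor-allele counts 0/1/2,
# with None for a no-call. Rows are samples, columns are SNPs.
# All names and the data layout are hypothetical.

MIN_SAMPLE_CALL_RATE = 0.98   # samples below this call rate are removed
MAX_SNP_MISSING = 0.05        # SNPs with more missing genotypes are removed


def call_rate(calls):
    """Fraction of non-missing genotype calls in a list."""
    return sum(c is not None for c in calls) / len(calls)


def qc_filter(genotypes):
    """Return (kept_sample_indices, kept_snp_indices) after both filters."""
    # Pass 1: drop samples with a low per-individual call rate.
    kept_samples = [i for i, row in enumerate(genotypes)
                    if call_rate(row) >= MIN_SAMPLE_CALL_RATE]

    # Pass 2: among the remaining samples, drop SNPs that are
    # too often missing or that show a single genotype (monomorphic).
    kept_snps = []
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in kept_samples]
        missing = 1.0 - call_rate(column)
        observed = {g for g in column if g is not None}
        if missing <= MAX_SNP_MISSING and len(observed) > 1:
            kept_snps.append(j)
    return kept_samples, kept_snps
```

On real array data these steps would normally be run with a dedicated tool such as PLINK (21) rather than hand-written code; the sketch only makes the logic of the thresholds explicit.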
Statistical analyses
For EGFR exon 20, categorical variables were compared using Fisher's exact test. Results were considered significant at the 0.05 level. Statistical analyses were done using SPSS, version 17.0.
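For readers unfamiliar with Fisher's exact test, the following is a minimal standalone sketch. The study used SPSS, not this code. The function name and the toy table are hypothetical, and the two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention among several, so other software may return a different two-sided value for the same table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases and controls; columns could be carriers and
    non-carriers. The P value sums the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 first-column counts fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical toy table, not study data.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```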
For the other SNPs, logistic regression was used to assess allele frequency differences between cases and controls as implemented in the software package PLINK (21). For the nAchR SNPs, significant principal components were included as covariates to account for population substructure.
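As a rough illustration of what an allele-frequency comparison estimates, the sketch below computes an unadjusted allelic odds ratio with a Wald 95% confidence interval from a 2x2 table of allele counts. This is a simplification, not the PLINK logistic regression used in the study: it ignores ethnicity and principal-component covariates, and the function name and counts are hypothetical.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio for the minor allele (cases vs. controls) with a Wald CI.

    Arguments are allele counts (two alleles per genotyped individual).
    All four counts must be greater than zero.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts (Woolf's method).
    se_log_or = sqrt(1 / case_minor + 1 / case_major
                     + 1 / ctrl_minor + 1 / ctrl_major)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, not study data.
    print(allelic_odds_ratio(30, 70, 20, 80))
```

A confidence interval that spans 1 corresponds to no detectable difference in allele frequency at the chosen level.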
Results
Determination of DNA quality from blood specimens
To determine the feasibility of extracting high-quality DNA from blood stored at room temperature for varying periods of time, we tested blood samples from 10 volunteers processed 24, 48, or 72 hours after venipuncture. In all cases, processing of extracted DNA on Human 610-Quadv1.0 arrays (Illumina) resulted in call rates higher than 99.99%.
We allowed blood collection from patients at any time during their treatment course. To ensure that this approach was valid, we showed that SNP discordance was lower than 10⁻⁵ pre- and post-chemotherapy in four patients.
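A discordance check of this kind can be sketched as follows. The text does not define how discordance was computed, so the definition used here (the fraction of SNPs called in both samples whose genotypes differ) is an assumption, as are the function name and the toy genotypes.

```python
def discordance_rate(calls_a, calls_b):
    """Fraction of SNPs called in both samples whose genotypes differ.

    Each argument is a list of genotype calls in the same SNP order,
    with None marking a no-call. No-calls in either sample are skipped.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("genotype vectors must have the same length")
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no SNPs were called in both samples")
    mismatches = sum(a != b for a, b in compared)
    return mismatches / len(compared)


if __name__ == "__main__":
    # Hypothetical pre- and post-chemotherapy calls for five SNPs.
    pre = ["AA", "AG", None, "GG", "AG"]
    post = ["AA", "AG", "GG", "GG", "AA"]
    print(discordance_rate(pre, post))
```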
Clinical data and blood sample collection
The protocol web site went online on September 15, 2008. As of May 15, 2009, 122 patients expressed interest in the study via the Internet (n = 119) or the phone (n = 3). Patients learned about the program through Internet searches (n = 49), blogs (n = 48), personal doctors (n = 15), or nonprofit organizations' newsletters (n = 10). A significant number of patients contacted us after the web site was highlighted on cancer-related web sites; subsequently, accrual was quite steady. Five patients resided outside the United States and were excluded. Five individuals sent e-mail messages only to share personal experiences. Overall, screening documents were sent to 111 patients.
Preferred initial contact with study investigators was e-mail (n = 89), regular mail (n = 11), and phone call (n = 11). Patients were from 28 different states, with the highest representations from Florida (n = 19) and California (n = 11; Fig. 2).
Fig. 2. Geographic locations of patients who requested information over the Internet regarding the study. Blue marks, patients who requested screening documents; yellow marks, patients who sent blood specimens (http://maps.google.fr/maps/ms?hl=fr&ie=UTF8&msa=0&msid=115533600024693418390.00045b5f97b9f61899646&z=1).
Sixty-three (57%) patients returned the screening documents. Eligibility was confirmed for 55 (87%) patients. Main reasons for exclusion were (a) a previous history of invasive cancer (n = 6), and (b) a histologic type other than NSCLC (carcinoid tumor in one case and small cell lung cancer in another case). The majority of patients (n = 52) had a regular blood test, and saliva collection kits were sent to only three patients. As of May 15, 2009, a total of 45 patients returned a blood/saliva specimen. Median time between first e-mail contact and receipt of blood was 33 days (range, 9-200 days).
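The step-to-step retention along the Internet recruitment funnel can be recomputed from the counts given above. This is a small sketch with hypothetical names; the first two percentages reproduce the figures in the text, and the third (specimens returned among eligible patients) is derived here and is not stated in the text.

```python
# Counts reported in the text for the Internet arm, as of May 15, 2009.
SCREENING_SENT = 111
SCREENING_RETURNED = 63
ELIGIBLE = 55
SPECIMENS_RETURNED = 45


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


def funnel_summary():
    """Retention at each step of the recruitment funnel, in percent."""
    return {
        "returned / sent": pct(SCREENING_RETURNED, SCREENING_SENT),
        "eligible / returned": pct(ELIGIBLE, SCREENING_RETURNED),
        "specimen / eligible": pct(SPECIMENS_RETURNED, ELIGIBLE),
    }


if __name__ == "__main__":
    for step, value in funnel_summary().items():
        print(f"{step}: {value}%")
```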
During the same period, an additional 79 patients were recruited directly in MSKCC clinics. The clinical characteristics of all 124 patients whose biospecimens were collected on this protocol are shown in Table 1. There were 100 women and 24 men. Most were white and had tumors with adenocarcinoma histology (98% of cases). Tumor EGFR mutational status was known for 67 patients and consisted of exon 19 deletions in 41 (61%) cases.
Association analyses
Genotyping for the germ line EGFR T790M variant (C → T at nucleotide 2369 in exon 20) was done on DNA from 661 never smokers, including 369 patients with NSCLC, mostly of the adenocarcinoma subtype (99% of cases), and 292 controls, mostly of Asian ethnicity (83% of controls). Fifty controls from our cohort (45 white and 5 Asian) were not included in these analyses due to insufficient quantity of DNA after processing on SNP arrays. Median age was not significantly different in the two groups (61 and 62 years old; P = 0.822, Fisher's exact test). Results from routine tumor genotyping for EGFR somatic mutations were available for 226 cases, 131 (58%) of whom had an activating EGFR mutation. The T790M variant was identified in 2 (0.54%; 95% confidence interval, 0.21-1.29%) of the 369 patients with NSCLC, and in none of the 292 controls (P = 0.21 by Fisher's exact test).
SNP rs763317 in intron 1 was investigated for association with lung cancer, adjusting for ethnicity. Due to the limited number of individuals of black or Hispanic ancestry, these individuals were excluded from the analysis. No difference was observed in allele frequency between cases and controls (odds ratio, 0.93; 95% confidence interval, 0.70-1.2; P = 0.58). Similar results were found after the removal of 40 males from the data set (odds ratio, 0.94; 95% confidence interval, 0.71-1.2; P = 0.67). After restricting the cases to individuals known to have an EGFR mutant tumor, still no association was observed (odds ratio, 0.94; 95% confidence interval, 0.65-1.4; P = 0.73). Similar negative results were found with dominant and recessive models.
Data for two SNPs in the nAchR subunit gene CHRNA3 on chromosome 15q25 (rs8034191 and rs1051730) were extracted from an ongoing pilot genome-wide association study, including 217 of the cases and 342 of the controls of the cohort. This allowed us to consider population substructure within those two ethnic groups, using other data from SNP arrays. The nAchR SNPs were tested for association with lung cancer in the white and Japanese cohorts separately, adjusting for significant principal components in each cohort. Neither rs8034191 nor rs1051730 was associated with lung cancer risk (Table 2).
Table 2. Association of SNPs in the nicotinic acetylcholine receptor gene with lung cancer risk in never smokers
Pedigrees of two patients with germ line EGFR T790M mutation
The first patient with a germ line EGFR T790M mutation was a 66-year-old woman of Asian descent (from India) who presented with a 2 cm ground-glass nodule in the left lower lobe, associated with multiple bilateral subcentimeter ground-glass opacities. The patient's mother had a history of NSCLC (Fig. 3A). Left lower lobectomy revealed mixed adenocarcinoma with acinar and bronchioloalveolar features, harboring a somatic EGFR L858R mutation (Supplementary Fig. S1).
Fig. 3. Pedigrees of patients found to have a germ line EGFR T790M mutation. Numbers, ages of family members; arrows, probands; Lung, patients with lung cancer (followed by age at onset); Leuk, leukemia; CLL, chronic lymphoid leukemia; Gastric, gastric cancer.
The second patient was a 58-year-old man of Eastern European ancestry who was diagnosed with NSCLC with bone and liver metastases. Family history was assessed by an expert genetic counselor (M. Robson). The patient had a significant family history of NSCLC; most family members were smokers (Fig. 3B). Core needle biopsy of a liver lesion revealed poorly differentiated acinar and solid adenocarcinoma, which was found on routine genotyping to contain the EGFR L858R mutation (Supplementary Fig. S1). Unfortunately, tumor histology and mutational status could not be determined for any of the family members with NSCLC.
Discussion
The identification of genetic risk variants that may predispose certain patients to disease requires the analysis of germ line DNA from affected cases. In this study, we show that the Internet can provide a secure, confidential, and convenient way to bolster the accrual of target patients from across the country.
Our protocol was similar in design to the Harvard Myeloproliferative Disorders Study. That protocol used the Internet to collect blood from 345 participants over a 1-year period (i.e., 0.41%/mo of the 7,000 newly diagnosed cases of myeloproliferative disorders in the United States; ref. 22). We had hoped to accrue patients at a similar rate. However, our recruitment was significantly lower, as we only recruited 45 patients over an 8-month period (i.e., 0.11%/mo of the 14,000 estimated newly diagnosed cases over the 8-month period). Limiting factors for enrollment may include an older age at onset of lung cancer and differences in socioeconomic characteristics compared with those who develop myeloproliferative diseases (23); these factors ultimately may reduce these patients' familiarity/comfort/access to the Internet. Socioeconomic factors have previously been reported to significantly influence the completion of an Internet-based behavioral study in smokers (24) and may also exist in never smokers (25). In addition, disease-free survival in NSCLC is far shorter than in myeloproliferative disorders, limiting the extra time and effort patients are willing to allot to research. This reason was given by several patients who received screening documents but did not return them. Interestingly, the majority of patients who signed the consent form ultimately completed the questionnaire, had their blood drawn, and sent the blood specimens. This success rate suggests that the study participation was not overly complicated and that a more significant limitation of the study was awareness. Thus, in the future, we will expand efforts to connect with individuals appropriate to this study, through patients' groups, development of collaboration with other institutions, and presentation of the study at oncology meetings.
Although somatic EGFR T790M mutations are common in patients with acquired resistance to EGFR inhibitors (16), germ line EGFR T790M mutations are rare. This variant was initially reported in 2005 in a family of European descent, in which six family members in three generations had lung cancer (26). We found the germ line EGFR T790M variant in 2 of 369 cases (0.54%) of never smokers with NSCLC; both patients had family histories significant for lung cancer. In a separate study, no germ line mutation was identified in a series of 237 individuals with three or more first-degree relatives with lung cancer and 32 patients with bronchioloalveolar carcinoma (27). One germ line EGFR T790M mutation was identified in a cohort of 240 patients with previously untreated lung adenocarcinoma (28). Including our two patients, a total of five patients have now been reported to have the germ line EGFR T790M variant (Table 3). In these patients, lung cancer was diagnosed from ages 50 to 72 years, and was metastatic to the lung or other locations at the time of diagnosis. At least three patients were never smokers. Histology was adenocarcinoma and/or bronchioloalveolar carcinoma in all cases. EGFR was genotyped in all five germ line EGFR T790M cases. Somatic EGFR-activating mutations, which are known to be more frequent in never smokers with lung adenocarcinoma (6), were identified in four of the five patients. Collectively, these data indicate that a germ line T790M alteration is a genetic risk variant for lung cancer in never smokers. However, tumors develop only after a relatively long latency, suggesting that other genetic alterations, such as additional EGFR mutations, may collaborate with T790M (29).
Table 3. Lung cancer patients with germ line T790M mutations
We were unable to confirm a reported association between rs763317 in the EGFR gene and lung cancer (17). In the previous report, female never smokers with lung adenocarcinoma heterozygous at the locus had a 1.2-fold increased risk of lung cancer relative to individuals homozygous for the G allele. Individuals homozygous for the A allele were 3.6 times more likely to develop lung adenocarcinoma. Based on this effect size and the frequency of this SNP in our control population, we estimated 82% power to detect this association in our Asian cohort, and 96% power to detect the association in our white cohort. Therefore, our failure to replicate this association was not due to low power. One possible explanation is that this association was a false positive. Alternatively, the initial report could have overestimated the effect size at this locus, due to the “winner's curse”, and therefore, our power to replicate the association was lower than we calculated.
We were also unable to replicate the previously reported association between SNPs rs8034191 and rs1051730 in CHRNA3 and lung cancer risk (13–15). Compared with the effect reported for EGFR rs763317, these associations are much more modest. Moreover, the minor allele frequency in Asian populations at these SNPs is much lower than in the European populations in which the association was first discovered. Our power to detect an association in our cohort, therefore, was limited. However, as it has been suggested that these SNPs affect smoking behavior as well as lung cancer risk, and as we focused on never smokers with lung cancer, we cannot discount the possibility that these SNPs only have an effect in smokers (or influence smoking behavior) and are truly not associated with lung cancer in never smokers.
To conclude, this study shows the feasibility of bolstering accrual for germ line predisposition studies in never smokers with lung cancer by collecting clinical information and blood sample specimens from appropriate patients throughout the nation using an Internet-based protocol. We plan in future studies to conduct genome-wide association studies with more cases and controls to identify genetic risk variants in this disease population.
Disclosure of Potential Conflicts of Interest
W. Pao: consultant/advisory board, MolecularMD. W. Pao and V. Miller: EGFR T790M patent, MolecularMD. The other authors disclosed no potential conflicts of interest.
Acknowledgments
We thank R. Levine for protocol assistance/reviewing the manuscript; K. Liao for processing specimens on the Illumina platform; T. Aliff for referring patients; H. West for discussing the study on his lung cancer blog; and S. Mantel for discussing this study in the monthly newsletter of Joan's Legacy: Uniting Against Lung Cancer. We are grateful to all the patients who participated.
Grant Support: HOPP Lung Cancer Research Fund (W. Pao), the Rosalind Warren Memorial Fund (W. Pao), the MSKCC Geoffrey Beene Cancer Research Center (W. Pao), the NIH/National Cancer Institute (R01-CA121210; W. Pao), the Society of MSKCC (I. Orlow), Steps for Breath (I. Orlow), and the Labrecque Foundation (I. Orlow). Services provided by the Genomics Core Facility were partially supported by a National Cancer Institute CCSG award (P30-CA008748). N. Girard is a recipient of travel grants from the College des Professeurs de Pneumologie/AstraZeneca, the National Federation of French Comprehensive Cancer Centers/Fondation de France, and the Philippe Foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Footnotes
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
10. The current web site address is: http://www.vicc.org/neversmokers/.
12. neversmokerswithlungcancer{at}mskcc.org
- Received September 6, 2009.
- Revision received October 25, 2009.
- Accepted November 8, 2009.