Abstract
In recent years, genome-wide association studies (GWAS) have proved to be an effective method for identifying novel common genetic variants associated not only with disease susceptibility but also with drug efficacy and drug-induced toxicity, giving rise to GWAS-based pharmacogenomics studies. In addition, findings from GWAS generate new biologic hypotheses that could improve the understanding of disease pathophysiology or of the mechanisms of drug-induced toxicity. This review highlights the implications of GWAS published to date and discusses the successes as well as the challenges of using GWAS in cancer pharmacogenomics. The aim of pharmacogenomics is to realize the vision of personalized medicine; it is hoped that GWAS will identify novel common genetic variants that predict clinical outcome and/or toxicity in cancer therapies and that can subsequently be implemented to improve the quality of life of patients with cancer. Nevertheless, given the complexity of cancer therapies, underpowered studies, and the large heterogeneity of study designs, collaborative efforts are needed to validate these findings and to overcome the limitations of GWAS before clinical implementation.
See all articles in this CCR Focus section, “Progress in Pharmacodynamic Endpoints.”
Clin Cancer Res; 20(10); 2541–52. ©2014 AACR.
Introduction
Pharmacogenomics is the study of how human genetic variations affect drug efficacy and toxicity (Fig. 1). Drug response varies between individuals, and genetic variations in genes involved in drug pharmacokinetics and pharmacodynamics, which include common variants, represented mostly by single-nucleotide polymorphisms (SNP), and rare variants, could explain part of this inter-individual variability. Polymorphisms in genes encoding drug-metabolizing enzymes and drug transporters affect drug availability at the target site (drug pharmacokinetics), whereas variants in drug target proteins, such as receptors, enzymes, and intracellular signaling proteins, affect a patient's sensitivity to a drug (drug pharmacodynamics). In recent years, great efforts have been made to identify genetic variations associated with drug-induced phenotypes. Two of the best-established examples are the association of TPMT variants with 6-mercaptopurine–induced myelosuppression in the treatment of pediatric acute lymphoblastic leukemia and the association of UGT1A1 variants with neutropenia and diarrhea related to the camptothecin analogue irinotecan in the treatment of colorectal and lung cancers. Both of these genes were identified through a candidate gene approach. The U.S. Food and Drug Administration (FDA) has revised the drug labels of 6-mercaptopurine and irinotecan to include TPMT and UGT1A1 genotypes as risk factors for toxicity and to state that genotyping these genes would help to predict the occurrence of severe adverse events before treatment (1–5).
Figure 1. Promise of pharmacogenomics for the realization of personalized medicine. Patients with similar diagnoses are prescribed the same treatment. The identification of genetic variants associated with drug response, which includes drug efficacy and toxicity, could help to predict and reduce the occurrence of events before drug treatment.
During the past decade, pharmacogenomics studies have mostly been carried out through candidate gene approaches, in which genes are selected a priori based on their relevance to the drug's mechanism of action. These genes may be involved in drug metabolism, transport, or toxicity and mostly affect drug pharmacokinetics. In the context of adverse drug reactions, genes involved in immune-mediated responses may also be relevant in addition to genes encoding drug-related proteins; an example is the association of human leukocyte antigen (HLA) genes with drug hypersensitivity. The establishment of the HapMap database, the development of tagSNP selection algorithms, and improvements in genotyping technology have made it feasible to conduct genome-wide association studies (GWAS), which link specific genetic variants across the entire genome with a phenotype of interest (e.g., human diseases and individual traits; Fig. 2). GWAS have successfully identified germline variants that are associated with genetic susceptibility to many common diseases (6), and the approach has recently become one of the most productive methods in pharmacogenomics research. Unlike the candidate gene approach, GWAS is hypothesis-free, which makes it possible to identify genetic variants in novel genes that might be involved in both drug pharmacokinetics and pharmacodynamics, as well as the mechanisms underlying gene–phenotype interactions. Because most reported candidate gene studies focus only on genes that influence drug pharmacokinetics, GWAS may discover markers that have a direct or indirect effect on drug pharmacodynamics and may thereby improve the interpretation of inter-individual variability in drug response.
The pharmacogenomics database PharmGKB (7) is an important resource for evaluating the level of evidence for genetic variants identified to date from GWAS and candidate gene approaches in pharmacogenomics studies.
Figure 2. Summary workflow of GWAS. A GWAS starts with the definition of phenotypes. In pharmacogenomics studies, cases are often patients who do not respond to treatment or who develop severe adverse reactions, whereas controls are patients who respond to treatment or who do not develop adverse events after drug exposure. All samples are genotyped with chips that contain up to hundreds of thousands of SNPs. Quality control (QC) is a crucial step to ensure that the association study is performed with a good-quality sample and SNP set. Sample QC usually includes (1) assessment of genotyping quality to exclude poorly genotyped samples, (2) identity-by-state analysis to exclude closely related samples, and (3) principal component analysis to evaluate population stratification, so that a homogeneous sample set is obtained before the association study is performed. SNPs are excluded if they (1) are of low genotype quality, (2) deviate from Hardy–Weinberg equilibrium in control samples, or (3) are nonpolymorphic (minor allele frequency = 0). To evaluate the distribution of association statistics, quantile–quantile (Q–Q) plots of observed versus expected P values and the genomic inflation factor (λ) are examined to rule out population substructure. Manhattan plots of −log10(P) versus chromosomal position give an overview of the GWAS, with each dot representing a SNP and each color representing a chromosome. Post-GWAS analyses include (1) meta-analysis, which combines multiple studies to identify significantly associated SNPs; (2) functional analysis, for which two of the most common assays are (A) the electrophoretic mobility shift assay (EMSA), which tests whether proteins, mainly transcription factors, bind to SNP-containing DNA fragments, and (B) the luciferase reporter assay (comparison of relative luciferase activity), which assesses whether the associated SNPs affect gene expression (as shown in the figure); and (3) other analyses, including gene-based analysis, pathway analysis, polygenic risk estimation, SNP–SNP interaction, and SNP–environment interaction.
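The SNP-level QC filters and the genomic inflation factor described in the workflow can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library, not the pipeline of any study cited here. The function names, the chi-square form of the Hardy–Weinberg test, and the cutoff `hwe_p_min=1e-6` are assumptions chosen for the example.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_sf_1df(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium; returns (statistic, P)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # A monomorphic SNP carries no information about HWE.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf_1df(stat)


def passes_snp_qc(control_counts, all_counts, hwe_p_min=1e-6):
    """Keep a SNP if it is polymorphic and consistent with HWE in controls.

    hwe_p_min is an illustrative cutoff, not a value taken from the text.
    """
    maf = minor_allele_frequency(*all_counts)
    _, hwe_p = hwe_chi2_test(*control_counts)
    return maf > 0.0 and hwe_p >= hwe_p_min


def genomic_inflation_factor(chi2_stats):
    """Lambda: median observed 1-df chi-square statistic over its expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

A λ value close to 1 suggests little inflation of the test statistics. Exact tests of Hardy–Weinberg equilibrium are often preferred to the chi-square approximation when genotype counts are small.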
Successful Pharmacogenomics Studies Using GWAS
GWAS have not only verified previously reported loci but also identified a number of novel associations between genetic variants and drug responses or adverse reactions. These associations may affect clinical practice and provide novel insights into the biologic mechanisms of drug-related pathways (Table 1; refs. 8–22).
Table 1. Successful pharmacogenomics studies using GWAS
In cardiovascular disease, a GWAS of simvastatin, a drug used to control elevated cholesterol levels, identified rs4363657 in SLCO1B1, which encodes a solute carrier organic anion transporter that transports a number of drugs, including statins, to be significantly associated with simvastatin-induced myopathy (8). rs4363657 is in strong linkage disequilibrium with rs4149056 (V174A), a nonsynonymous SNP that is related to statin pharmacokinetics (23). The Clinical Pharmacogenetics Implementation Consortium recommends the use of rs4149056 to manage the risk of simvastatin-induced myopathy (24). GWAS also validated variants in VKORC1, CYP2C9, and CYP4F2 as significantly influencing the maintenance dose of warfarin, an anticoagulant used to reduce thromboembolic events (9, 10). Incorporating these genetic variants provides an additional predictor for warfarin dose assessment.
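Linkage disequilibrium between two SNPs, such as the tag SNP and the nonsynonymous SNP in SLCO1B1, is commonly summarized by the squared correlation r². The sketch below computes r² from haplotype and allele frequencies. The function name is hypothetical, and any frequencies passed to it are illustrative inputs, not values measured for rs4363657 and rs4149056.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation r^2 between alleles A and B at two SNPs.

    p_ab: frequency of the haplotype that carries both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom
```

An r² near 1 means that genotypes at one SNP predict genotypes at the other almost perfectly, which is why an association signal at a tag SNP can point to a nearby functional variant.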
In the treatment of infectious diseases, a GWAS identified HLA-B*57:01 as a major determinant of drug-induced liver injury caused by flucloxacillin, a drug used to treat staphylococcal infection (13). In addition, two GWAS from Japan and Australia uncovered genetic variants in IL28B that are associated with response to pegylated IFN-α and ribavirin therapy in patients with chronic hepatitis C (17, 18). Diagnostic tests based on host IL28B genotypes might help to identify likely nonresponders to pegylated IFN-α and ribavirin therapy, who could opt for adjunctive or alternative therapies and thus avoid potential adverse drug reactions.
Two GWAS in Japanese and European populations showed HLA-A*31:01 to be significantly associated with cutaneous adverse drug reactions, including hypersensitivity reaction, Stevens–Johnson syndrome, and toxic epidermal necrolysis, caused by carbamazepine, a drug that is frequently prescribed for the treatment of epilepsy, trigeminal neuralgia, and bipolar disorder (Fig. 3; refs. 19 and 20).
Figure 3. Manhattan plot derived from a GWAS of carbamazepine-induced cutaneous adverse drug reactions in the Japanese population. In this plot, SNPs in the HLA-A region on chromosome 6 are significantly associated (P < 5.0E−08) with cutaneous adverse drug reactions. The associations of these SNPs were successfully validated in an independent sample set. Reprinted from Ozeki et al. (19) with permission.
It is important to collect uniform phenotypes to ensure the success of GWAS in pharmacogenomics studies. Some of these studies, although they used relatively small sample sizes, detected significant associations because the variants have large effect sizes, as observed for genotypes in the HLA region. Such variants may prove to be useful diagnostic tests to predict an individual's drug response.
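Effect sizes in case–control pharmacogenomics studies are usually reported as odds ratios (OR). As a minimal sketch, the function below computes an OR and a Wald-type (Woolf) confidence interval from a 2 × 2 table of carrier status by case status. The function name and any counts supplied to it are hypothetical and do not come from the studies in Table 1.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, level=0.95):
    """Odds ratio and Woolf confidence interval from a 2 x 2 table.

    Returns (odds_ratio, lower_bound, upper_bound).
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        # The log-OR standard error is undefined when any cell is zero.
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With a large OR the interval can exclude 1 even when the cell counts are modest, which is consistent with the observation that large-effect variants can be detected in small sample sets.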
Use of GWAS in Pharmacogenomics Studies for Individual Cancer Types
Cancer pharmacogenomics is well recognized for its complexity, as both germline genetic variants and the somatic mutations in an individual patient's tumor play important roles in determining response to anticancer agents. Activating mutations in the tyrosine kinase domain of the EGF receptor (EGFR) are a prominent example of somatic mutations that increase sensitivity to gefitinib and erlotinib in non–small cell lung cancer (NSCLC; refs. 25 and 26). In contrast, germline genetic variants identified through GWAS, which are the focus of this review, affect drug response, efficacy, and toxicity independently of tumor somatic mutation status or disease type (Table 2; refs. 27–39).
Table 2. Cancer pharmacogenomics studies utilizing GWAS
Breast cancer
GWAS identified a promising locus near the T-cell leukemia 1A (TCL1A) gene to be associated with musculoskeletal adverse events (MS-AE) caused by the “third-generation” aromatase inhibitors anastrozole, exemestane, and letrozole, which are effective endocrine therapies for estrogen receptor (ER)-positive early breast cancer (27). The associated variant, rs11849538 (P = 6.67E−07, OR = 2.21), which lies closest to the 3′ end of TCL1A, was predicted to create an estrogen-response element, and this prediction was subsequently confirmed with a chromatin immunoprecipitation assay using ERα-transfected lymphoblastoid cell lines (LCL) of known genotype. In addition, LCLs carrying the variant SNP sequence showed significantly greater TCL1A expression than those carrying the wild-type sequence (27). SNP-dependent differential TCL1A expression correlated with the expression of various cytokine and cytokine receptor genes, including interleukin (IL)-17, IL-17RA, IL-12, IL-12RB2, and IL-1R2, as well as with NF-κB transcriptional activity. These findings further suggested that the occurrence of MS-AE is related to the inflammatory response (40). The GWAS findings thus implied an entirely new biologic hypothesis and improved the understanding of the pathophysiology of aromatase inhibitor–induced MS-AE.
Kiyotani and colleagues conducted a GWAS of 240 patients with breast cancer who received tamoxifen monotherapy and identified a SNP, rs10509373 (log-rank P = 1.26E−10, multivariate HR = 4.53), in C10orf11 (unknown function) that was significantly associated with recurrence-free survival (RFS) after validation in two independent replication sample sets of 105 and 117 cases (28). A combined analysis revealed cumulative effects of the associated genotypes in C10orf11, CYP2D6, and ABCC2 (previously identified to be associated with RFS in the same dataset; ref. 41): compared with patients who carried only one risk allele, the HR for recurrence rose from 6.51 for patients who carried three risk alleles to 119.51 for patients who carried five risk alleles (28).
Baldwin and colleagues performed a GWAS of paclitaxel-induced sensory peripheral neuropathy and identified a SNP in FGD4, rs10771973 (P = 2.6E−06, HR = 1.57), to be associated with the onset of sensory peripheral neuropathy; this finding was subsequently validated in European and African-American replication cohorts (29). FGD4 encodes a Rho-GTPase guanine nucleotide exchange factor previously reported to play a role in congenital peripheral neuropathies (42). In addition, they suggested that SNPs in EPHA5 (rs7349683) and FZD3 (rs7001034) could be associated with the onset or severity of paclitaxel-induced sensory peripheral neuropathy based on marginal significance levels, biologic relevance, and estimated effect size (29).
Although alopecia is not a life-threatening event, it induces psychological stress, influences physical appearance, and often affects the patient's quality of life (43). Chung and colleagues reported a GWAS of alopecia (hair loss) induced by monotherapy or combination chemotherapy in patients with breast cancer and identified a SNP, rs3820706 (P = 1.85E−09, OR = 2.38), in CACNB4, which encodes calcium channel voltage-dependent subunit β4, to be significantly associated with drug-induced alopecia (30). Minoxidil, a potassium channel opener, is approved by the FDA for the treatment of alopecia (44), and the results of this GWAS further suggest that ion channels might be involved in the pathogenesis of alopecia.
Lung cancer
Wu and colleagues reported a GWAS of patients with advanced-stage NSCLC who received platinum-based chemotherapy and identified a SNP, rs1878022, in CMKLR1 (P = 5.13E−07, HR = 1.33) to be significantly associated with poor overall survival (OS; ref. 31). The median survival time (MST) for patients with the non-risk homozygous genotype (16.05 months) was significantly longer than that for patients who carried the risk genotypes (10.72 months, P = 6.76E−05; ref. 31). CMKLR1 encodes a seven-transmembrane G-protein–coupled receptor that has been reported to be highly expressed in lung tissues (45). Binding of the ligands chemerin and resolvin E1 to the receptor has been shown to activate various pathways, such as mitogen-activated protein kinase (MAPK), extracellular signal-regulated kinase 1 and 2, and angiogenesis (46, 47).
Hu and colleagues performed a three-stage GWAS in Chinese patients with advanced-stage NSCLC receiving platinum-based chemotherapy and identified five SNPs, rs7629386, rs969088, rs3850370, rs41997, and rs12000445 (meta-analysis P ranging from 3.63E−05 to 4.19E−07), that were suggestively associated with the survival of patients with NSCLC. Two of these five SNPs, rs7629386 in CTNNB1 and rs3850370 in SNW1-ALKBH1-NRXN3, were further replicated in a Caucasian population (32). One of the strengths of this study is that the suggestively associated SNPs were validated not only in an independent sample from the Chinese population but also in a sample set from the Caucasian population. Assuming that the mechanism underlying a drug-related phenotype is the same across populations, it is important to evaluate the associated genetic loci in different populations, even though differences in genetic architecture (allele frequencies and linkage disequilibrium structure) may complicate the comparison.
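Multistage designs of this kind typically combine stage-specific estimates into a single meta-analysis P value. One common choice is a fixed-effect, inverse-variance weighted meta-analysis of log hazard ratios (or log odds ratios), sketched below. This illustrates the general technique and is not necessarily the method used in the study above. The function name is hypothetical.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates: list of (beta, se) pairs, where beta is a log hazard ratio
    or log odds ratio from one study and se is its standard error.
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value
```

When the stage-specific effects point in the same direction, the pooled standard error shrinks and the combined P value becomes smaller than that of any single stage. A fixed-effect model assumes one shared effect across studies, so heterogeneity between populations should be checked separately.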
Sato and colleagues reported a GWAS of advanced NSCLC in which they found that three SNPs, rs1656402 in EIF4E2 (P = 8.4E−08, HR = 4.22, MST for AG+AA and GG were 18.0 and 7.7 months, respectively), rs1209950 in ETS2 (P = 2.8E−07, HR = 4.96, MST for CC and CT+TT were 17.7 and 7.4 months, respectively), and rs9981861 in DSCAM (P = 3.5E−06, HR = 16.1, MST for GG+AG and AA were 17.1 and 3.8 months, respectively), were suggestively associated with OS of patients with NSCLC after they received combined therapy of carboplatin and paclitaxel (33).
In addition to drug efficacy studies, Han and colleagues conducted a GWAS for irinotecan-induced diarrhea (grade 3) and neutropenia (grade 4) in patients with NSCLC and reported that rs1517114 in C8orf34, rs1661167 in FLJ41856, and rs2745761 in PLCB1 showed strong association with severe diarrhea, whereas rs11128347 in PDZRN3 and rs11979430 on chromosome 7 were suggestively associated with irinotecan-induced severe neutropenia (34). The authors also demonstrated that associated variants identified from candidate gene approaches, such as UGT1A1*6 and SLCO1B1 521T>C, were significantly associated with severe neutropenia and ABCC2 3972C>T was associated with diarrhea (34).
Pancreatic cancer (gemcitabine)
Gemcitabine is a deoxycytidine analogue that is used to treat patients with various solid tumors, including pancreatic cancer and NSCLC. A GWAS in patients with pancreatic cancer who received gemcitabine identified a nonsynonymous coding variant, rs763780 (H161R), in IL17F (P = 2.61E−08, HR = 3.3) to be associated with OS. Patients who were heterozygous for rs763780 had a shorter median OS (3.1 months) than patients without the variant (6.8 months; ref. 35). IL17F encodes IL-17F, which is involved in angiogenesis and plays a role in the growth and metastatic spread of pancreatic cancer (48, 49).
Kiyotani and colleagues conducted a GWAS to identify variants associated with severe leukopenia/neutropenia induced by gemcitabine monotherapy in patients recruited from Biobank Japan and identified two possibly associated loci, rs11141915 in DAPK1 (P = 1.27E−06, OR = 4.10) and rs1901440 between NCKAP5 and MIR3679 (P = 3.11E−06, OR = 34.00; ref. 36). As an expansion of this report, a large-scale pharmacogenomics study of patients recruited from Biobank Japan who received various chemotherapeutic agents identified six suggestive loci (P < 1.0E−05), including rs9961113 between BDP1P and SALL3, rs2547917 in PDE4D, rs12900463 in ALPK3, rs9609078 in OSBP2, rs6863418 near HMP19, and rs6037430 near NRSN2, to be associated with gemcitabine-induced grade 3 or 4 leukopenia/neutropenia (37).
Acute lymphoblastic leukemia
Methotrexate is an antimetabolite that is used to treat malignancies such as acute lymphoblastic leukemia (ALL) as well as autoimmune diseases. A GWAS of patients with ALL from various ancestry groups (African, European, and Asian) identified multiple SNPs in SLCO1B1, which encodes the organic anion transporter OATP1B1, to be significantly associated with methotrexate clearance and gastrointestinal toxicity (38). OATP1B1 is localized to the sinusoidal membrane of hepatocytes and has been shown to transport methotrexate in vitro. The association was further validated in another GWAS with a larger cohort of patients with ALL (n = 1,279) treated with high-dose methotrexate (50). Variants in SLCO1B1 could be clinically useful for identifying patients who are at risk of low methotrexate clearance, so that clearance could be improved by interventions such as increased intravenous hydration and/or alkalinization (50).
Lessons, Challenges, and Future Directions in Cancer Pharmacogenomics Studies Using GWAS
The identification of associated genetic variants through GWAS in cancer pharmacogenomics has been challenging, and great efforts are still required to realize the goal of personalized medicine. The two greatest challenges commonly encountered in GWAS are (i) statistically underpowered case–control studies and (ii) the stringent significance threshold imposed by multiple testing.
It is important to evaluate beforehand the statistical power, which depends on sample size, effect size, causal allele frequency, and the association of the causal variant with the study phenotype. To detect a significant association for a modest-effect SNP with an OR of 1.2 and a causal allele frequency of 0.25, a sample set of approximately 5,000 individuals is necessary; in contrast, for a high-effect SNP with an OR of 2.0 and a causal allele frequency of 0.25, only 300 individuals are necessary (51). This estimate shows that a large number of patients treated with uniform therapy is required to achieve sufficient statistical power to identify truly associated variants with low or moderate effect sizes. This is particularly difficult to achieve in cancer pharmacogenomics because varied drug combinations, dosing regimens, and treatment durations, as well as dosage adjustments based on the patient's condition, contribute to heterogeneity in treatment effects. Rapid advances in the development of molecular-targeted drugs, coupled with changing treatment paradigms, add to the complexity of pharmacogenomics analysis. Furthermore, the incidence of adverse drug events is typically low, which makes it difficult to collect an adequate number of cases for a specific therapy. To overcome this issue, DNA samples and robust phenotype/clinical data should be collected from patients treated under identical protocols in large, collaborative clinical trials, both local and international. One way to achieve this is through international alliances, such as the Global Alliance for Pharmacogenomics (52), which facilitate international collaboration by bringing together resources to advance the understanding of drug responses.
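The dependence of the required sample size on effect size and allele frequency can be illustrated with a textbook normal-approximation formula for comparing allele frequencies between cases and controls. The sketch below assumes an allelic test, equal numbers of cases and controls, a two-sided α of 5.0E−08, and 80% power; all of these, and the function names, are assumptions made for the example. It is not the calculation behind the figures quoted from ref. 51, which rest on that reference's own assumptions, so the absolute numbers need not match.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and allelic OR."""
    return odds_ratio * control_freq / (1.0 - control_freq + odds_ratio * control_freq)


def individuals_per_group(odds_ratio, control_freq, alpha=5e-8, power=0.80):
    """Approximate number of cases (and of controls) for an allelic test.

    Uses the two-proportion normal approximation; each individual
    contributes two alleles.
    """
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 cannot be detected at any sample size")
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)
```

Calling `individuals_per_group(1.2, 0.25)` and `individuals_per_group(2.0, 0.25)` shows the same qualitative pattern as the text: under these assumptions, the modest-effect SNP requires more than ten times as many individuals as the high-effect SNP.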
Because GWAS test approximately a million SNPs simultaneously, a correction for multiple testing is applied to avoid identifying false-positive associations that arise by chance. Currently, the genome-wide significance threshold is set at P < 5.0E−08 based on the Bonferroni correction, which divides 0.05 by a million independent SNPs. As most of the variants identified in GWAS of cancer pharmacogenomics did not surpass this stringent and conservative threshold, verification in an independent set of samples would be useful to support the GWAS findings. One reasonable way to relax the stringent GWAS significance threshold is to use pharmacogene panels (this topic is addressed extensively in this CCR Focus section), which screen SNPs in genes with known relevance to drug pharmacokinetics and pharmacodynamics (53). To provide additional supportive evidence for variants identified by GWAS, functional analyses, such as the electrophoretic mobility shift assay and the luciferase reporter assay, could be carried out to investigate how the variants affect the phenotypes. HapMap lymphoblastoid cell lines (LCL), for which ample genetic information is available, could be used to identify or validate variants that correlate with cytotoxicity after exposure to a specific drug (Fig. 4; ref. 54). This method could enhance our understanding of genotype–phenotype correlation in pharmacogenomics studies.
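The arithmetic behind the genome-wide threshold, and the effect of testing fewer SNPs, can be sketched as follows. The two P values in the usage note are taken from the studies summarized above, whereas the function names and the panel size of 2,000 SNPs are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_snps(p_values, n_tests, alpha=0.05):
    """Return the SNPs whose P value falls below the Bonferroni threshold.

    p_values: dict that maps a SNP identifier to its association P value.
    """
    threshold = bonferroni_threshold(n_tests, alpha)
    return {snp: p for snp, p in p_values.items() if p < threshold}


# P values reported in the text for two survival-associated SNPs.
reported = {"rs763780": 2.61e-8, "rs1878022": 5.13e-7}

genome_wide_hits = significant_snps(reported, n_tests=1_000_000)
# Hypothetical pharmacogene panel of 2,000 SNPs (size is an assumption).
panel_hits = significant_snps(reported, n_tests=2_000)
```

Under one million tests only rs763780 passes the threshold, whereas both SNPs would pass under the smaller hypothetical panel. This is the sense in which a pharmacogene panel relaxes the threshold, at the cost of restricting the search to genes chosen in advance.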
Figure 4. Cell-based genome-wide approach using EBV-transformed B lymphoblastoid cell lines (LCL) derived from blood donors of the HapMap/1000 Genomes projects. One of the greatest advantages of the cell-based model is the readily available SNP information in the HapMap and/or 1000 Genomes databases. It is now feasible to treat the cell lines with the drug of interest to evaluate the associations between genotype and drug response, between genotype and gene expression, and between gene expression and drug response. Candidate genes or SNPs that are associated with drug response in the cell model could then be validated in an independent study using clinical samples.
Markers identified through pharmacogenomics GWAS should be validated in an independent sample set before being incorporated into a prediction algorithm. Subsequently, the validity of the prediction algorithm could be evaluated in a prospective study of patients who receive the same treatment protocol. An automated SNP genotyping system could speed up SNP genotyping before treatment. In the future, GWAS could also be useful in the drug discovery phase by providing SNP genotypes that distinguish or predict, in advance, individuals who might develop adverse drug reactions.
Conclusions
Although the use of GWAS to identify susceptibility loci for complex diseases has reached a bottleneck in the genomic era, GWAS of cancer pharmacogenomics could still contribute to identifying novel loci because of the constant development of new drugs and new protocols in clinical trials. In fact, GWAS in cancer pharmacogenomics have identified a handful of candidate genetic loci associated with drug-induced toxicity and efficacy and have suggested novel biologic pathways that further improve our understanding of the underlying mechanisms. Nevertheless, most of these studies are underpowered and require additional validation in multiple independent sample sets, or functional analyses, to further elucidate the gene–phenotype relationship. To ensure the quality and success of pharmacogenomics studies, local and international collaborative efforts are essential for collecting sufficient and homogeneous sample sets. It is hoped that genetic markers identified in pharmacogenomics studies could be used in clinical practice in the near future to predict drug efficacy and toxicity, and thus fulfill the promise of personalized medicine to improve patients' quality of life.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: S.-K. Low, A. Takahashi, M. Kubo
Development of methodology: S.-K. Low, A. Takahashi
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.-K. Low, A. Takahashi, M. Kubo
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S.-K. Low, A. Takahashi
Writing, review, and/or revision of the manuscript: S.-K. Low, A. Takahashi, T. Mushiroda, M. Kubo
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.-K. Low, A. Takahashi, M. Kubo
Study supervision: A. Takahashi, M. Kubo
- Received December 19, 2013.
- Revision received March 14, 2014.
- Accepted March 27, 2014.