Abstract
Purpose: Pancreatic ductal adenocarcinoma (PDAC) lethality is multifactorial; although studies have identified transcriptional and genetic subsets of tumors with different prognostic significance, there is limited understanding of features associated with the minority of patients who have durable remission after surgical resection. In this study, we performed laser capture microdissection (LCM) of PDAC samples to define their cancer- and stroma-specific molecular subtypes and identify a prognostic gene expression signature for short-term and long-term survival.
Experimental Design: LCM and RNA sequencing (RNA-seq) analysis of cancer and adjacent stroma of 19 treatment-naïve PDAC tumors was performed. Gene expression signatures were tested for their robustness in a large independent validation set. An RNA-ISH assay with pooled probes for genes associated with disease-free survival (DFS) was developed to probe 111 PDAC tumor samples.
Results: Gene expression profiling identified four subtypes of cancer cells (C1–C4) and three subtypes of cancer-adjacent stroma (S1–S3). These stroma-specific subtypes were associated with DFS (P = 5.55E-07), with S1 associated with better prognoses when paired with C1 and C2. Thirteen genes were found to be predominantly expressed in cancer cells and corresponded with DFS in a validation using existing RNA-seq datasets. A second validation on an independent cohort of patients using RNA-ISH probes to six of these prognostic genes demonstrated significant association with overall survival (median 17 vs. 25 months; P < 0.02).
Conclusions: Our results identified specific signatures from the epithelial and the stroma components of PDAC, which add clarity to the nature of PDAC molecular subtypes and may help predict survival.
Translational Relevance
In the era of precision medicine, fine tumor characterization is required for improved personalized treatment selection. Our results identified specific signatures from the epithelial and stromal components of pancreatic ductal adenocarcinoma (PDAC). These signatures defined subtypes of each compartment associated with different disease-free survival (DFS). Our findings help elucidate the complex nature of PDAC molecular subtypes and expand our understanding of global transcriptional programs in the stroma. Our identification of 13 genes that were predominantly expressed in cancer cells and corresponded with DFS warrants the testing of this signature in the current prospective adjuvant and neoadjuvant chemotherapy trials; if validated, our signature could help select patients with resectable disease for either immediate surgery (for the predicted long-term survivor subgroup) or neoadjuvant chemotherapy (for the predicted short-term survivor subgroup), which ultimately can affect outcome and impact quality of life.
Introduction
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal human cancers and its incidence is rising (1, 2). Systemic chemotherapy and radiotherapy provide limited improvement in patient survival, and the only curative treatment of PDAC remains surgical resection of the tumor, a treatment for which less than 20% of patients are eligible. Despite our understanding of PDAC-associated genetic mutations, including KRAS, TP53, SMAD4 and CDKN2A, there has been limited clinical application of these findings as predictive biomarkers of therapy (3, 4).
For other solid tumors, studies have shown that the heterogeneity in patient response to treatment is determined, in part, by molecular differences between tumors. To investigate the intertumor heterogeneity of PDAC, several studies have identified transcriptional subtypes of PDAC that are associated with different molecular pathways. Collisson and colleagues reported three different subtypes: classical, quasi-mesenchymal, and exocrine-like (5). Bailey and colleagues defined four separate subtypes: squamous, pancreatic progenitor, immunogenic, and aberrantly differentiated endocrine exocrine (ADEX; ref. 6). In the first large-scale effort to account for the abundant tumor stroma in PDAC, Moffitt and colleagues applied virtual microdissection to identify the gene expression signatures of cancer cells versus the stromal component of the tumor. These studies resulted in the identification of four different subtypes: two tumor-specific subtypes, “classical” and “basal-like,” with the latter showing worse overall survival (OS), and two stroma subtypes, “normal” and “activated,” the latter showing worse OS (7). The squamous, quasi-mesenchymal, and basal-like subtypes overlapped considerably, as did the classical and the progenitor subtypes. However, some have questioned the validity of the immunogenic, exocrine, and ADEX subtypes, suggesting they may arise from gene expression signatures of normal pancreatic tissues and stromal cells found in bulk tumor samples (8).
PDAC is marked by a dense stromal component, which comprises approximately 60%–90% of the tumor volume (9). Numerous studies suggest that the stroma in PDAC plays a role in regulating the initiation of tumorigenesis and the progression to metastatic disease (10–13). In this study, we performed laser capture microdissection (LCM) and RNA sequencing (RNA-seq) analysis of treatment-naïve PDAC samples to define their cancer- and stroma-specific molecular subtypes. Capitalizing on the divergent clinical outcomes of these patients, we identified a molecular subtype-independent gene expression signature that is predictive of postoperative survival. Furthermore, we developed an RNA-ISH assay to demonstrate the clinical utility of several genes whose dysregulation is implicated in survival.
Materials and Methods
Data collection
Snap-frozen primary tumors from patients treated for PDAC at the Massachusetts General Hospital (Boston, MA) from 1995 to 2014 were used for this study. Each patient gave written, informed consent and the studies were conducted in accordance with the U.S. Common rule. Patients' data were entered prospectively into clinical databases approved by the Massachusetts General Hospital (Boston, MA) Institutional Review Board. Patients with unresectable pancreatic tumors, patients who received palliative surgery, and patients who received neoadjuvant chemotherapy were excluded. Only patients with resectable tumors who received neither neoadjuvant chemotherapy nor radiotherapy were included.
LCM
Tissue preparation details and quality assessment are described in Supplementary Martials and Methods. Cancer cells and adjacent stromal cells were microdissected from 28 PDAC samples embedded in optimal cutting temperature (OCT) medium using an Arcturus Pixcell IIE LCM platform (Applied Biosystems). Cancer and stromal cells were collected separately on CapSureMacro LCM Caps (Applied Biosystems). Approximately five caps of tumor cells and five caps of stromal cells were collected per sample. The collection caps were incubated at 42°C for 30 minutes in 25 μL of guanidinium thiocyanate–containing extraction buffer (PicoPure RNA Isolation Kit, Applied Biosystems), centrifuged briefly, and the extracted solution was frozen and stored at −80°C. A digital image was taken of each dissected area and reviewed by a pathologist with expertise in pancreatic disease before RNA isolation to ensure capture accuracy. A total of 796 images marking the areas of capture from each LCM sample. A total of 48 images were found to have areas of questionable purity and caps containing tissue of questionable purity were discarded and not used for downstream RNA isolation. An average area of 232,719 μm2 of cancer and 268,588 μm2 of stroma were employed for RNA isolation and RNA-seq.
RNA isolation and RNA-seq
Samples in extraction buffer were thawed and pooled prior to purification on picopure RNA columns (PicoPure RNA Isolation Kit, Thermo Fisher Scientific), while genomic DNA was removed via RNase-free DNase digestion (Qiagen) on the columns. Total cellular RNA from each column was eluted in 11 μL of elution buffer. Amplified cDNA was then generated by using the SMARTer Ultra-Low-input RNA Kit, version 4 (Takara Bio USA) followed by Nextera XT DNA Library Preparation kit (Illumina) for sample barcoding and fragmentation as per the manufacturer's protocols. The sequencing results have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE164665. Additional details can be found in Supplementary Materials and Methods.
Gene expression data analysis
Prior to analysis, the log2-transformed RNA-seq data was filtered out to exclude poorly expressed genes (defined as a median expression of 0 across all samples) and poorly variant genes (defined as a SD below 0.5). Data filtering resulted in 12,828 and 13,108 genes for cancer and stromal cells, respectively. Unsupervised analysis to identify robust sample clusters on RNA-seq data was performed with non-negative matrix factorization (NMF) using the R package “ConsensusClusterPlus” with parameters defined as a maximum of six clusters, 1,000 iterations, using 60% of samples and the hierarchical cluster algorithm with the hclust R function using Pearson distance and average linkage (14). The number of optimal clusters was chosen to have the higher cophenetic correlation coefficient among the number of evaluated clusters, resulting in four robust clusters of gene expression for both cancer and stromal cells.
Supervised analyses to identify gene expression signatures comparing cancer with stromal cells and comparing survival within both cancer and stromal LCM subgroups were done using a moderated t test with empirical Bayes statistic included in the limma R package. The FDR was assessed to correct the multiple testing hypothesis using q-value with the q-value R package (15).
The methods used to analyze the ontology of the underlying gene expression signatures are based on gene ontology (GO) biological processes. These methods, specifically, Database for Annotation, Visualization, and Integrated Discovery (DAVID; http://david.abcc.ncifcrf.gov/) and gene set variation analysis (GSVA), were used to assess the relative enrichment of gene sets at the sample level via an unsupervised, nonparametric approach (16). GSVA was done using the GSVA R package associated with the Molecular Signatures Database (MSigDB) gene sets v7.0 (17).
Gene expression data analysis of public datasets
To assess the gene expression signatures identified in this study, we gathered clinicopathologic and high-throughput gene expression data from clinical pancreatic carcinoma samples found in 15 previously reported datasets made publicly available through the following international database repositories: National Center for Biotechnology Information Genbank Gene Expression Omnibus, ArrayExpress, European Genome-phenome Archive, and The Cancer Genome Atlas (Supplementary Table S1). The Ontario Institute for Cancer Research, funded by the Government of Ontario, is acknowledged as the source of data used in this study. Detailed information concerning Institutional Review Board and Ethical Committee approval and patients' consent for all 15 studies was provided in the corresponding publications. The complete dataset contained 1,288 samples profiled using whole-genome DNA microarrays (Affymetrix or Agilent) and RNA-seq (Illumina), including 963 surgically resected primary cancer samples. Each dataset was normalized as previously described previously (18). In addition, the datasets were normalized separately, using robust multichip average (RMA) for Affymetrix microarray raw data and quantile normalization for the available processed data (19). Normalization was done in R using the Bioconductor associated package. When multiple probes mapped to the same GeneID, the one with the highest variance in each dataset was retained. Finally, the previously normalized RNA-seq data were log2 transformed.
We defined the molecular subtypes of all pancreatic samples in each dataset separately as defined in the original publications: the three Collisson and colleagues subtypes were classical, quasi-mesenchymal, and exocrine-like, the two Moffitt and colleagues epithelial subtypes were basal-like and classical, and the four Bailey and colleagues subtypes were squamous, pancreatic progenitor, immunogenic, and ADEX (5–7). The gene expression signature of this study was then applied to each public dataset separately using the nearest centroid method with the Pearson correlation representing distance. On the basis of the genes of each signature and within the associated dataset, centroids were computed as the mean expression value of each group of interest and samples of each dataset were classified to the closest centroid.
Multiple component analysis (MCA) was used to construct a low-dimensional visual representation of associations between the LCM-derived cancer groups with consensus molecular subtypes. MCA computes relationships within a series of pairwise comparisons of categorical variables with decomposition in singular values as a generalization of the principal component analysis. The top two components with the highest percentages of inertia were chosen to define the factorial plan to use. Each modality from all classifiers was represented as a projection on the two-dimensional plan, where length and angle between modalities defined strength and level of correlation. Thus, an acute, orthogonal, or rotational symmetric angle defined, respectively, a correlation, noncorrelation, and anticorrelation. MCA were computed and drawn in R/cran with the ade4 package.
RNA-ISH
RNA-ISH was performed with RNAscope 2.5 Duplex technology using the manual platform (catalog no. 322430) per the manufacturer's instructions. Briefly, FFPE tissue sections of PDAC human tissue microarrays (TMA) were cut in 5 μm sections and mounted on Surgipath X-tra glass slides (Leica Biosystems). To allow unmasking and RNA probe accessibility, deparaffinization was performed by baking the slides for 1 hour at 60°C followed by pretreatment with Histoclear and 100% ethanol. The samples were then exposed to hydrogen peroxide for 10 minutes at room temperature followed by target retrieval solution for 15 minutes at 99°C and, finally, Protease enzyme for 30 minutes at 40°C to allow probe accessibility.
Probe cocktails of AP5M1, PNP, TCP1, ADGRF1, MIA, and MUC16 were prepared using the guidelines provided in the RNAscope 2.5 Duplex technology user manual. Probe cocktails were incubated with the slides at 40°C for about 2–3 hours to allow target-specific probes to hybridize to the target mRNA. This was proceeded by a series of signal amplification steps; Amp 1 was added to bind to the target-specific probe, followed by Amp 2 through Amp 10 which were subsequently added step by step to bind each other. Next, type-specific label probes conjugated to alkaline phosphates were added to bind to the Amps, thus completing the branched DNA tree and providing signal amplification due to this cascade of hybridizations. The signal was visualized by sequential addition of Red substrate which binds to Amp 6 and Green substrate which binds to Amp 10 producing Red and Green precipitates (dots). Samples were counterstained with hematoxylin, which turned the cell nuclei and green precipitate. The target mRNAs were then visualized using a standard brightfield microscope which showed the ADGRF1, MIA, and MUC16 probe cocktail signal as Red and the AP5M1, PNP, and TCP1 probe cocktail signal as blue.
Statistical analysis
Correlations between tumor and stroma subtype and clinicopathologic features were analyzed using the t test or the Fisher exact test when appropriate. OS was calculated from the date of surgery to the date of death resulting from pancreatic cancer. The diagnosis of disease progression was defined as the earliest radiologic evidence of new neoplasm recurrence after complete resection or new neoplasm growth after incomplete resection. Follow-up was measured from the date of surgery to the date of last communication with living patients. Survivals were calculated using the Kaplan–Meier method and curves were compared with the log-rank test. Univariate and multivariate survival analyses were performed using Cox regression analysis (Wald test). Variables tested in univariate analyses included patients' age at time of surgery (continuous value), sex, and pathologic features including pathological type, tumor size, regional lymph node status (positive vs. negative), tumor grade, and tumor- and stroma-specific GES (short-term vs. long-term survival). Variables with a P <0.10 were tested in multivariate analysis. All statistical tests were two sided at the 5% level of significance. Statistical analysis was performed using the survival package (version 2.30) in the R software (version 2.15.2; http://www.cran.r-project.org/).
Results
Identification of tumor- and stroma-specific subtypes in patients with PDAC
A total of 69 treatment-naïve PDAC tumors were chosen for RNA isolation and histologic analysis. The RNA quality and cancer content from each tumor sample was carefully assessed, yielding 28 samples suitable for LCM (Fig. 1A). Cancer cells and adjacent (<280 μm) stromal cells were collected separately (Fig. 1B). Prior to RNA isolation, images of the cancers and adjacent stroma captured by LCM underwent pathologic review and samples of questionable purity were discarded. Ultimately, RNA from 19 cancer samples and their matched adjacent stroma samples were successfully sequenced (Supplementary Table S2).
Gene expression differences between cancer cells and adjacent stroma in PDAC tumors. A, Flowchart of sample selection for LCM and RNA-seq analysis. B, Representative images of tumor samples before capture (top), selected for capture (middle), and after capture by LCM (bottom). C, Hierarchical clustering based on mRNA expression levels of 38 LCM samples centered on each respective pair of the 19 cancer and 19 stroma samples with 9,574 genes presenting an SD above 1.5. Each row represents a gene, and each column represents a sample. Heatmap of the median centered data is color coded from the lowest value (green) to the highest (red). Colored bar below the dendrogram shows the cancer cells in red and stromal cells in orange. A solid orange vertical line defines the main separation of the cluster. D, Volcano plot showing the 1,226 genes differentially expressed between the stromal and cancer cells. E, Heatmap of the 1,226 genes differentially expressed between the stromal and cancer cells, where genes are sorted by statistic of comparison and samples are sorted according their correlation to the stroma centroid. Colored bar above heatmap shows the cancer cells in red and stromal cells in orange. A solid orange vertical line defines separation between stroma and cancer predicted groups.
Unsupervised analysis of these 38 total samples using paired data spontaneously classified the cancer and stroma samples into two separated branches (Fig. 1C). Hierarchical clustering further highlighted two clear-cut clusters, prompting us to perform a supervised analysis to identify a discriminating signature between cancer and stroma. A moderated t test between the two groups identified a 1,226-gene signature (P < 1%, q < 1%, and FC > 2×) that predicted the cancer or stroma compartments with 100% accuracy (Fig. 1D and E; Supplementary Table S3). Ontology analysis revealed the cancer compartments of these tumors to be enriched for genes involved in epithelial cell morphogenesis, cholesterol and triglyceride biosynthesis, and steroid and retinoid metabolism, while the stromal compartments were enriched for genes regulating the extracellular matrix (ECM), immune response, and cell adhesion (Supplementary Table S4).
Consensus clustering identified four subtypes of cancer cells (C1–C4) that were specified by a total of 209 genes (Fig. 2A; Supplementary Table S5). To identify the biological processes associated with the cancer cell clusters, GSVA was performed with annotated gene expression sets from MSigDB (Fig. 2B; Supplementary Table S6). Investigation of canonical pathways and gene ontology biological process gene sets revealed the C1 subtype to be associated with genes involved in protein folding and leukocyte chemotaxis, C2 with gene programs required for pancreatic endocrine cell development and function as well as neuronal membrane signaling, C3 with nucleotide biosynthesis and regulation of protein translation, and C4 with gene programs enriched in oncogenic signal transduction pathways.
Transcriptional subtypes of PDAC cells identified by RNA-seq. A, Cophenetic correlation coefficient measuring the stability of the clusters obtained from NMF in cancer cell data with the chosen number of groups indicated by orange lines (top left). Heatmap of the coclassification matrix resulting from the consensus clustering approach corresponding to the chosen number of groups (top right). Heatmap of the 209 genes specific to the four NMF groups of the cancer cells where genes are sorted by statistic of groups comparison and samples are sorted according to their correlation to the centroid of their corresponding groups (bottom). Each row represents a gene, and each column represents a sample. Colored bars above and to the left of the heatmap indicate the attribution of cancer cell groups. B, Heatmap of GSVA relative enrichment score of the top 15 gene sets associated with each cancer cell group. Each row represents a gene, and each column represents a sample. Colored bars above and to the left of the heatmap indicate the attribution of cancer cell groups. C, Graph of MCA of the four cancer NMF groups with the four molecular subtype classifiers in public datasets. D, Kaplan–Meier DFS curves corresponding to the four NMF cancer groups identified in 573 PDAC samples from public datasets.
These classifications were then compared with the three major molecular classifications of PDAC (Bailey, Collisson, and Moffitt) according to several criteria. Regarding the gene composition, cross-referencing the four gene lists (Bailey: 859 genes; Collisson: 62 genes; Moffitt “tumor”: 50 genes; our cancer-specific subtypes: 209 genes) revealed that there was limited overlap between the gene lists from our LCM cancer-specific subtypes and the other molecular classifications (Supplementary Fig. S1). However, a MCA performed with publicly available gene expression sets from 963 primary PDAC tumors identified a strong correlation between our LCM-purified cancer subtypes and the molecular subtypes previously defined using bulk tumors containing a mixture of cancer and stromal cells. Although divergent from each other, C1 and C3 subtypes were found to be associated with the classical/pancreatic progenitor subtype, C2 was associated with the ADEX/exocrine-like subtypes, and C4 was associated with the squamous/basal-like/quasi-mesenchymal subtype (Fig. 2C). Patients with the C4 cancer-specific subtype had the shortest disease-free survival (DFS; Fig. 2D), which is consistent with its association with squamous/basal-like/quasi-mesenchymal subtypes (Supplementary Fig. S2A–S2C).
A similar examination of the gene expression data of the adjacent stroma by consensus clustering was performed. While the consensus matrix suggested four subtypes of stroma, one of these was defined by a single sample (Fig. 3A, top) and further analysis was therefore based on clustering yielding three stromal subtypes (S1–S3) that were specified by 274 genes (Fig. 3A; Supplementary Table S7). Like the cancer subtype analysis, GSVA was performed on the subtypes of adjacent stroma with a focus on curated and GO gene sets (Supplementary Table S8). These analyses revealed the S1 subtype to be associated with development and cell differentiation, S2 with antigen processing and presentation, S3 with phospholipid synthesis and modification of macromolecules (Fig. 3B).
Transcriptional subtypes of stromal cells identified by RNA-seq. A, Cophenetic correlation coefficient measuring the stability of the clusters obtained from NMF in stromal cell data with the chosen number of groups indicated by orange lines (top left). Heatmap of the coclassification matrix resulting from the consensus clustering approach corresponding to the chosen number of groups (top right). Heatmap of the 274 specific genes to the three NMF groups of the stromal cells where genes are sorted by statistic of groups comparison and samples are sorted according to their correlation to the centroid of their corresponding groups (bottom). Each row represents a gene, and each column represents a sample. Colored bars above and to the left of the heatmap indicate the attribution of cancer cell groups. B, Heatmap of GSVA relative enrichment score of the top 20 gene sets associated with each stromal group. Each row represents a gene, and each column represents a sample. Colored bars above and to the left of the heatmap indicate the attribution of stromal groups. C, Stack bar plot showing LCM stroma group proportions across previously defined CAF subtypes (20). The significance status of the given proportion test is indicated above each bar (*, P < 0.05; N.S, P > 0.05). D, Stack bar plot showing Moffitt stroma groups proportion across the three NMF stroma groups within 677 PDAC from public datasets informed on Moffitt molecular subtypes (normal and activated stroma in orange and red, respectively). Above each bar is the significance status of the given proportion test (***, P < 1E-03; N.S, P > 0.05). E, Kaplan–Meier DFS curves corresponding to the three NMF stromal groups in 573 PDAC from public datasets.
Cancer-associated fibroblasts (CAF) are the major cell types of the tumor stroma and recent studies have defined transcriptionally unique subsets of these cells. Examining the gene expression signatures reported by Elyada and colleagues revealed a significant correlation between antigen-presenting CAFs and S3 subtype (P = 1.71E-02) (20) A comparison of the LCM stromal subtypes with those defined by Moffitt and colleagues revealed the normal and activated stroma subtypes to be enriched for our S1 (P = 4.04E-03) and S2 (P = 1.05E-12) subtypes, respectively (Fig. 3C). There was no correlation between our stromal subtypes and the four CAF subtypes recently described by Neuzillet and colleagues (Supplementary Fig. S3; ref. 21).
Our stroma-specific subtypes were also associated with DFS (P = 5.55E-07), with S1 associated with better prognosis (Fig. 3D). A more detailed DFS analysis examining the combined impact of various cancer and stoma subtypes in an independent validation set of 573 patients revealed that S1 stromal signatures had the longest DFS when paired with C1 (P = 1.35E-02) and C2 (P = 6.80E-05) cancer signatures (Supplementary Fig. S4). No difference in DFS was found between the other pairs of cancer- and stroma-specific subtypes.
Given the prospects for eventual clinical application, we sought to reduce our list of genes used to define cancer and stroma subtypes. Genes that were similarly expressed between cancer and stroma, as well as those with similar expression in normal pancreas were excluded from our lists of 209 cancer genes and 274 stromal genes that defined the subtypes. This resulted in a gene list of 37 cancer-specific and 66 stroma-specific genes (Supplementary Fig. S5A and S5B; Supplementary Table S9). Twelve genes were associated with C1, five with C2, five with C3, and 15 with C4. Within the stroma compartment, 27 genes were associated with S1, 38 with S2, and one with S3. These optimized signatures separated 573 patients into the different cancer and stromal subtypes that were associated with DFS (cancer P = 1.9E-04 and stroma P = 5.55E-07), with C4 exhibiting the shortest DFS and S1 the longest (Supplementary Fig. S5C and S5D).
Identification of a subtype-independent gene expression signature associated with survival
We searched for a prognostic gene expression signature associated with DFS for the cancer and stroma compartments. A set of 13 samples was selected, representing two groups of patients with divergent DFS: <12 months short term survivors (STS; n = 5) and >24 months long term survivors (LTS; n = 8). There were no differences in the clinicopathologic parameters and molecular subtypes between these two groups (Supplementary Table S10). In the cancer compartment, supervised analysis of the RNA-seq data identified 111 genes differentially expressed between the STS and LTS groups (Fig. 4A; Supplementary Table S11). Consistent with the notion that at least a subset of these genes contributes to the biological differences between these tumors, genes previously associated with poor survival in PDAC (KLK6, FGF1, MUC16, WIPF1) and those promoting the aggressive behavior of PDAC cells (GLI2, MIA, PLEKHG2) were associated with shorter DFS, and genes that inhibit the migration of PDAC cells (CBX7 and TP53INP1) were among those associated with longer DFS. To better understand the differences in gene programs between STS and LTS groups, an analysis of the associated GO biological processes was performed (Fig. 4B; Supplementary Table S12). Strongly activated pathways in the STS group were related to cell plasticity (COMP, MUC16, PLXNC1, FGF1, IRS1, SEMA6C, HOXB5), axon guidance (AP2A1, FGF1, GLI2, IRS1, PLXNC1, SEMA6C), cell proliferation (FGF1, IRS1, MZB1, FGF1, HSPA1A, TLR2, ALOX15B, PTGES), and signal transduction (ARHGAP23, ARHGAP42, AP2A1, CRABP2, FGF1, IRS1, KRT17, PDE10A, PTGES, SPOCK2, TLR2, ZNF217, PLEKHG2, WIPF1). In contrast, cancer cells from patients with LTS were enriched in pathways related to cell-cycle regulation (CDK4, CDKN2C, KNTC1, NUP88, TP53INP1), as well as tRNA and mRNA processing (RBM23, SCAF8, NUP88, TRMT11, TRMT5).
Identification of a gene expression signature in PDAC cells associated with survival. A, Heatmap of the 111 genes differentially expressed between the STS and LTS in cancer cells. Genes are sorted by statistic of comparison, and samples are sorted according their correlation to the STS centroid (below plot). Colored bar above the heatmap show the survival groups (STS in dark red and LTS in green). A solid orange vertical line defines separation between STS and LTS predicted groups. B, GO biological processes of the DAVID database associated with the 111-gene list. The bar plot indicates the −log(P) (x-axis) of the 38 biological pathways (y-axis) that are enriched for genes overexpressed in the STS samples (in red) or the LTS samples (in green). The P value threshold is indicated by the orange vertical line. C, Kaplan–Meier DFS curves according to survival groups (STS and LTS classes) defined by the 111 genes in 573 PDAC from public datasets.
In univariate analysis, only one clinicopathologic variable was associated with DFS (Wald test): the American Joint Committee on Cancer (AJCC) clinical stage (P = 6.24E-03). Our 111-cancer gene prognostic classifier (P = 3.90E-03), was also associated with DFS. In multivariate analysis, only the 111-gene classifier (P = 9.02E-03) and the AJCC clinical stage (P = 1.86E-03) remained significant, confirming independent prognostic value. In the stroma compartment, a similar approach identified 109 genes to be differentially expressed between STS and LTS of LCM captured samples. However, this stroma signature was not validated in the independent set of 573 patients (Supplementary Fig. S6), suggesting that the adjacent stroma is not the sole contributor to the extremes in clinical behavior of PDAC.
The robustness of this 111-cancer gene prognostic classifier was examined in an independent validation set of 573 patients. The classifier sorted the patients into two subgroups, STS (n = 323; 56%) and LTS (n = 250; 44%), with 2-year DFS of 30% (95% CI: 25–36) and 42% (95% CI: 36–50), respectively (P = 3.86E-03, log-rank test; Fig. 4C). We compared the predictive value of our 111-cancer gene prognostic classifier with that of other molecular classifiers and clinicopathologic variables (Table 1). In univariate analysis, only one clinicopathologic variable was associated with DFS (Wald test): the AJCC clinical stage (P = 6.24E-03). In multivariate analysis, only the 111-gene classifier (P = 9.02E-03) and the AJCC clinical stage (P = 1.86E-03) remained significant, confirming independent prognostic value. In the stroma compartment, a similar approach identified 110 genes to be differentially expressed between STS and LTS of LCM captured samples. However, this stroma signature was not validated in the independent set of 573 patients (Supplementary Fig. S5), suggesting that the adjacent stroma is not the sole contributor to the extremes in clinical behavior of PDAC.
Association of histoclinical and molecular variables with DFS.
Identification and validation of a 13-gene prognostic expression signature
To validate our gene classifier across an independent cohort of PDAC specimens, we built a multigene classifier from the 111-gene list to refine genes that were most informative for long- or short-term survival. Of the 111 genes, 13 genes were determined to be specifically expressed in cancer cells (Fig. 5A; Supplementary Table S13), including eight upregulated (MIA, MUC16, LYNX1, PSCA, ADGRF1, KLK6, KRT17, and MFSD9) and five downregulated (AP5M1, TCP1, PNP, ANKRD39, and CDK4) genes in the STS samples. We tested this 13-gene prognostic classifier in an independent validation set of 573 patients. Similar to the 111-gene classifier, the 13-gene classifier sorted the 573 patients into two subgroups: STS (N = 314; 55%) and LTS (N = 259; 45%), with 2-year DFS of 29% (95% CI: 24–35) and 43% (95% CI: 36–50), respectively (P = 5.62E-03, log-rank test; Fig. 5B), thus confirming its prognostic value. We searched for correlations between the 13-gene signature-based classification and the clinicopathologic variables of samples. No correlation was found with the age and sex of the patients, AJCC stage, pathologic type, tumor size, and lymph node status (Supplementary Table S14). In contrast, we did find more aggressive molecular subtypes (Bailey squamous and Moffitt basal-like; P < 0.05) in the STS subgroup.
A six-gene RNA-ISH assay is an independent prognostic factor of survival. A, Heatmap of the 13 genes differentially expressed between the STS and LTS in cancer cells. Genes are sorted by statistic of comparison, and samples are sorted according their correlation to the STS centroid (below plot). Colored bar above the heatmap shows the survival groups (STS in dark red and LTS in green). A solid orange vertical line defines separation between stroma and cancer predicted groups. B, Kaplan–Meier DFS curves according to our survival groups (STS and LTS classes) defined by the 13 genes in 573 PDAC from public datasets. C, Representative images of RNA-ISH using pooled probes of six genes in the LTS (blue) and STS (red) signatures. Areas defined by boxes in the top panels are shown at higher magnification in the bottom panels. Scale bars for top and bottom panels are 100 μm and 25 μm, respectively. D, Kaplan–Meier OS curves resulting from scoring of the RNA-ISH of a TMA containing 111 PDAC samples. E, Univariate and multivariate prognostic analyses for OS in the 111 PDAC of the six-gene RNA-ISH data model in TMA. Bold type indicates statistical significance (P < 0.05).
Next we compared the prognostic value of our 13-gene classifier with that of the other clinicopathologic variables in the validation set. In univariate analysis (Supplementary Table S15), only AJCC clinical stage (P = 6.24E-03) and our 13-gene cancer compartment classifier (P = 5.73E-03) were associated with DFS (Wald test). In multivariate analysis, only the classifier (P = 1.34E-02) and the AJCC clinical stage (P = 1.64E-03) remained significant, suggesting independent prognostic value.
To further examine the utility of these genes in predicting the clinical behavior of patient tumors, we developed a two-color RNA-ISH assay with pooled probes for three genes (AP5M1, TCP1, and PNP) associated with LTS (blue) and three genes (MIA, MUC16, and ADGRF1) associated with STS (red). Genes within each pool were expressed at similar levels in LCM samples, thereby minimizing the potential of a single gene driving the expression signal of a pool (Supplementary Table S11). This assay was used to probe a TMA consisting of samples of surgically resected PDAC tumors (Fig. 5C). A total of 111 tumor cores were scored in a blinded fashion to determine the expression of these genes specifically in cancer cells through visual histologic confirmation. Consistent with our LCM analysis, limited expression of these genes was observed in the stroma of these tumor samples. These analyses identified 51 tumors with cancer glands that stained predominantly with genes associated with LTS, and 60 tumors that stained with genes associated with STS. Patients with tumors that stained with STS probes exhibited a shorter OS than those staining with LTS probes (median survival 17 vs. 25 months, P < 0.02; Fig. 5D). Upon multivariate analysis, only resection margin status and RNA-ISH scoring were significantly associated with OS (Fig. 5E).
Discussion
The prognosis of pancreatic cancer, a heterogeneous disease with high metastatic propensity, has not improved despite considerable effort. A major challenge that needs to be addressed is improving the predictive ability of current prognostic factors to aid in therapeutic decision making, notably regarding the decision to perform immediate surgery or to give neoadjuvant chemotherapy followed by surgery. Better characterization of these molecular alterations is needed to develop techniques for early detection, identify new molecular targets, and devise novel targeted therapies. Several transcriptional classifications of PDAC have been associated with different molecular pathways (5–7). Among these different PDAC molecular subtypes, the validity of the ADEX and exocrine subtypes has been questioned. Indeed, results from Maurer and colleagues failed to provide evidence supporting the existence of the ADEX and exocrine subtypes, suggesting instead they were artifacts arising from normal pancreatic tissues within bulk tumor samples (8). Here, we have defined cancer- and stroma-specific PDAC molecular subtypes based on RNA-seq analysis of laser capture microdissected malignant epithelium and matched adjacent stroma. This study identified four subtypes of cancer (C1–C4) and three subtypes of stroma (S1–S3). The C1 and C3 subtypes were associated with the classical/pancreatic progenitor subtype. The C2 subtype was associated with the ADEX/exocrine-like subtype, supporting the notion that this is a genuine subtype of PDAC. The C4 subtype, which showed worse DFS, was associated with the squamous/basal-like/quasi-mesenchymal subtype.
Numerous studies have implicated the tumor stroma in regulating the initiation, progression, and metastasis of PDAC (10–13). Moffitt and colleagues, was the first to “separate” the stromal signature from the pancreatic tumor by virtual microdissection. “Normal” and “activated” stroma were identified in this study, the latter showing worse prognosis (7). More recent work from Maurer and colleagues using microdissected samples merged some of the previously reported stromal signatures into two signatures, namely, the “activated stroma,” which is enriched with ECM genes (corresponding to Moffitt activated stroma subtype) and the “normal stroma,” enriched with immune genes (corresponding to Bailey immunogenic subtype suggesting antitumor T-cell activation; ref. 8).
Although PDAC tumors generally contain an abundant desmoplastic stroma, the amount of stroma between cancer glands is variable. Because recent studies have suggested different populations of CAFs based on their proximity to cancer cells (22), we focused our analysis on stroma situated within 280 μm of the malignant epithelium and identified three stroma subtypes (S1–S3), with S1 associated with better prognosis. C1 and C2 cancer cells associated with S1 stromal signatures had the highest DFS relative to those with S2 and S3 signatures. Of note, the S2 subtype was enriched with immune genes related to antigen presentation but not T cells, highlighting the importance of precise subtyping. No difference in DFS was found between the other pairs of cancer- and stroma-specific subtypes.
A major challenge we face is the need to improve the current pool of prognostic factors to aid in therapeutic decision making. Given the variations in survival within the AJCC clinical stages and the large genomic heterogeneity within PDAC tumors, prognostic gene expression signatures may be the ideal tool to better predict patient outcomes. To this day, gene expression profiling remains the most promising and successful high-throughput molecular approach to identifying new prognostic factors in PDAC. Here, we analyzed gene expression profiles of PDAC samples from surgically resected patients and identified a robust 13-gene classifier associated with postoperative OS that is independent of classical prognostic factors. Some of these genes have already been described as independent prognostic factors in several cancers, including PDAC: MUC16, KLK6, PSCA, and KRT17 (23–27).
Next, we developed a two-color RNA-ISH assay with pooled probes for genes associated with LTS (AP5M1, TCP1, and PNP) and genes associated with STS (MIA, MUC16, and ADGRF1) to highlight the clinical utility of our gene signature with regard to survival. Among the upregulated genes in the STS subgroup, MUC16 is a cell surface glycoprotein that plays a role in promoting cancer cell growth in ovarian cancer (28). Products of this gene have been used as markers for different cancers, with higher expression levels associated with poorer outcomes (29–31). ADGRF1 (formerly named GPR110) is a cell surface protein that is overexpressed in lung cancer (32, 33). ADGRF1 mRNA expression was positively correlated with E-cadherin (CDH1) and negatively correlated with vimentin and N-cadherin (CDH2), suggesting a potential role of ADGRF1 in the epithelial–mesenchymal transition process (34). Among the downregulated genes in the STS subgroup, AP5M1 is a potent apoptosis-inducing molecule in cervical cancer cells. AP5M1 upregulates the level of BAX protein, a key proapoptotic B-cell lymphoma (BCL)-2 family member regulating mitochondrial apoptotic cell death pathway (35). T-complex 1 (TCP-1) is involved in the folding of a variety of proteins important for cell proliferation and might serve as a prognostic marker in a variety of cancers (36–39).
The small number of genes should facilitate the clinical application of our 13-gene classifier and RNA-ISH biomarker for testing in prospective trials of adjuvant and neoadjuvant chemotherapy. If validated, our signature could help select patients with resectable disease for either immediate surgery (for the predicted LTS subgroup) or neoadjuvant combination chemotherapy and radiation (for the predicted STS subgroup) followed by surgery, and ultimately would affect the outcome and quality of life of our patients.
Our results identified specific signatures from the epithelial and the stromal components of PDAC. Theses signatures defined specific subtypes associated with different DFS. By improving the characterization of PDAC, our findings also add clarity to the heterogeneous nature of molecular subtypes in PDAC and expand our understanding of global transcriptional programs in the stroma. In the era of precision medicine, fine tumor characterization such as the one described will be required for improved personalized treatment selection.
Authors' Disclosures
M. Mino-Kenudson reports personal fees from H3 Biomedicine and AstraZeneca, and grants from Novartis outside the submitted work. D.T. Ting reports grants from ACD-Biotechne during the conduct of the study as well as personal fees and other from Rome Therapeutics; other from PanTher Therapeutics and TellBio, Inc.; personal fees from Pfizer, Foundation Medicine Inc., Third Rock Ventures, EMD Millipore Sigma, and NanoString; and grants from PureTech Health and Ribon Therapeutics outside the submitted work. A.S. Liss reports grants from Novartis Institutes for BioMedical Research, personal fees from Third Rock Ventures, and other from Constellation Pharmaceuticals outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
D.J. Birnbaum: Formal analysis, investigation, writing–original draft. S.K.S. Begg: Investigation, writing–original draft. P. Finetti: Software, formal analysis, investigation, writing–original draft. C. Vanderburg: Resources, supervision, methodology, writing–review and editing. A.S. Kulkarni: Investigation, writing–review and editing. A. Neyaz: Investigation, writing–review and editing. T. Hank: Investigation, writing–review and editing. E. Tai: Data curation, methodology, writing–review and editing. V. Deshpande: Resources, writing–review and editing. F. Bertucci: Supervision, writing–review and editing. D. Birnbaum: Supervision, writing–review and editing. K.D. Lillemoe: Resources, funding acquisition, writing–review and editing. A.L. Warshaw: Resources, funding acquisition, writing–review and editing. M. Mino-Kenudson: Investigation, methodology, writing–review and editing. C. Fernandez-Del Castillo: Conceptualization, resources, funding acquisition, writing–review and editing. D.T. Ting: Resources, methodology, project administration, writing–review and editing. A.S. Liss: Conceptualization, resources, formal analysis, supervision, funding acquisition, methodology, project administration, writing–review and editing.
Acknowledgments
A.S. Liss is supported by the NIH Grant R03CA241971 (to A.S. Liss) and the National Institute of Standards and Technology (NIST) Grant 60NANB20D166 (to A.S. Liss). D.T. Ting, V. Deshpande, A. Neyaz, and A.S. Kulkarni receive sponsored research support from ACD-Biotechne. D.T. Ting is supported by the NIH Grant R01CA235412 (to D.T. Ting), the NSF PHY-1549535, an SU2C-NSF-Lustgarten Foundation Pancreatic Cancer Convergence Research Team grant, and a Stand Up To Cancer-Lustgarten Foundation Pancreatic Cancer Interception Translational Cancer Research Grant (Grant Number: SU2C-AACR-DT26–17). D.J. Birnbaum was supported by a fellowship from the ARC Foundation (https://www.fondation-arc.org/). T. Hank was supported by the German Cancer Aid Foundation (Mildred-Scheel-Postdoc-Program, Deutsche Krebshilfe, 70112880, to T. Hank). Stand Up To Cancer is a division of the Entertainment Industry Foundation. SU2C-AACR-DT26-17 grant is administered by the American Association for Cancer Research, the scientific partner of SU2C.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Footnotes
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
Clin Cancer Res 2021;27:2314–25
- Received March 24, 2020.
- Revision received November 22, 2020.
- Accepted February 1, 2021.
- Published first February 5, 2021.
- ©2021 American Association for Cancer Research.