The Spatiotemporal Evolution of Lymph Node Spread in Early Breast Cancer.

Purpose: The most significant prognostic factor in early breast cancer is lymph node involvement. This stage between localized and systemic disease is key to understanding breast cancer progression; however, our knowledge of the evolution of lymph node malignant invasion remains limited, as most currently available data are derived from primary tumors.Experimental Design: In 11 patients with treatment-naïve node-positive early breast cancer without clinical evidence of distant metastasis, we investigated lymph node evolution using spatial multiregion sequencing (n = 78 samples) of primary and lymph node deposits and genomic profiling of matched longitudinal circulating tumor DNA (ctDNA).Results: Linear evolution from primary to lymph node was rare (1/11), whereas the majority of cases displayed either early divergence between primary and nodes (4/11) or no detectable divergence (6/11), where both primary and nodal cells belonged to a single recent expansion of a metastatic clone. Divergence of metastatic subclones was driven in part by APOBEC. Longitudinal ctDNA samples from 2 of 7 subjects with evaluable plasma taken perioperatively reflected the two major evolutionary patterns and demonstrate that private mutations can be detected even from early metastatic nodal deposits. Moreover, node removal resulted in disappearance of private lymph node mutations in ctDNA.Conclusions: This study sheds new light on a crucial evolutionary step in the natural history of breast cancer, demonstrating early establishment of axillary lymph node metastasis in a substantial proportion of patients. Clin Cancer Res; 24(19); 4763-70. ©2018 AACR.


Introduction
Breast cancer is characterized by high genomic and transcriptomic diversity, both between (1-3) and within patients (4)(5)(6)(7)(8). This inherent complexity is fully consistent with a clonal evolution model of cancer (9,10). The cancer evolution paradigm provides a biological explanation of experimental observations and may also lead to more accurate predictions of the future course of the disease, in particular prognostication and the emergence of treatment resistance (9).
Currently, clinicopathologic parameters such as age, tumor grade and stage, ER expression, and HER2 expression have been integrated into scoring systems to estimate the probability of recurrence and death from breast cancer (11,12). Moreover, gene expression profiles provide additional prognostic and/or predictive information regarding adjuvant chemotherapy in ER-positive early breast cancer, and their clinical utility is being prospectively evaluated in large randomized clinical trials (13). Large meta-analyses have indicated that in early breast cancer, the most important prognostic factor is lymph node involvement (14)(15)(16). This clinical stage represents a potentially intermediate evolutionary step between localized disease and metastatic dissemination, and it is therefore of crucial importance to understand progression. Micrometastases can also be present at diagnosis of some early breast cancers, and ultrasensitive methods to analyze circulating tumor DNA (ctDNA) have recently helped interrogate such deposits that can subsequently result in overt metastatic recurrence (17). Hence, combined genomic analyses of primary, lymph nodes, and ctDNA are necessary to understand metastatic progression in cancer-also in light of findings in other cancer types where metastatic dissemination was found to be decoupled from lymphatic spread in a subset of cases (18,19).
Here, we sought to study lymph node spread from an evolutionary perspective by analyzing 78 multiregion samples taken from untreated primary tumors and lymph nodes, as well as 7 longitudinal ctDNA samples of a selected cohort of 11 primary breast cancers that had biopsy-proven ipsilateral axillary lymph node spread without clinical evidence of distant metastatic disease (see Fig. 1 and Supplementary Table S1 for clinical details). We used whole-exome sequencing (WES), whole-genome sequencing (WGS), and targeted deep sequencing data, combined with phylogenomics analysis, to understand the dynamics of lymph node spread.
We found evident patterns of early divergence between primary and lymph node deposits in a subset of patients in our cohort and showed that these patterns were reflected in ctDNA. We also found that APOBEC activity contributed to such early divergence. Finally, we show proof-of-principle loss of ctDNA mutations private to nodes following surgical resection.

Patient cohort and samples
Samples were collected from 11 patients with breast cancer with positive axillary nodes. Patients had not received any treatment prior to surgery. The median age of patients in this cohort was 56 years (range, 38-81). Several lymph nodes and primary tumor specimens were collected from each patient. Samples were either paraffin embedded after formalin fixation or snap frozen immediately after resection.
Whole peripheral blood was collected from each patient for germline DNA and plasma ctDNA taken at four time points: (1) intraoperatively before tumor resection, (2) intraoperatively immediately after tumor resection, (3) 4 hours postoperatively, and (4) 10 to 14 days postoperatively, at the follow-up visit. Blood at each time point (1-4) was collected into 3 Â 10 mL EDTA or Streck tubes and was centrifuged at 1,600 Â g for 20 minutes for a single spin. Plasma and buffy coat were collected and stored at -80 C. ctDNA was extracted from 5 mL plasma using the QIAamp circulating nucleic acid kit (catalog number 55114) from QIAGEN according to the manufacturer's instructions. Briefly, plasma was lysed with proteinase K and ACL for 30 minutes at 60 C, with carrier RNA in AVE added. Buffer ACB was added, and the sample was passed through a QIAamp mini spin column to bind the DNA. DNA was washed with ACW1, ACW2, and 100% ethanol before centrifugation at 14,000 RPM for 3 minutes, dried for 10 minutes, eluted into 50 mL AVE buffer, and stored at -20 C. Extracted DNA was quantified by digital droplet PCR (ddPCR) using a TaqMan copynumber reference assay for RNAse P (Life Technologies). ddPCR reactions were assembled using 1 mL of eluate and 10 mL of ddPCR supermix for probes (Bio-Rad Laboratories) for a total reaction volume of 20 mL. The reaction was partitioned into approximately 20,000 droplets on a Bio-Rad Laboratories QX200 droplet generator. PCR of the emulsified reaction was performed in 96-well plates on a G-Storm GS4 thermocycler for 40 PCR cycles with 60 C annealing temperature. The plates were read on a Bio-Rad Laboratories QX200 droplet reader, and the DNA concentration was calculated using Quantasoft software (Version 1.4.0.99). At least 2 NTC wells were included in each quantification run.
Clinical and histopathologic data from the patient cohort can be found in Supplementary Table S1. The study protocol was approved by an institutional research ethics committee (reference number 13/LO/1015). All patients gave their written informed consent to participate before enrolling in the study. The study was carried out in accordance with the principles of the International Conference on Harmonization Guideline for Good Clinical Practice and the Declaration of Helsinki. See Supplementary Material and Methods for details on sample preparation.

WES, WGS, and targeted sequencing
For each of the 11 patients in our cohort, 500 ng of DNA from 2 primary breast tumor specimens and 1 to 5 involved lymph nodes

Translational Relevance
Lymph node involvement is an important prognostic factor in breast cancer, but the patterns of cancer evolution during lymph node spread are poorly understood. Furthermore, circulating tumor DNA as a potential biomarker to guide treatment is relatively unexplored in this context. In this prospective study, we used multiregion sequencing of multiple samples of primary tumors and multiple matched lymph nodes and showed remarkable differences in the patterns of lymph node infiltration between subsets of patients. Specifically, one subgroup of cases showed early lymph node divergence. Those patterns were reflected in the circulating tumor DNA of the same patients and demonstrated (to our knowledge for the first time) disappearance of private nodal mutations after surgical resection. Divergence was also associated with APOBEC activity. Together, these results suggest that evolutionary patterns of lymph node infiltration may be important to predict the course of the disease in individual patients and, combined with circulating tumor DNA sampling, could aid patient stratification and personalized, precision medicine.
were whole-exome sequenced (SureSelect human all exon V2). The set comprised 40 fresh-frozen (FF) tissue samples (20 primary tumor and 20 lymph node), 2 formalin-fixed paraffin-embedded (FFPE) specimens (both primary tumor tissues), and 11 germline samples (buffy coats). Exome sequencing data had a mean coverage of 154X. Whole-genome libraries were prepared from 30 to 100 ng of genomic DNA with the NebNext Ultra II kit following the manufacturer's instructions. Genomic DNAs were sheared in a Diagenode sonicator prior to library preparation. Whole-genome median coverage was 38. Further, a total of 807 exonic single-nucleotide variants (SNV) were selected for targeted validation. All but two samples used for WES were included in the targeted validation panel. In addition, we included DNA from 36 FFPE manually microdissected specimens and 7 ctDNA samples. A custom SureSelect XT2 panel (Agilent Technologies) was used to generate targeted capture libraries from these 83 samples (for 2 samples, there was not enough DNA for targeted sequencing; therefore, information from only the exome sequencing was used) following the manufacturer's recommendations. Mean coverage for targeted sequencing was 1,813X with 98% validation. All libraries were sequenced on an Illumina HiSeq 2500. See Supplementary Material and Methods for details regarding bioinformatics analysis.

Results
Intratumor heterogeneity in lymph node-positive breast cancers Using WES, we profiled 40 FF and 2 FFPE samples from the 11 patients, as well as matched normal (FF buffy coat), obtaining a mean depth of 154X. For each patient, we had at least two regions from the primary tumor, taken 1 to 6 cm apart, and one lymph node (Fig. 1). Extensive intratumor heterogeneity (ITH) was evident in our cohort, with an average of 73.5% of variants considered to be subclonal (Supplementary Figs. S1 and S2). A total of 807 mutations were selected for custom targeted deep sequencing validation (mean depth of 1,813X, 98% validation rate). We also applied the same panel to 36 additional FFPE samples from the same patients (all those available with >50% tumor content, see Supplementary Table S2), confirming that the original FF samples were representative of ITH both in the primary and the lymph node deposits, and the observed patterns were not due to sampling bias ( Fig. 2A). Estimated purity, ploidy, and copy-number profiles (Fig. 2B) were used to calculate CCFs for both WES and targeted sequencing profiles, as presented in Fig. 2A and Supplementary Fig. S2 (see Supplementary  Tables S3-S6 for values). For those samples where only targeted sequencing was available (i.e., additional FFPE samples), purity and ploidy estimates could not be calculated, and we therefore reported presence/absence of the mutations (e.g., Fig. 2A, FFPE samples are set to CCF ¼ 1/0). As macrodissected samples represented a small localized region of the tumor, we did not find any evident subclonal structure within each sample, although the limited mutational burden of breast cancer, combined with exome sequencing, precluded reliable subclonal analysis within such samples.
The mutational landscape of our cohort was consistent with previous studies (2,3,6), with TP53 and PIK3CA being the most commonly mutated drivers ( Fig. 2A: tier 1 cancer genes, most likely drivers in black; tier 2 cancer genes, possibly drivers  but uncertain pathogenicity in gray). Copy-number alterations (CNA) were widespread, with patterns consistent with the profile of primary tumors (1, 3), such as 1q and 8q gains and 8p loss ( Fig. 2B and Supplementary Fig. S3). Copy-number profiles were confirmed by WGS (performed only for patients 3 and 4; e.g., Fig. 2C). Mutations in tumor-suppressor genes frequently co-occurred with LOH, consistent with the inactivation of the gene. Thus at the genomic level, our cohort was consistent with other cohorts of early breast cancers (1)(2)(3).

Distinct modes of lymph node evolution
The combination of point mutations, indels, and CNAs clearly identified two major evolutionary patterns.  Evolutionary trajectories during lymph node invasion. A, Phylogenetic trees reconstructed with maximum parsimony for each patient illustrate the patterns of lymph node spread. For patients 3 and 4, results were validated with WGS. When multiple lymph node samples were available for a patient, those clustered together in a single clade, indicating a recent common ancestor that led to lymph node colonization. See Supplementary Fig. S4 for tree bootstrap values. Putative driver genes and recurrent CNAs in breast cancers are annotated in the trees (tier 1, likely driver genes, in bold, black type; recurrent CNAs; and tier 2, possible drivers of uncertain pathogenicity, in italic, gray type). B, We assessed the spatial heterogeneity of subclonal mutation PIK3CA 1047R in patient 10 at single-cell resolution using ISH of mutant versus wild-type transcripts, revealing spatial segregation of mutant and wild-type subclones (only signals from cancer cells are represented). Pat., patient. Sequential evolution where lymph node metastasis originated from a localized subclone in the primary was rare (1/11-only patient 11; Fig. 2A). This was confirmed using phylogenetic analysis. In this case, the tumor phylogenetic tree showed expansion of a metastatic cancer lineage that originated within region RB and spread to both LN1 and LN2. This is shown in the tree as RB, LN1, and LN2 having a recent common ancestor and forming a clade distinct from RA (Fig. 3A). Additional unique mutations within the clade are most likely passengers.

Pat
The two predominantly observed patterns were early divergence or a complete lack of divergence. Early divergence between primary and lymph nodes was observed in patients 2, 3, 4, and 6. In these cases, the mutational and in part the CNA landscapes were very different between the primary and the lymph nodes, with significant heterogeneity at the level of putative driver alterations. Phylogenetic analysis revealed that the lymph node deposits diverged very early during the evolutionary history of the tumor in these patients ( Fig. 3A; see Supplementary Fig. S4 for bootstrap values and Supplementary Fig. S5 for WES trees). This is particularly interesting because recent breast cancer studies found similar patterns of divergence between primary and metastatic lesions (20,21). In our cohort, multiple samples from the lymph nodes were also very similar to each other, consistent with a recent common ancestor of the lymph node lesions, which indicates a clonal bottleneck. The fact that additional samples profiled with targeted sequencing corroborated the original phylogenetic topology constructed with WES ( Supplementary Fig. S5) confirms that divergence patterns were not due to sampling bias. This is important because phylogenetic divergence could appear simply due to undersampling of the lineages in the primary tumor (22). Furthermore, early divergence was confirmed by WGS in patients 3 and 4 (Figs. 2C and 3A). To test the impact of possible subclonal structure that may confound the phylogenies (23), we also reconstructed the phylogenetic trees with only clonal mutations in each sample with CCF >80% [using the most recent common ancestor (MRCA) in each sample], and the topologies were unchanged ( Supplementary  Fig. S6), again highlighting early divergence in a subset of cases.
The rest of the cohort (patients 5, 7, 9, 10, 14, and 16) was characterized by a palm-tree topology, with relatively short branches and no detectable divergence between primary and lymph node lesions (Fig. 3A). Putative drivers and recurrent alterations in this subgroup were almost invariably truncal (all apart from PIK3CA in patient 10, as also reported by others; ref. 6). We investigated the spatial heterogeneity of PIK3CA 1047R in primary and lymph node lesions of patient 10 further using single-cell level chromogenic ISH with BaseScope ( Fig.  3B; primary: 44.17% mutant, lymph node: 80.96% mutant; signal from cancer cells only is reported), demonstrating segregation of PIK3CA mutant and wild-type subclones. Divergent patterns were quantified using Node Cophenetic Distance (24) and confirmed significant divergence measured both on targeted exome sequencing (P ¼ 0.0043, Wilcoxon rank-sum test) and WES (P ¼ 0.0079, Wilcoxon rank-sum test) data (Supplementary Fig. S7). Divergence was not correlated with number of samples.
Importantly, we found no evidence of genes recurrently altered in lymph nodes with respect to the primary lesions, although we cannot exclude the presence of weakly recurrent drivers that we do not have the power to detect in our cohort.

Different modes of lymph node spread are recapitulated in ctDNA
In order to follow the evolutionary dynamics of node-positive early breast cancers through time, we collected cell-free DNA at multiple time points on 11 patients; however, only 7 had enough DNA (20 ng in total) to allow genomic profiling. We applied the targeted sequencing custom panel used in Fig. 2A to those 7 patients, but somatic mutations were detected in only 2. For these 2 patients, we had four time   points: pre-op, immediately after resection, 4 hours post-op, and 12 to 14 days post-op. Genomic profiling shows the dramatic impact of tumor resection on the ctDNA, as proof of principle indicating that the resected lesions were responsible for shedding detectable tumor DNA in the plasma (Fig. 4A). Indeed, the frequency of mutations dramatically drops after tumor (primary and nodes) resection (Fig. 4B). However, patient 6 showed mutations increasing again 14 days after the operation. Remarkably, the majority of private mutations found in the ctDNA samples before the operation were unique to the lymph nodes, corroborating the divergence patterns observed in solid samples. After lymph node resection, private mutations from the nodes disappeared from plasma, confirming the origin of the shedding. A subset of truncal mutations, however, persisted in the plasma 14 days after the operation.
This was unlikely due to ctDNA remnants due to its short halflife and instead suggests the presence of residual micrometastatic disease shedding ctDNA in the blood. For patient 16, the lack of divergence reported in the tissue was observed in plasma as well. Phylogenetic reconstruction confirmed these patterns for both patients. In particular, for patient 6, the preoperative ctDNA profile clustered with the lymph node sample, whereas the postoperative ctDNA sample showed an earlier divergence event of the micrometastatic disease. Because we do not know which mutations are private to the micrometastatic deposits, the postoperative ctDNA branch appears shorter in our data than it actually is. These results indicate that the patterns of lymph node spread observed in the tissue, even in this early node-positive cohort, are recapitulated in the plasma.

APOBEC activity is increased in lymph nodes
Mutational signature analysis revealed the presence of a common age-related cancer signature 1, as well as signatures specific to breast cancer (ref. 25; signatures 2, 3, and 13). Interestingly, APOBEC signatures, which were detected in 5 of 11 patients, were found to be increased in lymph nodes with respect to the primary tumor, especially in the divergent subgroup. These results were confirmed using WGS on patients 3 and 4 (Fig. 5A). RNA ISH of APOBEC3A and APOBEC3B transcripts for patient 14 using RNAScope ( Fig. 5B and C) revealed spatial heterogeneity, with 3.17-fold higher expression of APOBEC3A and 3.72-fold higher expression of APOBEC3B in the lymph node with respect to the primary tumor (measured in dots/mm 2 , only signal from cancer area is reported, see "Materials and Methods"). Automatic identification of lymphocytes indicated that APOBEC signal came predominantly from cancer cells, although in some areas, cell density was so high that the two signals overlapped (purple). This was consistent with the mutational signatures in Fig. 5A. This suggests that APOBEC may be involved in driving ITH during metastatic spread to lymph nodes.

Discussion
Breast cancer can spread from one organ system to another via hematogenous and lymphatic routes. Understanding lymph node spread from an evolutionary perspective is crucial to improve the understanding of progression to metastatic disease. In this study, we focused on untreated lymph nodepositive patients without evidence of distant deposits and performed a spatiotemporal analysis of the evolution of lymph node invasion. We found striking patterns of early divergence in a significant proportion of patients. Remarkably, ctDNA analysis identified the divergent lymph nodes as the main contributor to ctDNA at resection, thus reflecting the evolutionary patterns identified in the tissue. This implies that ctDNA may partially inform on the biology of axillary lymph node spread. The divergent lesions were highly distinct in terms of mutations and partly in terms of copy-number changes, suggesting a clonal bottleneck during lymph node spread in these patients. Importantly, our data are consistent with a model of punctuated evolution in breast cancer, where tumorigenesis is driven by relatively rare but dramatic selection events (7). Moreover, from a therapeutic perspective, inhibiting APOBEC may prevent or slow down metastatic evolution. The question on whether evolutionary patterns such as lymph node divergence have prognostic and/or predictive value remains open and will require testing in larger cohorts.
This study has several limitations, for example, the limited number of patients and the lack of ctDNA longitudinal tracking beyond day 14 after surgery. Moreover, the follow-up is relatively short, and, to date, we have not had the opportunity to profile distant metastatic deposits to understand the representation of the clones from primary and nodes in the three subgroups. Further efforts on larger cohorts of patients are needed to validate the three subgroups, and to determine their utility in determining patients' prognosis in addition to already established prognostic factors and potentially direct more effective treatment strategies.

Data and Materials Availability
Sequence data have been deposited at the European Genomephenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001002947. Further information about EGA can be found on https://ega-archive.org. Highresolution images have been deposited in BioStudies with accession number S-BSST110.

Disclaimer
The findings, opinions, and recommendations expressed here are those of the authors and not necessarily those of the universities where the research was performed, the National Institutes of Health or the Arizona Department of Health Services, Arizona Biomedical Research Centre.