Scores of genetically engineered mice have been generated in the quest to understand mechanisms of breast cancer development and progression. More recently, there has been a growing trend for using such models for testing various therapeutic strategies and agents. The application of these mouse models for these purposes requires that they be characterized in ways that demonstrate they possess important similarities to human breast cancer. In particular, detailed comparisons of the features of the models to human breast cancer must include attention to the histological phenotypes, chromosomal and molecular alterations, and the predictive value of the models for preclinical testing. Whereas these models have become important tools for the study of breast cancer, the great majority of existing mouse mammary cancer models develop tumors that are estrogen receptor negative, with relatively few models demonstrating metastatic spread to the lungs, and none developing metastases to bone. This review focuses on recent studies using genomic approaches to further understand the oncogenic processes occurring in mouse models of mammary cancer and to compare these changes with those identified in human breast cancer. Gene expression profiling is being applied to help define pharmacological responses that occur in vivo. Detailed genomic analyses will provide important information for selecting models for specific experimental purposes, contribute to the understanding of oncogene-specific expression signatures and potential therapeutic targets, and further define mechanisms of chemoprevention and chemotherapy.
Enormous insights into molecular pathways involved in breast cancer have been harvested from the multitude of genetically engineered mouse models of mammary cancer that have been created over the past two decades (for recent reviews, see Refs. 1, 2, 3, 4, 5). Initially, models were generated through the targeted overexpression of a potential oncogene to the mammary gland using a promoter sequence that directed transcription to the mammary epithelium (6, 7, 8), although expression of the transgene was often observed in other tissues as well. This targeted expression to the mammary gland has been possible primarily through the use of only a handful of promoters: the mouse mammary tumor virus long terminal repeat (MMTV; Ref. 7); the whey acidic protein promoter (WAP; Ref. 9); and the bovine lactoglobulin promoter (BLG; Refs. 10 and 11).
This general approach has been successful in determining what phenotypic response occurs in the mammary glands when a particular gene is overexpressed in multiple epithelial cells and, indeed, has provided strong evidence that many genes could function as oncogenes by generating malignant lesions in the mammary glands. On the other hand, overexpressing genes in this manner is probably not very physiological but rather represents a supraphysiological level of expression that may not occur during the process of oncogenesis in human breast cancer. Additionally, the developmental timing of expression of the transgene is determined primarily by the promoter, although it is influenced to some degree by the random chromosomal insertion site of the transgene. Further confounding this approach is the fact that the promoters most widely used for targeted expression to the mammary glands are responsive to hormonal stimulation, which does not allow for determination of how hormones may alter the physiological milieu of the mammary gland without altering levels of transgene expression.
Over the past decade, technologies allowing more precise control of genetic modulation of the mammary glands have been developed that provide ways to better overcome many of these experimental problems. Inducible systems of transgene expression have been used to determine what effects the overexpression of transgenes may have at different developmental time points (12). When coupled with powerful techniques involving genomic modifications using the loxP and cre recombinase methods of homologous recombination, additional levels of gene regulation can be manipulated to turn on or turn off gene expression or to introduce specific gene mutations at particular developmental time points (13). The use of cre recombinase technologies, especially in the context of conditionally introducing mutations in targeted tissues, has led to the development of important models of mammary cancer that are extremely relevant to certain forms of human breast cancer.
The abundance of mammary cancer models provides the scientific community with many choices to address experimental questions. The transgenic expression of particular genes or mutants may provide an in vivo model for testing compounds against a specific molecular target or pathway. Some models may be particularly useful for studying stages of tumor progression and stage-specific responses to chemopreventive or chemotherapeutic compounds. However, the value of these mouse models for predicting the response of compounds in human breast cancer patients remains generally untested and unknown. Whereas such testing will initially remain highly empiric, we propose that by fully characterizing the gene expression patterns of these models (and eventually the proteome, as well) and correlating these patterns with therapeutic responses to classes of compounds, a predictive matrix of pharmacological responses may be developed. This will provide a more rational approach to identifying appropriate models for preclinical testing.
To accomplish this goal, our laboratory has begun to survey the gene expression profiles of several important mouse models of mammary cancer to distinguish their signatures of gene expression based on the oncogenes that have initiated the malignant transformation (14, 15). More recently, we have begun to use expression profiling to identify transcriptional changes that occur in response to particular chemopreventive and therapeutic agents. These studies should help expand our knowledge of how particular agents exert their effects in vivo.
Materials and Methods
Mammary Gland Tumor Samples.
To survey gene expression characteristics of mammary tumors that develop as the result of several different oncogenic pathways, multiple models have been used. Mammary tumors from the following transgenic models were collected when tumors reached approximately 1 cm in diameter: MMTV-myc (7); MMTV-ras (6); MMTV-polyoma virus middle T antigen (PyMT; Ref. 16); MMTV-HER2/neu (8); WAP-SV40 T/t antigens (17); and the rat C3(1) prostatein-SV40 T/t antigens (18). All of these transgenic mice were in the FVB/N genetic background. Three additional gene knockout (KO) models have also been analyzed: a p53 KO mammary transplant model (19); a p53 conditional KO model (footnote 1); and a conditional BRCA1 KO/p53+/− model (20). The p53 transplant model was in a BALB/c background, whereas the p53 conditional KO and BRCA1−/−;p53+/− models were in mixed genetic backgrounds. The natural history, histological characteristics, and some molecular features of these models have been discussed in previous articles and are beyond the scope of this review. The reference RNA used for all array experiments consisted of RNA extracted from pooled mammary glands from 10–20 virgin female mice that were 12 weeks of age. Pooling of samples was performed to help control for variations in the stages of mammary gland differentiation depending on the period of the estrous cycle. Additionally, epithelium from mammary fat pads was cleared, and RNA was extracted from the remaining fat and stromal tissues to help identify genes that were primarily expressed by this component of the mammary gland.
To determine whether changes in gene expression could be identified in mammary glands from mice treated with the cyclooxygenase-2 inhibitor celecoxib, C3(1)-T/t-antigen transgenic mice were given either vehicle or feed containing celecoxib beginning at 6 weeks of age. Mammary glands were removed from both control and treated mice at 13 weeks of age, when preinvasive mammary intraepithelial lesions were observed to the same extent in both groups of mice.
RNA was extracted from tissue samples using the guanidine isothiocyanate extraction method, and RNA integrity was determined by gel electrophoresis or analysis using the Lab-on-a-Chip Bioanalyzer (Agilent, Palo Alto, CA). Probes were labeled by direct incorporation of fluorochromes (Cy3 and Cy5), and competitive hybridizations were performed on cDNA microarrays produced by the National Cancer Institute Advanced Technology Center and the Frederick Cancer Research and Development Center Microarray Core, as described previously (15). Over the course of these experiments, three separate arrays were used: a mouse cancer gene array with approximately 3,000 features, an array with approximately 8,700 features containing the Incyte mouse GEM1 feature set, and an array with approximately 10,000 features containing the Incyte mouse GEM2 feature set. Datasets from each array were analyzed separately. At least five separate tumors from five individual mice were used for analyses with each array version. At least one experiment for each tumor category was performed in which the fluorescent labeling was reversed between the tumor sample and the reference RNA (Cy5 and Cy3, respectively) to determine the degree of bias of dye incorporation that is commonly observed. Statistical analyses were performed as described previously (15) to identify sets of genes whose expression was altered in the majority of the tumor models as well as to identify genes whose expression tended to be specific for individual models or groups of models. The latter was performed using an F-test approach (analysis of variance, ANOVA).
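The dye-reversal experiments described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline of Ref. 15; it assumes, purely for illustration, that dye bias acts as an additive offset on the log2(Cy5/Cy3) scale, so that averaging the tumor-versus-reference log ratios from the two labeling orientations cancels the bias. The function names and the signal values are hypothetical.

```python
import math


def tumor_vs_reference_log2(tumor_signal, reference_signal):
    """Return log2(tumor / reference) for one array feature."""
    return math.log2(tumor_signal / reference_signal)


def dye_swap_estimate(forward_log2, reversed_log2):
    """Combine a dye-swapped pair of log2(tumor / reference) values.

    forward_log2:  tumor labeled with Cy3, reference with Cy5.
    reversed_log2: tumor labeled with Cy5, reference with Cy3.
    Assumption (not from the source): the bias is an additive offset on
    log2(Cy5/Cy3), so it enters the two orientations with opposite signs.
    """
    corrected = (forward_log2 + reversed_log2) / 2.0
    dye_bias = (reversed_log2 - forward_log2) / 2.0
    return corrected, dye_bias


# Hypothetical background-subtracted signals for a single feature.
forward = tumor_vs_reference_log2(tumor_signal=2000.0, reference_signal=1000.0)
reverse = tumor_vs_reference_log2(tumor_signal=8000.0, reference_signal=1000.0)
corrected, dye_bias = dye_swap_estimate(forward, reverse)
print(corrected, dye_bias)
```

Under this assumption, a feature whose two orientations disagree strongly would be flagged by a large `dye_bias` value rather than being interpreted as a change in expression.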
Verification of Microarray Results Using Secondary Methods.
Several genes with interesting expression patterns identified by microarray analyses were further analyzed by Northern blot, semiquantitative reverse transcription-PCR, or Western blot analysis to confirm the microarray result.
Results and Discussion
Two types of gene lists were generated from our microarray analyses: (a) a list of “cancer genes” whose expression changed by at least 2-fold and tended to be similar among most of the mammary tumors analyzed; and (b) a list of “tumor signature genes” whose expression varied significantly between the different models (Fig. 1; Ref. 15).
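As a rough illustration of how a “cancer gene” list of type (a) might be assembled, the following Python sketch keeps genes whose tumor-to-reference ratio changes by at least 2-fold in the same direction in most tumors. The 80% cutoff used for “most”, the function name, and the gene identifiers are assumptions made for this example; the actual selection criteria are those described in Ref. 15.

```python
import math


def common_cancer_genes(log2_ratios, min_fold=2.0, min_fraction=0.8):
    """Select genes changed >= min_fold in one direction in most tumors.

    log2_ratios maps a gene identifier to a list of log2(tumor / reference)
    values, one per tumor. min_fraction is an assumed stand-in for "most".
    """
    threshold = math.log2(min_fold)
    selected = []
    for gene, ratios in log2_ratios.items():
        n_up = sum(1 for r in ratios if r >= threshold)
        n_down = sum(1 for r in ratios if r <= -threshold)
        # Require agreement in direction, not just magnitude.
        if max(n_up, n_down) / len(ratios) >= min_fraction:
            selected.append(gene)
    return selected


# Hypothetical log2 ratios for three placeholder genes across five tumors.
example = {
    "gene_a": [1.5, 2.0, 1.2, 1.8, 1.1],     # consistently up
    "gene_b": [1.5, -1.2, 0.1, 2.0, -0.3],   # inconsistent
    "gene_c": [-1.4, -2.2, -1.0, -1.6, 0.2], # down in 4 of 5
}
cancer_genes = common_cancer_genes(example)
print(cancer_genes)
```

Requiring a consistent direction of change is one design choice among several; a filter on the mean ratio across models would give a similar but not identical list.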
The “cancer gene” list from the GEM1 array dataset for the transgenic tumors contains approximately 900 genes representing all categories of cellular functions. Genes in this list may include regulators of essential functions for maintaining a transformed and highly proliferative state, and the list likely includes common markers for mouse mammary cancer and potential targets for immunotherapy. Interestingly, well over half of the entries in this group are expressed sequence tags of uncharacterized genes. This suggests that there are likely many genes involved in oncogenesis whose function remains to be determined. As might be expected, using the set of “cancer genes” to probe relationships of the tumors as determined by Eisen hierarchical clustering led to individual clusters for each tumor type. However, there remained a high degree of correlation between the tumor types (>0.8).
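The correlation between tumor types referred to above can be sketched as a Pearson correlation between two expression profiles over the same gene list. This is only an illustration of the similarity measure; it assumes Pearson correlation was the metric, which is a common choice in hierarchical clustering of array data but is not stated in the text. The profile values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean log2 ratios for five genes in two tumor models.
model_1 = [2.1, 1.8, -1.5, 1.2, -2.0]
model_2 = [1.9, 2.2, -1.1, 1.0, -1.7]

r = pearson(model_1, model_2)
distance = 1.0 - r  # a dissimilarity often used for clustering
print(round(r, 3), round(distance, 3))
```

A clustering procedure would then merge the most similar profiles first; with a gene list dominated by shared changes, all pairwise correlations stay high even when the models still separate into their own clusters.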
It must be kept in mind that the comparison of gene expression between the tumor and the normal mammary gland (represented by a ratio as determined by the array analysis) is significantly biased by the relative composition of the cellular components of the samples. The normal mouse mammary gland is composed of relatively few epithelial cells and a large amount of adipose and stromal cells. The mammary tumors evaluated in these studies are highly enriched for epithelial cells. Therefore, care must be taken in interpreting results from such comparisons, at least in concluding exactly what has changed between the normal mammary gland and the tumor. The composition of the normal reference RNA, however, is not critical for determining how different tumors vary from each other because they can be compared with each other using an arbitrary standard that contains a high number of RNA species whose genes are represented on the microarray. To further distinguish genes that might simply represent highly expressed genes from the fat and stroma, expression profiles from cleared mammary fat pads were compared with the normal mouse mammary gland reference RNA. The resulting list of genes is considered to contain a large portion of genes expressed primarily in the nonepithelial portion of the mammary gland, which is useful for helping to interpret changes in gene expression in the tumors relative to the normal mammary gland (15).
The list of “tumor signature” genes appears to provide insights into how different oncogenes lead to the evolution of mammary cancers with somewhat specific molecular characteristics. When F-test analyses were applied to the GEM1 array datasets for the transgenic tumors, a set of approximately 900 genes was identified, again with a large number of “unknown” genes contained in this list (Fig. 1). When hierarchical clustering was performed using this gene list, tumor models clustered separately and with relatively low degrees of correlation between the different tumor types. The first major separation of tumor types distinguished the nuclear oncogenes myc and T/t-antigens from the oncogenes that use extranuclear signaling pathways (HER2/neu, PyMT, and ras). This indicated that the expression of many of the genes in this list was specific to individual tumor types or groups of tumor types. Clusters of genes with such patterns were clearly identified within the pattern of hierarchical clustering.
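The F-test used to find “tumor signature” genes can be illustrated, for a single gene, as a one-way ANOVA in which each tumor model is a group. The sketch below computes only the F statistic; converting it to a P value and correcting for the many genes tested are omitted, and the details of the published analysis are in Ref. 15. The function name and the example values are hypothetical.

```python
def one_way_f_statistic(groups):
    """One-way ANOVA F statistic for one gene.

    groups is a list of lists; each inner list holds the log2 ratios of
    that gene in the tumors from one model.
    """
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of model means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual tumors around their own model mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


# Hypothetical gene that differs in one of three models.
f_signature = one_way_f_statistic([[1, 2, 3], [2, 3, 4], [7, 8, 9]])
# Hypothetical gene with identical expression in all three models.
f_common = one_way_f_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
print(f_signature, f_common)
```

A gene with a large F statistic varies more between models than within them, which is the property expected of a model-specific signature gene; a gene with F near zero behaves like a member of the shared “cancer gene” list.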
The largest set of genes unique to a particular tumor type was that observed for the SV40 T/t-antigen cluster, where almost 20% of the “tumor signature” genes (about 200 genes) were specifically expressed in tumors from the T/t-antigen models. The T/t-antigen cluster contained genes representing all cellular pathways, demonstrating the tremendous disturbance in normal cellular regulation that results from the overexpression of these viral oncoproteins. The next largest cluster of genes was shared by both the T/t-antigen tumors and the myc tumors, suggesting that common pathways related to oncogenesis were involved for these two oncogenes. Relatively few genes were identified that were specific for the other mammary tumor models. The other tumor models, HER2/neu, PyMT, and ras, clustered together and shared a set of 62 genes whose expression was similar. Given that the signaling pathways for HER2/neu and PyMT may converge through ras signaling, this result is not surprising.
These results demonstrate that sufficient expression differences could be identified to distinguish end-stage mammary tumors that developed through different initiating oncogenic insults, although the great majority of expression changes were more similar than different. This suggests that tumors from these different models share many of the molecular derangements necessary for tumors to arise, yet each tumor type or group of tumors maintains gene signature changes that can be used as identifiers. Studies are under way to determine whether any of these signature gene clusters can be used to help define subsets of human tumors based on human array datasets.
Recent unpublished studies in our laboratory have explored the expression profiles of two different models in which the p53 tumor suppressor gene has been disrupted: one through transplantation of mammary glands from p53 KO mice into wild-type mice; and the second through the conditional deletion of the p53 gene in the mammary glands. Another model generates mammary tumors through the conditional KO of BRCA1 in the context of one mutant p53 allele. Array analyses were performed on all of the previously studied transgenic models as well as the KO models. Hierarchical clustering using a subset of genes that were identified by F-test ANOVA with at least a 2-fold change in gene expression has begun to demonstrate interesting clusters of genes that may further distinguish myc, ras, HER2/neu, PyMT, Tag, Rb, BRCA1, and p53 pathways. In particular, a small set of BRCA1-specific genes is apparent. A larger number of genes are altered in both the Tag and p53 models, presumably due to the functional loss of p53 by Tag or the absence of p53 in the KO models (Fig. 2). The results of these studies are currently being validated, but the studies demonstrate how the accumulation of array data from multiple models can help define pathway-specific changes.
The studies to date have focused on end-stage tumors where maximal genomic rearrangements and altered gene expression have occurred. Further pathway-specific changes may be uncovered by examining changes that occur during earlier progressive stages of tumor development. Because the mammary lesions in the C3(1)/Tag model tend to occur synchronously as the mice age, using this model one may predictably examine atypical ductal hyperplasias and ductal carcinoma in situ lesions (designated low- or high-grade mammary intraepithelial neoplasia, respectively), invasive carcinomas, or metastatic lesions (Fig. 3). To identify changes that occur at these different stages of tumorigenesis, we have used laser capture microdissection to collect purified populations of epithelial cells from these lesions. This powerful technology allows one to attach cells of interest to a special membrane using laser energy and then extract nucleic acids or proteins from the cells. To apply this technology, the small quantities of purified RNA must be amplified to perform microarray studies. To date, these ongoing studies have revealed that the great majority of changes in gene expression occur at very early stages and that the number of changes observed in the transition from preinvasive to invasive lesions is relatively small. We are currently examining gene expression profiles at all stages of mammary cancer progression in the C3(1)/Tag model in both epithelial and stromal components using laser capture microdissection and microarray technologies.
Microarray Analysis of Therapeutic Response.
Recently, our laboratory has begun to explore whether microarray analyses can reveal new insights into mechanisms of drug action. This approach may be particularly useful in establishing modes of action for chemopreventive compounds whose pathways remain poorly or incompletely understood. It is likely that such studies will help identify potentially new targets for anticancer drug therapies. When C3(1)/Tag mice were treated with the cyclooxygenase-2 inhibitor celecoxib, a delay in mammary tumor incidence and multiplicity was observed in association with an induced cluster of genes (Kavanaugh et al., unpublished data). Several of these genes are related to growth-regulatory pathways and are being studied for their role in mediating cyclooxygenase-2-independent pathways. Similar studies using other classes of compounds are being performed to associate expression changes in vivo with antitumor biological effects.
Several important studies have recently been published that examine expression profiles of breast tumors from various patient cohorts. In one study, a set of genes has been identified that distinguishes breast tumors from patients with BRCA1 mutations from those with BRCA2 mutations, and these tumor signatures are different from those found for sporadic breast cancer (21). Other investigators have identified a set of approximately 500 genes to classify patients into five subgroups with correlations to clinical outcome (22, 23). Another study using an artificial neural network approach to study 78 tumors found a subset of about 70 genes that were highly predictive of patients with a “good” or “poor” prognosis (24). Obviously, a comparison of microarray data from human breast tumors with that of mouse mammary tumors is critical to determine the molecular similarities and differences in oncogenesis between the human disease and the representative mouse models.
However, there have been several technical impediments to this, including relatively few homologous genes represented on both mouse and human arrays; imperfections in public databases providing accurate mapping between mouse and human genes, especially genes represented by expressed sequence tags; and the lack of a common reference RNA that would allow for direct comparisons between the species. The cellular composition of normal mouse and human mammary glands is quite different, and therefore these tissues cannot readily be used as references. To attempt to overcome this problem, we are exploring new experimental and informatics approaches to make accurate comparisons of gene expression between mouse and human tumors (Fig. 4). This information is important for identification of similar oncogenic pathways and potential therapeutic targets in the mouse models that are directly relevant to human cancer. The appropriate application of a particular mouse model to specific experimental questions will be significantly improved with this global molecular information.
The new era of genomic technologies is leading to tremendous insights into the complex global system of changes that are critical to the evolution and maintenance of cancer. Oncogenesis must be considered not just as rearrangements of chromosomes but also as a rearrangement of how regulatory networks interact and connect to each other. Cancer is not a disease of a few genes but of clusters of genes, whose altered global interactions evolve into the transformed and malignant state. Identifying and understanding how these networks form is the new challenge of oncology. Comparing such changes in model systems with those that occur in human breast cancer should lead to more precise definition of the most significant changes and should also help identify how to best use the ever-increasing number of mammary cancer models.
Dr. Carlos Arteaga: It’s interesting that you see some differences between polyoma virus middle T antigen mice and the ras and neu transgenics, in light of the fact that middle T uses some of the same signaling programs and with even greater intensity. Could that be because you were looking at late tumors? If you did the analysis with early mammary microdissected cells, would you be able to see more similarities? What is your interpretation of the fact that you do not seem to have the same degree of gain of function in the middle T mice?
Dr. Green: It may be that the polyoma middle T activates a more narrow pathway than some of the other oncogenes, but what it does activate is so oncogenic that you get this very aggressive phenotype. The polyoma clusters very closely with the ras and the HER2, which I think does fit with a common pathway. However, unlike the other models, a majority of the genes seem to be down-regulated, which we don’t quite understand.
Dr. Robert Nicholson: As I understand it, if you introduce an oncogene into the cells, eventually you get maybe 20 cancers, which you then array. On the arrays, how much variability do you see between the 20 cancers?
Dr. Green: There actually was relatively little; that was one of our initial concerns. In all of the models there is some variability in terms of the histopathology, but, in general, most of the tumors look fairly similar. Some have a bigger spectrum of sarcomatous transition and epithelial to mesenchymal transition than others. We tried to be sure that we were using tumors that looked histologically similar. When we did 5–10 tumors initially from the same kind of model, we found that the degree of correlation was quite high. So we were satisfied that, unlike the human situation, the tumor models tended to fall into specific groups.
Dr. Rachel Schiff: You are looking into markers associated with metastasis or with invasiveness, but maybe a more critical question is what set of genes or markers might distinguish between therapeutically good and bad ductal carcinoma in situ, good versus bad invasive disease. I wonder whether you can use your models to ask this question; if, for example, you go across models and look into more versus less aggressive phenotypes, then maybe you can find a set of genes that would be relevant to distinguish the early lesion by phenotype, when the clinician really needs to make a decision about what to do next.
Dr. Green: That is a goal we would like to achieve. The problem is that for many of the models, the temporal course of progression seems stochastic, and it is not easy to find the very early lesion in these animals. That is why we focused on this T antigen model, where one can do that, but it may be that the major changes are occurring very early, at the atypical hyperplastic stage. I’m not sure whether full progression in the T antigen model is determined very early in the course of lesion development or whether a limited set of changes occurs at a later stage. It may also be dependent upon when the initial lesion began and how much time it has had to progress.
Dr. Schiff: Perhaps you could combine that study with chemoprevention approaches, then stop some lesions early and look for the changes.
Dr. Green: We have done several studies looking at the different compounds, and we have had success in preventing what we think is the continued transition from early lesions to the large palpable tumors, but in this model it is not 100% effective. Another approach might be to use models that have multiple genetic alterations, so that you start with a model that only reaches a particular stage but then becomes invasive when you add to it a second or third genetic change. That might allow one to look at whether certain chemopreventive agents can overcome some of those critical changes.
Dr. Richard Santen: If you take the Bert Vogelstein model of carcinogenesis, where there is an ordered sequence of events, your methodology of cDNA/RNA would really be ideal to pick up that type of ordered sequence. If you believe that the process is predominantly stochastic (let’s say there is a mutation of a DNA repair gene), you will end up with multiple abnormalities that will be different in every animal. How can you distinguish between the stochastic model and the ordered model without having a better grip on what all of the various stochastic events are, where you may have difficulties showing those with the cDNA array?
Dr. Green: The hope is to try to go back to the earlier stages in at least a couple of the models and see if one can distinguish that, but it would be more powerful to couple it with the bacterial artificial chromosome arrays, for instance, to look for an ordered change in the genome amplifications and deletions, especially those syntenic to ones that seem to be critical for progression in human disease. Are there certain ones, and I think there are, that are critical in progression of some of the mouse models? That may help give some of the best clues as to what are some of the critical stochastic changes, and that is one approach we will continue to pursue.
Presented at the Third International Conference on Recent Advances and Future Directions in Endocrine Manipulation of Breast Cancer, July 21–22, 2003, Cambridge, MA.
Requests for reprints: Jeffrey E. Green, Transgenic Oncogenesis Group, Laboratory of Cell Regulation and Carcinogenesis, National Cancer Institute, Building 41, Room C629, 41 Medlars Drive, Bethesda, Maryland 20892. Fax: (301) 496-8395; E-mail:
1 E. Lee, unpublished data.