Purpose: Advanced-stage epithelial ovarian cancer has a poor prognosis with long-term survival in less than 30% of patients. When the disease is detected in stage I, more than 90% of patients can be cured by conventional therapy. Screening for early-stage disease with individual serum tumor markers, such as CA125, is limited by the fact that no single marker is up-regulated and shed in adequate amounts by all ovarian cancers. Consequently, use of multiple markers in combination might detect a larger fraction of early-stage ovarian cancers.
Experimental Design: To identify potential candidates for novel markers, we have used Affymetrix human genome arrays (U95 series) to analyze differences in gene expression of 41,441 known genes and expressed sequence tags between five pools of normal ovarian surface epithelial cells (OSE) and 42 epithelial ovarian cancers of different stages, grades, and histotypes. Recursive descent partition analysis (RDPA) was performed with 102 probe sets representing 86 genes that were up-regulated at least 3-fold in epithelial ovarian cancers when compared with normal OSE. In addition, a panel of 11 genes known to encode potential tumor markers [mucin 1, transmembrane (MUC1), mucin 16 (CA125), mesothelin, WAP four-disulfide core domain 2 (HE4), kallikrein 6, kallikrein 10, matrix metalloproteinase 2, prostasin, osteopontin, tetranectin, and inhibin] were similarly analyzed.
Results: The 3-fold up-regulated genes were examined and four genes [Notch homologue 3 (NOTCH3), E2F transcription factor 3 (E2F3), GTPase activating protein (RACGAP1), and hematological and neurological expressed 1 (HN1)] distinguished all tumor samples from normal OSE. The 3-fold up-regulated genes were analyzed using RDPA, and the combination of elevated claudin 3 (CLDN3) and elevated vascular endothelial growth factor (VEGF) distinguished the cancers from normal OSE. The 11 known markers were analyzed using RDPA, and a combination of HE4, CA125, and MUC1 expression could distinguish tumor from normal specimens. Expression at the mRNA level in the candidate markers was examined via semiquantitative reverse transcription-PCR and was found to correlate well with the array data. Immunohistochemistry was performed to identify expression of the genes at the protein level in 158 ovarian cancers of different histotypes. A combination of CLDN3, CA125, and MUC1 stained 157 (99.4%) of 158 cancers, and all of the tumors were detected with a combination of CLDN3, CA125, MUC1, and VEGF.
Conclusions: Our data are consistent with the possibility that a limited number of markers in combination might identify >99% of epithelial ovarian cancers despite the heterogeneity of the disease.
INTRODUCTION
Ovarian epithelial carcinoma claims more lives than any other gynecological cancer in industrialized countries (1). It is the fifth most common cancer in American women and the fifth most common cause of cancer death (2). Whereas the 5-year survival for women presenting with early-stage disease is ∼90%, the majority of women (75%) are diagnosed with late-stage disease (stage III or stage IV) and have a 5-year survival of less than 30% (3). Mortality might be reduced if the disease were detected in the early stages.
Screening for early-stage disease with individual serum tumor markers, such as CA125, is limited by the fact that no single marker is up-regulated and shed in adequate amounts by all ovarian cancers. Although CA125 represents the best available serum marker, achieving 50% sensitivity and 99% specificity for early-stage disease, it detects only ∼80% of all ovarian cancers (4). In fact, epithelial ovarian cancer is a heterogeneous disease. Histological subtypes of epithelial ovarian cancer, including serous, endometrioid, mucinous, and clear cell carcinomas, are known to have different clinical characteristics as well as different molecular features (4, 5). For example, many mucinous tumors do not secrete high levels of CA125 (6). Therefore, a panel of complementary tumor markers will be required to detect all cases of ovarian cancer at an early stage. If screening is to be performed with individual assays, a limited number of markers must encompass the heterogeneity of the disease.
To identify potential candidates for novel markers, we assembled a group of 42 ovarian cancers of different histological subtypes and compared their gene expression to that in five pools of normal ovarian epithelial tissue scrapings using Affymetrix arrays. Genes were sought whose level of expression in all cancers exceeded the level of expression in normal epithelial scrapings. We then used recursive descent partition analysis (RDPA) to seek up-regulated genes that would distinguish different histotypes of ovarian cancer from each other and from normal ovarian epithelial tissue. Similar analysis was undertaken with a panel of 11 genes encoding previously reported tumor markers: mucin 1, transmembrane (MUC1), mucin 16 (CA125), mesothelin, WAP four-disulfide core domain 2 (HE4), kallikrein 6, kallikrein 10, matrix metalloproteinase 2, prostasin, osteopontin, tetranectin, and inhibin. Each of these genes has been shown to be up-regulated in ovarian cancers. Individually, these genes have not been shown to have sufficient specificity to function as a tumor marker for ovarian cancer. We were interested in whether combinations of these genes would increase overall detection.
MATERIALS AND METHODS
Tumor Samples and RNA Preparation.
Forty-two flash-frozen primary ovarian cancers were obtained from University of Texas M. D. Anderson Cancer Center (Houston, TX), Duke University (Durham, NC), and the Mayo Clinic (Rochester, MN; Table 1). All of the tumors were classified according to grade and stage using standard FIGO (International Federation of Gynecology and Obstetrics) criteria. Five pools of normal ovarian epithelial brushings from 42 different individuals were obtained from Northwestern University. Patients ranged in age from 43 to 79 years with a median age of 55; two-thirds were postmenopausal and one-third, premenopausal. Of the postmenopausal donors, ∼60% were on hormone replacement therapy. The normal cells were collected using a cytobrush and were immediately suspended and frozen in RLT buffer (Qiagen, Valencia, CA). Total RNA was extracted from all of the ovarian cancers and normal ovarian epithelial scrapings using the RNeasy Kit (Qiagen) according to the manufacturer’s protocol. Institutional Review Board approval had been obtained at each participating institution prior to the initiation of this study.
Gene Expression Analysis.
The Affymetrix GeneChip Human Genome U95 set of oligoarrays (Affymetrix, Santa Clara, CA) was used to obtain gene expression data. This series tests the expression of more than 41,441 human genes and expressed sequence tags. The biotinylated cRNA preparation, hybridization, and scanning of the microarrays were performed according to the manufacturer’s protocols. Data were collected using GeneChip software (Affymetrix). Data were analyzed using the software program dChip (7). We used Version 1.2 with the PM-only model to estimate differences. The genes listed had to pass two filters: the fold change for each gene between the normal ovarian epithelium scrapings and the ovarian cancers had to exceed 3, and the absolute difference in expression levels between the two means had to exceed 100. The second filter was applied to avoid distraction at the noise level; on average, 58% of the 41,441 genes and expressed sequence tags exhibited values greater than 100.
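The two filters can be illustrated with a minimal Python sketch. It assumes that the fold change is the ratio of the tumor mean to the normal mean for a probe set; the function name, the probe names, and the expression values are hypothetical, and the sketch does not reproduce dChip's model-based calculations.

```python
from statistics import mean

FOLD_CHANGE_MIN = 3.0   # fold-change threshold stated in the text
ABS_DIFF_MIN = 100.0    # absolute-difference threshold stated in the text


def passes_filters(normal_values, tumor_values):
    """Return True if one probe set passes both filters.

    Inputs are expression values for a single probe set in the normal
    pools and in the tumors (hypothetical data, assumed positive).
    """
    normal_mean = mean(normal_values)
    tumor_mean = mean(tumor_values)
    fold_change = tumor_mean / normal_mean   # assumed definition of fold change
    abs_diff = tumor_mean - normal_mean
    return fold_change > FOLD_CHANGE_MIN and abs_diff > ABS_DIFF_MIN


# Made-up values for three hypothetical probe sets: (normal pools, tumors).
example = {
    "probe_A": ([100, 120, 110, 90, 80], [400, 650, 500]),    # passes both filters
    "probe_B": ([10, 12, 8, 9, 11], [40, 45, 50]),            # fold > 3, difference < 100
    "probe_C": ([500, 520, 480, 510, 490], [700, 900, 800]),  # difference > 100, fold < 3
}
selected = [name for name, (n, t) in example.items() if passes_filters(n, t)]
```

Only the first probe set is retained here; the second shows why a fold-change criterion alone would admit low-intensity probe sets whose values lie near the noise level.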
RDPA was performed on the 102 most up-regulated probe sets, representing 86 genes, to find combinations of genes whose expression distinguished the samples based on histology. The analysis was performed using the JMP software (SAS Inc., Cary, NC). The values used for the test were the normalized hybridization intensities obtained from dChip. In RDPA (8, 9, 10), a classification tree is constructed that gives decision rules for assigning a sample to a category based on a series of sequential decisions. At each stage, a single predictor is used, and, depending on whether the value of the predictor is greater than or less than a selected cutoff value, the sample is assigned to a left or right node. Each of the resulting nodes is then analyzed using the same procedure, although different predictor variables and cutoff values may be used. The cutoff value is selected to maximize the likelihood-ratio χ² statistic for the test that the probability of a particular case belonging to a given group is independent of whether the predictor of that case is above or below the cutoff value. Thus, the cutoff value is chosen to make cases above and below the cutoff value as different as possible with respect to classification. This procedure continues until the data in each node are sufficiently well discriminated or until there are too few data in any node to support further analysis.
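The cutoff selection at a single node can be sketched in Python as follows. This is an illustration under stated assumptions, not the JMP implementation: it handles one predictor and two classes (tumor versus normal), tries midpoints between adjacent distinct values as candidate cutoffs, and scores each with the likelihood-ratio χ² statistic G² = 2 Σ O ln(O/E). The function names and the example values are hypothetical.

```python
import math


def g_squared(table):
    """Likelihood-ratio chi-square (G^2) for a 2x2 table of counts."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2


def best_cutoff(values, labels):
    """Return (cutoff, G^2) for the split of one predictor that maximizes G^2.

    values: expression values of one gene; labels: True = tumor, False = normal.
    Returns (None, -1.0) if all values are identical and no split is possible.
    """
    distinct = sorted(set(values))
    best = (None, -1.0)
    for low, high in zip(distinct, distinct[1:]):
        cutoff = (low + high) / 2.0
        table = [[0, 0], [0, 0]]  # rows: below/above cutoff; columns: normal/tumor
        for value, is_tumor in zip(values, labels):
            table[int(value > cutoff)][int(is_tumor)] += 1
        score = g_squared(table)
        if score > best[1]:
            best = (cutoff, score)
    return best


# Made-up intensities: three normal pools followed by four tumors.
values = [50, 60, 70, 300, 400, 500, 650]
labels = [False, False, False, True, True, True, True]
cutoff, score = best_cutoff(values, labels)
```

A full tree would repeat this search over every candidate gene at each node and then recurse into the two resulting nodes until a stopping rule is met.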
Semiquantitative reverse transcription (RT)-PCR was performed using the Bioanalyzer 2100 (Agilent Technologies, Germantown, MD). The DNA 500 LabChip kit (Agilent Technologies) was used to determine the expression levels of Notch homologue 3 (NOTCH3), E2F transcription factor 3 (E2F3), GTPase activating protein (RACGAP1), hematological and neurological expressed 1 (HN1), CA125, HE4, CLDN3, MUC1, and vascular endothelial growth factor (VEGF). All of the chips were prepared as instructed by the manufacturer. Data were normalized using glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as an internal control. Negative control reactions without reverse transcriptase were used to identify genomic contamination. The linear range of each primer set was first determined. The final conditions for NOTCH3, E2F3, RACGAP1, and HN1 were 30 cycles and a 55°C annealing temperature, whereas the cycle number for the five remaining genes was 35. The expression level of each gene was determined for individual tumors and compared with the average expression level in the five normal pools.
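The normalization described above amounts to simple arithmetic, sketched here in Python. The sketch assumes that each measurement is a pair of signal values (target gene, GAPDH) from the same sample and that the comparison with normal tissue is expressed as a ratio to the mean of the normalized normal pools; the function names and numbers are hypothetical.

```python
from statistics import mean


def normalized_level(gene_signal, gapdh_signal):
    """Express the target-gene signal relative to the GAPDH internal control."""
    return gene_signal / gapdh_signal


def relative_to_normal(tumor, normals):
    """Ratio of a tumor's normalized level to the mean normalized normal level.

    tumor: (gene_signal, gapdh_signal); normals: list of such pairs.
    """
    normal_mean = mean(normalized_level(g, h) for g, h in normals)
    return normalized_level(*tumor) / normal_mean


# Made-up signals for five normal pools and one tumor.
normal_pools = [(10, 100), (12, 100), (8, 100), (10, 100), (10, 100)]
tumor_sample = (45, 90)
ratio = relative_to_normal(tumor_sample, normal_pools)  # roughly 5-fold
```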
Tissue Microarrays and Immunohistochemistry.
Tissue microarrays were constructed using 158 epithelial ovarian cancer tissues. Included in the tissue microarrays were 113 serous, 23 endometrioid, 12 clear cell, and 9 mucinous ovarian carcinomas (OVCAs) with 5, 16, 7, and 7 early-stage samples, respectively. Immunohistochemistry was performed on the arrays as well as 5 normal ovarian epithelial tissues using antibodies against CA125, HE4, CLDN3, MUC1, and VEGF. Staining for CA125 was performed according to the manufacturer’s protocol using 2 μg/ml of the OC125 antibody (Dako, Carpinteria, CA), antigen retrieval by microwave for 10 min, and a primary antibody incubation of 1 h at room temperature. HE4 antibody was prepared at the Pacific Northwest Research Institute, Seattle, WA (11). HE4 staining was carried out with antigen retrieval by microwave for 5 min, a primary antibody concentration of 10 μg/ml αHE4:2H5 antibody, and a primary antibody incubation at 4°C overnight. Commercially available antibodies for CLDN3 (Polyclonal Rabbit anti-Claudin 3, Zymed Laboratories Inc., South San Francisco, CA), MUC1 (Muc-1 Core Glycoprotein monoclonal antibody, Novocastra Laboratories Ltd., Newcastle upon Tyne, United Kingdom), and VEGF [Vascular Endothelial Growth Factor Ab-3 (JH121), NeoMarkers, Inc., Fremont, CA] were used at a 1:50 dilution and stained according to manufacturers’ protocols. With CLDN3, antigen retrieval was performed by boiling for 15 min, and the primary incubation was 1 h at room temperature. The antigens for MUC1 and VEGF were retrieved via pressure cooker and microwave for 10 min, respectively. The antibody incubations were performed at room temperature for 1 h (MUC1) or 2 h (VEGF). A 10% cutoff was used to determine staining in a given sample.
RESULTS
Expression Array Analysis and RDPA by Normal or Cancer Status.
From the gene expression profiles of 42 ovarian cancer tumor tissue specimens and 5 pools of normal ovarian epithelium scrapings, we identified 86 genes that were up-regulated 3-fold or greater and reported absolute differences in expression levels exceeding 100 (Table 2). To look for the most robust genes, we applied a third filter requiring the lower bound of a 90% confidence interval for the fold change to exceed 3 (7). Genes passing all three filters are annotated in Table 2. Using RDPA on the 86 genes, considering only tumor and normal as the classification criteria, we found four genes that perfectly separated the tumors from normal: E2F3, HN1, NOTCH3, and RACGAP1. The scatterplots of the microarray data in Fig. 1 show the perfect separation between normal and tumor for the four genes. A fairly wide range was noted in the amount of up-regulation between different cancers.
RDPA by Histotype.
The 86 genes that were up-regulated 3-fold were evaluated for their ability to distinguish different histotypes from each other and from normal ovarian epithelial scrapings using RDPA. Two genes in combination, CLDN3 and VEGF, achieved a complete separation of the ovarian tumors from normal (Fig. 2A). Elevated CLDN3 identified all serous, endometrioid, and clear cell cancers. Elevated VEGF distinguished mucinous cancers from normal ovarian surface epithelium.
Subsequently, we examined the ability of genes for 11 previously known markers to distinguish different histotypes from each other and from normal ovarian epithelial scrapings. Among the 42 cancers, the number of tumors with a 2-fold elevation in each marker ranged from one for prostasin and inhibin to 37 for MUC1 (Table 3). When all 11 markers were evaluated, RDPA was able to discriminate normal tissue from tumor using HE4, CA125, and MUC1 (Fig. 2B). Elevated HE4 separated 100% of the serous tumors, 89% of the endometrioid tumors, 43% of the clear cell tumors, and 22% of the mucinous cancers from the normals. An elevated CA125 separated the remaining clear cell tumors from the normals. MUC1 distinguished 100% of the clear cell tumors from the normals. Of note, 78% of the mucinous tumors and 11% of the endometrioid tumors had a low CA125. Scatterplots of microarray data for the five genes identified by RDPA (CLDN3, VEGF, HE4, CA125, and MUC1) are shown in Fig. 3.
Overall, nine markers were identified by inspection of the initial scatterplots (Fig. 1) and by RDPA (Fig. 2). To confirm up-regulation of expression, semiquantitative RT-PCR was performed with 20 samples that included 5 cases from each of the four histotypes. In general, RT-PCR data correlated well with array data.
Immunohistochemical Reactivity of Five Markers.
Antibodies could be obtained for five of the nine candidate markers. Immunohistochemical analysis of CLDN3, VEGF, HE4, CA125, and MUC1 in 158 ovarian cancers is shown in Fig. 4 and Table 4. A combination of CLDN3 and MUC1 detected 155 (98%) of 158 cases, a combination of CLDN3, CA125, and MUC1 included 157 (99.4%) of 158 cases, and all tumors were detected with a combination of CLDN3, CA125, MUC1, and VEGF.
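The way a marker combination is scored can be sketched in Python: a tumor counts as detected if it stains positive for at least one marker in the panel. The staining calls below are invented for five hypothetical tumors and are not the study data; the function names are likewise illustrative, and the exhaustive search is only practical for a handful of markers.

```python
from itertools import combinations


def panel_coverage(staining, panel):
    """Fraction of tumors positive for at least one marker in the panel."""
    detected = sum(1 for tumor in staining if any(tumor[m] for m in panel))
    return detected / len(staining)


def smallest_complete_panel(staining, markers):
    """Smallest marker combination that stains every tumor, or None."""
    for size in range(1, len(markers) + 1):
        for panel in combinations(markers, size):
            if panel_coverage(staining, panel) == 1.0:
                return panel
    return None


# Made-up staining calls (True = positive) for five hypothetical tumors.
staining = [
    {"CLDN3": True, "CA125": True, "MUC1": True, "VEGF": False},
    {"CLDN3": True, "CA125": False, "MUC1": False, "VEGF": True},
    {"CLDN3": False, "CA125": True, "MUC1": False, "VEGF": False},
    {"CLDN3": False, "CA125": False, "MUC1": True, "VEGF": False},
    {"CLDN3": False, "CA125": False, "MUC1": False, "VEGF": True},
]
markers = ["CLDN3", "CA125", "MUC1", "VEGF"]

pair_coverage = panel_coverage(staining, ("CLDN3", "MUC1"))
complete_panel = smallest_complete_panel(staining, markers)
```

In this toy data set no single marker or pair stains all five tumors, so the search returns a three-marker panel; applied to real staining calls, the same union rule yields the kind of per-combination detection rates reported above.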
DISCUSSION
The use of gene expression microarrays permitted us to identify those genes that are highly overexpressed in ovarian cancers when compared with normal ovarian surface epithelium. Because the goal of our study was to identify potential markers for the early detection of epithelial ovarian cancer, we specifically chose a set of samples that would represent the histological heterogeneity of the disease. We included tumors of serous, mucinous, clear cell, and endometrioid histotypes. In addition, we compared these tumor samples with pooled scrapings of normal ovarian surface epithelium, rather than with whole ovaries (12), short-term cultures (13, 14), or immortalized ovarian surface epithelium (15).
We identified 86 genes that were at least 3-fold up-regulated in cancers when compared with normal epithelial scrapings. Consistent with earlier reports, we found HE4 (WAP four-disulfide core domain 2; Refs. 12, 13, 14), CD24 antigen (12), ceruloplasmin (14), claudin 3 (14), claudin 4 (14), enolase 1 α (12), eyes absent homologue 2 (16), karyopherin α2 (12), mammaglobin 2 (16), mucin 1 transmembrane (12, 13, 14), preferentially expressed antigen in melanoma (17), and tumor-associated calcium signal transducer 1 (12) to be highly expressed in ovarian tumors as compared with normal ovarian epithelium. The concordance of results by several groups of investigators is reassuring.
Four genes (NOTCH3, E2F3, RACGAP1, and HN1) separated all 42 cancers from the five pools of normal ovarian epithelial cells with mean fold changes of 3.6 (lower bound = 2.9), 3.1 (lower bound = 2.3), 4.6 (lower bound = 3.7), and 3.8 (lower bound = 3.0), respectively. RACGAP1 has been shown to be down-regulated by estrogen (18), suggesting a plausible link between the estrogen deficiency of postmenopausal women and ovarian cancer. Semiquantitative RT-PCR of these genes performed on the same samples confirmed the elevated levels of expression detected on the microarrays. When the individual expression values in the tumor and normal samples were examined for each gene, as seen in Fig. 1, the separation between them was small. Therefore, although a perfect separation was achieved with each of these four genes, the distance between tumor and normal may not be sufficiently great for clinical use. Whereas the mean fold change is significant, individual tumor values may overlap with normal values, resulting in increased sensitivity but decreased specificity. Use of multiple markers in combination might increase specificity if the markers could be captured using mathematical algorithms that optimize specificity. Furthermore, whereas the markers were selected for an ability to distinguish ovarian cancer from normal ovarian epithelium, the specificity may be decreased by the expression of these genes in other normal tissues or cancer cell lineages.
RDPA was applied to the microarray data to identify combinations of genes that would classify ovarian cancers of all histologies. The combination of two genes, CLDN3 (fold change = 6.3, lower bound = 4.3) and VEGF (fold change = 5.2, lower bound = 3.8), distinguished all of the ovarian tumors from normal surface epithelium. Semiquantitative RT-PCR of CLDN3 and VEGF performed on the same samples confirmed the elevated expression levels found on the microarrays. There was a wide separation of CLDN3 values in serous, endometrioid, and clear cell tumors as compared with normal. When immunohistochemical staining was performed on a larger set of tumors, elevated protein expression of CLDN3 and VEGF was again confirmed. CLDN3 is a member of a large family of integral membrane proteins important for tight junction formation and function. CLDN3 and CLDN4 have been shown to be highly expressed in ovarian cancer as well as other epithelial cancers. A recent report has demonstrated increased cytoplasmic staining of CLDN3 in ovarian cancers, suggesting a possible role as a signaling molecule in addition to its role regulating tight junction permeability (19). At present, we are determining whether significant amounts of CLDN3 are shed into the sera.
We also performed RDPA on a set of 11 previously reported potential tumor markers to identify a set of markers that would distinguish all histologies of ovarian cancer from normal. HE4 [fold change (FC) = 3.9, lower bound (LB) = 3.0], CA125 (FC = 2.2, LB = 1.5), and MUC1 (FC = 5.1, LB = 3.9) in combination separated the tumors from normal. As shown in Table 4, values of HE4 in serous tumors and endometrioid tumors show a significant separation from normal. Whereas RDPA was able to separate each of the histologies of ovarian cancers from normal, it was interesting to note that the mucinous tumors could be separated from other histologies because of low CA125 values; this was also confirmed by immunohistochemistry and semiquantitative RT-PCR. This is consistent with previous reports that indicate that CA125 is poorly expressed by many mucinous cancers. Immunohistochemical staining of ovarian cancer samples using the HE4 antibody was suboptimal, with only 36% of serous tumors and 50% of endometrioid tumors showing positive HE4 staining. This may relate to loss of antigenic activity during fixation and embedding, as Hellstrom et al. recently reported a serum assay for HE4 using this same antibody to detect antigen in sera from women with ovarian cancer. Using sera from 37 ovarian cancer patients, they found that HE4 demonstrated similar specificity and sensitivity to CA125, with fewer false positive values for nonmalignant ovarian disease (11). Recent studies have also identified HE4 to be overexpressed in some breast tumors (20). HE4 is an anti-proteinase that was initially identified in the male epididymis (21). The highly elevated expression levels seen in serous and endometrioid ovarian tumors in our study and other studies (13, 14), as well as the relative specificity to ovarian cancer, make HE4 a leading candidate for a tumor marker.
One critical requirement for an effective tumor marker is its presence in early-stage disease. One of the main shortcomings of CA125 is that 50% of stage I ovarian cancers do not have an elevated CA125. Our previous study showed that a subset of genes that are overexpressed in late-stage serous cancers are also overexpressed in early-stage serous cancers (22). In our present study, we included nine early-stage serous cancers and eight late-stage serous cancers. According to microarray data, all of the candidate markers except CA125 showed elevated expression levels in both early- and late-stage serous tumors.
Ultimately, serum validation of these markers in women with early-stage disease will be necessary to determine optimal marker candidates. It is not known whether the products of the genes identified in the microarray analysis are released into the circulation, which is a limitation of our study. In addition, presence of protein in ovarian cancer, as detected by immunohistochemistry, may not be accompanied by presence of the protein in the circulation. Conversely, genes that do not show striking overexpression of transcript may have significant serum levels, and overly stringent microarray criteria for choosing candidate genes may exclude these candidates. For example, whereas mesothelin showed low expression values in our study, Scholler et al. (23) have shown that mesothelin can be detected in serum of ovarian cancer patients. Finally, the markers presented here may identify genes overexpressed in solid tumors and may not be ovarian cancer-specific. We believe our study is a first step toward identifying new and complementary markers for ovarian cancer. Clearly, further work for individual genes will be necessary.
In conclusion, we believe that it is unlikely that a single marker for epithelial ovarian cancer will be clinically useful, given the biological heterogeneity of the disease. Our data suggest, however, that a limited number of markers in combination might identify all ovarian cancers. Specificity may nevertheless be decreased by the expression of these genes in other normal tissues or cancer cell lineages. For effective screening with serum tumor markers, expression of protein must also be accompanied by appropriate release into the circulation. Work is currently ongoing to develop assays with the candidate genes and combinations identified in this study.
Grant support: Supported by Ovarian SPORE Grant CA 83639, NIH, Department of Health and Human Services, and the CORE Grant CA 16772–28; K. Lu was supported by American Association of Obstetricians and Gynecologists Foundation.
Note: K. Lu and A. Patterson contributed equally to this report.
Requests for reprints: Robert C. Bast, Jr., Department of Experimental Therapeutics, Box 355, University of Texas M. D. Anderson Cancer Center, 1550 Holcombe Boulevard, Houston, TX 77030. Phone: (713) 792-7743; Fax: (713) 742-7864; E-mail:
- Received October 15, 2003.
- Revision received January 30, 2004.
- Accepted February 10, 2004.