
Imaging, Diagnosis, Prognosis |
Authors' Affiliations: 1 Department of Gynecology and Obstetrics, University Hospital Schleswig-Holstein, Kiel, Germany; 2 Genomics Institute of the Novartis Research Foundation, San Diego, California; 3 Department of Pathology, University of Virginia Health Sciences System, Charlottesville, Virginia; 4 Department of Gynecology and Obstetrics, University Hospital Charité, Berlin, Germany; 5 Department of Gynecology and Obstetrics, University Hospital, Freiburg, Germany; and 6 Lilly Deutschland, Bad Homburg, Germany
Requests for reprints: Ivo Meinhold-Heerlein, Department of Gynecology and Obstetrics, University of Schleswig-Holstein, Kiel, Germany. E-mail: imeinhold{at}email.uni-kiel.de.
| Abstract |
|---|
Experimental Design: We developed an algorithm to identify secreted proteins encoded among 22,500 genes on commercial oligonucleotide arrays and applied it to gene expression profiles of 67 stage I to IV serous papillary carcinomas and 9 crudely enriched normal ovarian tissues, to identify putative diagnostic markers. ELISAs were used to validate increased levels of secreted proteins in patient sera encoded by genes with differentially high expression.
Results: We identified 275 genes predicted to encode secreted proteins with increased/decreased expression in ovarian cancers (<0.5- or >2-fold, P < 0.001). The serum levels of four of these proteins (matrix metalloproteinase-7, osteopontin, secretory leukoprotease inhibitor, and kallikrein 10) were significantly elevated in a series of 67 independent patients with serous ovarian carcinomas compared with 67 healthy controls (P < 0.001, Wilcoxon rank sum test). Optimized support vector machine classifiers with as few as two of these markers (osteopontin or kallikrein 10/matrix metalloproteinase-7) in combination with CA-125 yielded sensitivity and specificity values ranging from 96% to 98.7% and 99.7% to 100%, respectively, with the ability to discern early-stage disease from normal, healthy controls.
Conclusions: Our data suggest that this assay combination warrants further investigation as a multi-analyte diagnostic test for serous ovarian adenocarcinoma.
75% of patients will succumb to disease progression. The high ratio of death to incidence for patients with ovarian carcinoma is largely due to late-stage diagnosis, at a time when the disease has typically spread into the peritoneum beyond the pelvic region. The 5-year survival rate for patients with International Federation of Gynecology and Obstetrics stage I disease, which is confined to the ovary, is >90% (1, 2). However, survival over the same period of time for patients diagnosed with International Federation of Gynecology and Obstetrics stage III or IV disease is <20%. Late-stage diagnosis of epithelial ovarian cancer can be attributed to the fact that the disease is relatively "asymptomatic" in its early stages, and that the symptoms of late-stage disease, such as abdominal discomfort, weight loss, diarrhea or constipation, vaginal bleeding, and shortness of breath, are nonspecific complaints. The relatively nonspecific nature of these symptoms underscores the need for disease diagnostics (1, 2).
The discovery of biomarkers for the detection of ovarian cancer, as in other cancers, has traditionally been driven by the serendipitous discovery of elevated tumor-associated antigen/protein levels in patients with cancer compared with healthy controls. Antibodies recognizing several tumor-associated antigens, notably OC-125, which binds the CA-125/MUC-16 protein (3, 4), have been successfully used to aid in the diagnosis of ovarian cancer, in combination with other noninvasive techniques, such as transabdominal ultrasonography or transvaginal sonography (reviewed in ref. 5). As cancer of the ovary occurs in 1 of 2,500 postmenopausal women in the United States, screening methodologies must attain at least 75% sensitivity and >99.6% specificity to yield a recommended positive predictive value (PPV) of 10% (where 1 of 10 surgical interventions would lead to the diagnosis of one ovarian cancer). Although trials of transvaginal sonography screening are at the margin of acceptable PPV (9.9%), the use of transabdominal ultrasonography on patients with serum concentrations of CA-125 >30 units/mL increases PPV to 21%. The diagnostic usefulness of CA-125, and perhaps of other serum markers, may be significantly enhanced by serial measurements taken over time. For example, a longitudinal analysis of serum from postmenopausal women in Stockholm using linear regression increased the PPV to 16%, substantially greater than the PPV of a single CA-125 assay (reviewed in ref. 5). However, these assays, or combinations of assays, are still insufficient for broad clinical acceptance and population-based screening.
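The relationship between prevalence, sensitivity, specificity, and PPV quoted above follows from Bayes' rule. The sketch below is illustrative only: the function names are ours, and the inputs are simply the figures given in the text (prevalence of 1 in 2,500, sensitivity of 75%, specificity of 99.6%).

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def specificity_for_ppv(target_ppv: float, sensitivity: float,
                        prevalence: float) -> float:
    """Specificity required to reach target_ppv at a given sensitivity."""
    true_pos = sensitivity * prevalence
    # Rearranged from PPV = TP / (TP + FP).
    false_pos = true_pos * (1.0 - target_ppv) / target_ppv
    return 1.0 - false_pos / (1.0 - prevalence)


if __name__ == "__main__":
    prevalence = 1 / 2500
    print(positive_predictive_value(0.75, 0.996, prevalence))
    print(specificity_for_ppv(0.10, 0.75, prevalence))
```

With these inputs the formula gives a PPV of roughly 7% at exactly 99.6% specificity, and a specificity of about 99.73% is needed to reach 10%, which is consistent with the ">99.6%" wording above.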
The advent of large-scale gene expression analysis of tumors has generated a substantial number of potential new markers, several of which have been validated in small test populations. These include prostasin (6), osteopontin (OPN; refs. 7, 8), and the whey acidic protein HE4 (9). A recent review tabulated 29 different serum markers, including many of these newly described candidates (5). Although no single assay seems to provide the necessary PPV, many of these markers are reported to increase the specificity and sensitivity of CA-125 (5). Moreover, the conjoint analysis of multiple protein assays, such as the combination of leptin, prolactin, OPN, and insulin-like growth factor-2, has shown encouraging results in training/test studies, yielding a test specificity and sensitivity of 95% (10). Thus, the discovery of additional biomarkers and methodologies for combined analyses holds promise for the diagnosis of early-stage disease.
Here, we developed an algorithm to identify secreted proteins encoded by genes represented on the 22,000-member Affymetrix HU133a array, reasoning that genes with elevated expression in cancers may reflect increased levels of the encoded proteins in patient sera. To test this hypothesis, the algorithm was applied to the expression profiles of 67 serous ovarian adenocarcinomas from 64 patients, alongside a series of nine crudely enriched normal ovarian samples. Among 275 differentially expressed genes encoding putative secreted proteins, we show that a combination of ELISA assays for four proteins [matrix metalloproteinase-7 (MMP-7), OPN, kallikrein 10 (KLK10), secretory leukoprotease inhibitor (SLPI)], in combination with CA-125, leads to very high positive and negative predictive values, with correspondingly high sensitivity and specificity.
| Materials and Methods |
|---|
RNA preparation and GeneChip hybridization. Frozen tissues were homogenized with a rotary homogenizer (Omni International, Marietta, GA) in RNeasy lysis buffer (Qiagen, Valencia, CA). Total RNA was prepared from tissues and cells using the RNeasy Mini kit (Qiagen). Hybridization on oligonucleotide microarrays (U133a GeneChip, Affymetrix, Inc., Santa Clara, CA) was done as described (11, 12). GeneChip hybridization data were processed and scaled as described (11, 12).
Data processing and prefiltering. Microarray data were processed using MAS5.0 (Affymetrix) and scaled to a nominal value of 200 (3-5 copies of transcript per cell; refs. 13, 14) to facilitate inter-chip comparison. The primary data have been deposited and are accessible on the Gene Expression Omnibus.8
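Scaling each chip to a common nominal value can be sketched as follows. This is a minimal illustration rather than the MAS5.0 code itself: the function name is hypothetical, and the use of a trimmed mean that drops 2% of values at each end is our assumption about how the per-chip average is computed.

```python
def scale_to_target(signals, target=200.0, trim=0.02):
    """Rescale one chip so that its trimmed-mean signal equals `target`.

    `trim` is the fraction of values dropped at each end before averaging
    (an assumption of this sketch, not a value stated in the text).
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)            # values dropped at each end
    kept = ordered[k:len(ordered) - k]
    factor = target / (sum(kept) / len(kept))
    return [s * factor for s in signals]
```

After this step, the average signal on every chip is the same, so a given signal value can be compared across chips.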
Delineation of genes encoding putative secreted proteins. Several databases were used to identify genes encoding proteins with characteristics of secretion. First, we mapped U133a probe sets to REFSEQ protein identifiers via the NetAffx web site.9 Next, we mapped the Swissprot database to REFSEQ using Blastp, thus resulting in a common map for Swissprot annotations and U133a probe set identifiers. The subset of mapped proteins was queried for the "subcellular location," "secreted," and "extracellular" annotations. We then downloaded the Gene Ontology database (MySQL release and Perl library modules)10 and mapped it to the REFSEQ database. This subset of proteins was queried for the following level 3 and 4 terms: "extracellular space," "extracellular," "plasma protein," "cell-cell signaling," "extracellular matrix," "hormone activity," "extracellular matrix structural constituent," "cytokine activity," and "receptor binding." Proteins with the following Gene Ontology annotations were discarded from the list: "membrane fraction," "nucleus," "cytosol," "cytoplasm," "integral to membrane," "integral to plasma membrane," and "transmembrane receptor activity." Lastly, we queried REFSEQ proteins for the presence of signal peptides using SignalP.11 The criterion was that at least one of the metrics for a signal peptide sequence was called by the signalnn method and the protein was called secreted by the signalhmm method. The combination of these methods led to a subset of 1,441 genes with evidence for encoding secreted proteins by one or more methods.
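The filtering logic described above can be sketched as a set of independent evidence checks. The record layout and function names below are hypothetical, and two points are assumptions of this sketch: the Swissprot check is modeled as a keyword match on a subcellular-location string, and the excluded ontology terms are applied as a veto over all evidence types rather than over the ontology evidence alone.

```python
from dataclasses import dataclass, field

GO_INCLUDE = {
    "extracellular space", "extracellular", "plasma protein",
    "cell-cell signaling", "extracellular matrix", "hormone activity",
    "extracellular matrix structural constituent", "cytokine activity",
    "receptor binding",
}
GO_EXCLUDE = {
    "membrane fraction", "nucleus", "cytosol", "cytoplasm",
    "integral to membrane", "integral to plasma membrane",
    "transmembrane receptor activity",
}
SWISSPROT_KEYWORDS = ("secreted", "extracellular")


@dataclass
class ProteinRecord:
    refseq_id: str
    swissprot_location: str = ""          # subcellular location annotation
    go_terms: set = field(default_factory=set)
    signalp_nn: bool = False              # signal peptide called by NN method
    signalp_hmm: bool = False             # called secreted by HMM method


def secretion_evidence(protein: ProteinRecord) -> list:
    """Return the names of the methods that support secretion."""
    evidence = []
    location = protein.swissprot_location.lower()
    if any(keyword in location for keyword in SWISSPROT_KEYWORDS):
        evidence.append("swissprot")
    if protein.go_terms & GO_INCLUDE:
        evidence.append("go")
    if protein.signalp_nn and protein.signalp_hmm:
        evidence.append("signalp")
    return evidence


def is_putative_secreted(protein: ProteinRecord) -> bool:
    """Keep a protein with evidence from one or more methods, unless vetoed."""
    if protein.go_terms & GO_EXCLUDE:     # veto (assumption of this sketch)
        return False
    return bool(secretion_evidence(protein))
```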
Differential gene expression. Differentially expressed genes were defined and ranked using a previously described method (11), which optimizes the selection of genes with a combination of high fold-change and significant P.
Statistical analysis. Differences in serum protein levels between groups of patients were assessed using the Wilcoxon rank sum test implemented in R (version 2.1.0).12
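For readers unfamiliar with the rank sum test, the following is a minimal pure-Python sketch using the normal approximation with a tie correction. It is not the R implementation used in the study (which can also compute exact P values and applies a continuity correction by default), and the function name is ours.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided P value).
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1                          # pooled[i:j] share one value
        avg_rank = (i + 1 + j) / 2.0        # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # 2 * (1 - Phi(|z|))
    return u, p_value
```

The approximation is adequate for group sizes such as the 67 versus 67 comparison described here; for very small groups an exact test is preferable.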
Split point, logistic regression, and support vector machine analyses. The split-point scoring procedure was described previously by Mor et al. (10). The method constructs a simple vote classifier, where each marker contributes a vote by comparing the observation value to a split-point value. The optimal split-point value for each individual marker is determined by maximizing the accuracy of its corresponding binary classifier (essentially a single-node decision tree). Then, the optimal cutoff value for the vote classifier is determined based on cross-validation runs. Note that in this method, votes from each marker are counted equally.
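The split-point vote classifier can be sketched as follows. This is an illustration of the idea rather than the procedure of Mor et al. as implemented: the function names are ours, the sketch assumes that higher marker values indicate cancer, candidate split points are taken from the observed values, and the vote cutoff is passed in rather than tuned by cross-validation.

```python
def best_split(values, labels):
    """Threshold on one marker that maximizes training accuracy.

    labels: 1 = cancer, 0 = control. Predicts cancer when value >= threshold.
    """
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(values)):
        correct = sum((v >= threshold) == bool(y) for v, y in zip(values, labels))
        accuracy = correct / len(values)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def fit_split_points(samples, labels):
    """samples: list of dicts mapping marker name -> measured value."""
    markers = list(samples[0].keys())
    return {m: best_split([s[m] for s in samples], labels)[0] for m in markers}


def vote_classify(sample, split_points, min_votes):
    """Each marker casts one equally weighted vote; compare with the cutoff."""
    votes = sum(sample[m] >= threshold for m, threshold in split_points.items())
    return int(votes >= min_votes)
```

Because every marker's vote counts the same, a weak marker can outvote a strong one, which is the limitation noted in the text.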
Logistic regression analysis linearly combines measurements from each marker with individually assigned weights and then compares the resultant value to a split point. We used the implementation of logistic regression described by Smyth in MATLAB (Smyth GK., Statbox Toolbox for MATLAB).13
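The decision rule of a fitted logistic regression model can be sketched in a few lines. The weights and intercept are taken as given here (fitting is not shown), and the function names and the default split point of 0.5 are assumptions of this sketch, not details of the MATLAB implementation.

```python
import math


def logistic_score(sample, weights, intercept):
    """Weighted linear combination of markers mapped to a probability."""
    z = intercept + sum(weights[m] * sample[m] for m in weights)
    return 1.0 / (1.0 + math.exp(-z))


def logistic_classify(sample, weights, intercept, split_point=0.5):
    """Return 1 (cancer) when the score reaches the split point, else 0."""
    return int(logistic_score(sample, weights, intercept) >= split_point)
```

Unlike the vote classifier, each marker here contributes in proportion to its weight, so a more informative marker can dominate the decision.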
Support vector machine (SVM) analyses in this study were carried out using the LIBSVM implementation.14 Based on the authors' recommendation, we chose the radial basis kernel, with both the constraint variable C and the kernel variable γ optimized via an exhaustive grid searching protocol. Whenever class sizes in a training set were unequal, we used a weighting argument, "w," to rebalance the data set. To apply the LIBSVM method to three classes, we used the one-against-one approach described by Hsu et al. (15).
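An exhaustive grid search over the two SVM parameters can be sketched independently of any SVM library. In the sketch below, `evaluate` is a hypothetical callback that trains a classifier with the given parameters and returns its cross-validated accuracy. The exponentially spaced grids follow the ranges suggested in the LIBSVM practical guide; whether the same ranges were used in this study is an assumption, as is the inverse-frequency formula for the class weights.

```python
from itertools import product


def exhaustive_grid_search(evaluate,
                           c_exponents=range(-5, 16, 2),
                           gamma_exponents=range(-15, 4, 2)):
    """Try every (C, gamma) pair on a log2 grid and keep the best score.

    evaluate(c, gamma) -> score, e.g. cross-validated accuracy (hypothetical).
    Returns (best_score, best_c, best_gamma).
    """
    best = None
    for c_exp, gamma_exp in product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** c_exp, 2.0 ** gamma_exp
        score = evaluate(c, gamma)
        if best is None or score > best[0]:
            best = (score, c, gamma)
    return best


def balanced_class_weights(labels):
    """Per-class weights inversely proportional to class frequency."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return {label: n / (len(counts) * k) for label, k in counts.items()}
```

The weights returned by `balanced_class_weights` play the role of the per-class "w" argument: errors on the smaller class are penalized more heavily so that it is not ignored during training.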
ELISA. The sera of tumor patients were taken before surgery in the Department of Gynecology of the University Hospital Schleswig-Holstein/Campus Kiel and in the Department of Gynecology of the University Hospital Charité Berlin. Negative control samples were taken from age-matched healthy women in the Department of Gynecology of the University Hospital of Freiburg. Blood was spun at 1,500 × g for 5 min (room temperature), and the serum supernatant was carefully retained and directly frozen at −20°C. For quantification of the serum protein levels of OPN, MMP-7, SLPI, and KLK10, ELISAs were done using blood sera of 67 patients with stage I to IV serous ovarian carcinoma, 67 healthy women, and a small series of 15 serum samples from women with cystadenofibromas of the ovary. Measurements of OPN, MMP-7, and SLPI protein levels were carried out with the appropriate Quantikine Immunoassay kits according to the manufacturer's recommendations (R&D Systems; Wiesbaden-Nordenstadt, Germany). KLK10 protein levels were determined with the IBEX hk 10 microtiter ELISA (IBEX, Quebec, Canada). Before analyses, serum samples were diluted 1:10 for OPN, 1:4 for MMP-7, 1:20 for SLPI, and 1:1 for KLK10.
CA-125 serum levels were determined automatically using the AxSYM CA-125 test on the Abbott AxSYM analyzer (Abbott Diagnostics, Abbott Park, IL). A value of >35 units/mL was considered elevated. All immunoassay kits were used according to the manufacturer's protocols, and each serum sample was analyzed in duplicate.
| Results |
|---|
22,500 human transcripts. The quality of the data set was assessed in two ways. First, agglomerative clustering of the most variably expressed genes was used to evaluate the extent to which normal and malignant tissues could be discerned (Fig. 1A). With the exception of one normal sample (N35S) and a single grade I tumor, all of the normal tissues could be clearly delineated from malignant tissues. Among the serous carcinomas, a major distinction was based on tumor grade and malignant potential, an observation we recently reported (16). Second, we used a previously described algorithm (t test/FOLD metric), which optimizes selection of genes with the highest fold change and most uniform differential expression, to identify genes with increased or decreased expression in carcinomas relative to normal tissue (11). The genes identified included the whey acidic protein HE4 and the cell surface proteins GA733-1, CD24, CD9, and mesothelin, among others, which we and others have previously reported to be differentially expressed in ovarian carcinomas (reviewed in ref. 17). In aggregate, these results show the quality of the data in terms of previously reported observations and independent validation.
The concentration of each of these proteins, alongside CA-125, was initially measured in a series of 67 serum samples from patients with stage I to IV serous ovarian carcinoma (n = 7, 2, 49, and 9 with stage I, II, III, and IV disease, respectively) and compared with levels in a series of 67 serum samples from healthy controls. The levels of each protein in healthy and cancer patients were significantly different in each case (Fig. 2; Table 4). We then assessed the diagnostic performance of each individual marker using three classification methods: logistic regression, split-point analysis, and SVM, each computed with 1,000 runs of 10-fold cross-validation (Table 5). For single marker classifiers, all three methods yielded similar results. Because the logistic regression method is optimized against the log-odds of whether the dependent variable occurs or not (i.e., normal versus cancer in this case), and because the split-point method (see Materials and Methods) is directly optimized against the accuracy function, the latter tended to give the best single marker performance. However, when we combined all of the markers into a single classifier, the split-point method yielded the lowest accuracy (92.8%), which was lower than that obtained with OPN alone. The most likely explanation for this is that the method counts votes from each marker equally, despite the fact that the diagnostic power of each marker varies widely. In contrast, the logistic regression method assigns appropriate linear weights to individual markers and thus achieved a higher accuracy (98.0%). Finally, the radial kernel-based SVM classifier is a nonlinear method, which typically works better for multiple markers. For this reason, we chose SVM as the method for further analyses. Results from logistic regression analysis and split-point analysis are presented as supplementary materials (Supplementary Tables S1 and S2).
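The evaluation protocol described above (repeated runs of 10-fold cross-validation, scored by sensitivity and specificity) can be sketched as follows. The function names and the fixed random seed are ours, and the folds here are not stratified by class; whether the original analysis stratified its folds is not stated.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_runs=1000, seed=0):
    """Yield (train_indices, test_indices) for every fold of every run.

    Each run reshuffles the samples, so the fold assignment differs per run.
    """
    rng = random.Random(seed)
    for _ in range(n_runs):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def sensitivity_specificity(y_true, y_pred):
    """Labels: 1 = cancer, 0 = control. Returns (sensitivity, specificity)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

A classifier is fitted on each training split and scored on the matching test split, and the resulting metrics are averaged over all folds and runs.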
Receiver operating characteristic (ROC) analysis was carried out to compare the performance of classifiers over a range of internal settings (Fig. 4). When a single marker was used in the classification model, both split-point and logistic regression methods make use of a single internal threshold value. We therefore obtained the ROC curve by taking every observation value in the sample set as a possible internal threshold. Thus, both methods shared the same set of ROC curves. ROC curves were only generated for logistic regression analysis when multiple markers were involved to preserve the clarity of the curves illustrated in Fig. 4. Sensitivity-specificity pairs were collected within each cross-validation run by varying the internal threshold. Results from all of the 1,000 runs were then averaged to yield the ROC curve. It is clear from the ROC curves that for a wide range of classifier configurations, performance of individual markers decreases in the order of OPN, CA-125, MMP-7, KLK10, and SLPI, using either split-point or logistic regression. Multiple markers perform significantly better than single markers, with little difference between four and five marker combinations. This observation is also consistent with the SVM results in Table 5.
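Building a single-marker ROC curve by using every observed value as a candidate threshold can be sketched as follows. The function names are ours, the sketch assumes that higher values indicate cancer, and the averaging over cross-validation runs described above is not shown.

```python
def roc_points(values, labels):
    """ROC curve for one marker; labels: 1 = cancer, 0 = control.

    Returns (1 - specificity, sensitivity) pairs, one per candidate
    threshold, starting from the point where nothing is called positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(values), reverse=True):
        tp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area
```

The area under the curve gives a single number for ranking markers, complementing the visual comparison of the curves.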
To further assess the potential of these marker combinations for the diagnosis of ovarian carcinomas, we measured the concentrations of each protein in a small series of serum from patients diagnosed with benign cystadenofibromas (n = 15) and evaluated whether these could be distinguished from protein levels in sera from normal and malignant patients (Fig. 5). Although three of five of the biomarker serum levels were significantly different between normal and benign samples (Table 3; Fig. 4), only OPN and CA-125 had any apparent power to differentiate serum from patients with cystadenofibromas from those with carcinomas (Table 3; Fig. 4). We used a three-class SVM based on a "one-against-one" approach, where binary SVM classifiers are built separately for each pair of classes and a class is assigned when it receives two of the three votes. This resulted in non-weighted and weighted accuracies of 90.6% and 75.5% (5-fold cross-validation), respectively. However, examination of the cases contributing to the overall accuracy revealed that cancer and normal samples account for the majority of correct votes. In contrast, a significant number of benign cases were misclassified (data not shown).
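The one-against-one voting scheme and the two accuracy figures can be sketched as follows. The pairwise classifiers are represented by hypothetical callables, a three-way tie is returned as unassigned, and "weighted accuracy" is interpreted here as the mean of the per-class accuracies; both of the latter are assumptions of this sketch rather than details given in the text.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(sample, pairwise_classifiers, classes):
    """Combine one binary classifier per class pair by majority vote.

    pairwise_classifiers maps (class_a, class_b) to a callable that returns
    class_a or class_b for a sample. With three classes, a label needs two
    of the three votes; otherwise None is returned.
    """
    votes = Counter()
    for pair in combinations(classes, 2):
        votes[pairwise_classifiers[pair](sample)] += 1
    label, count = votes.most_common(1)[0]
    return label if count >= 2 else None


def accuracies(y_true, y_pred):
    """Return (overall accuracy, mean per-class accuracy)."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return overall, sum(per_class) / len(per_class)
```

Under this interpretation, a small class that is frequently misclassified lowers the weighted figure much more than the overall figure, which matches the pattern reported for the benign cases.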
| Discussion |
|---|
Here, we undertook a combined clinical-genomics approach toward the identification of new diagnostics, focusing on differentially expressed genes encoding putative secreted proteins. Validation of our approach comes from the identification of several proteins that exhibit increased levels in ovarian cancer sera [e.g., HE4 (9), prostasin (6), OPN (7, 8), several members of the kallikrein protein family (19), and insulin-like growth factor-2 (10)]. In this study, we focused on four genes encoding secreted proteins (MMP-7, SLPI, OPN, and KLK10), with the aim of assessing their potential usefulness as a multi-analyte diagnostic test.
Analysis of the serum data with several classification methods pointed to SVM as the most accurate for combinations of markers. SVM was sufficient to achieve >99% diagnostic accuracy with as few as three markers (CA-125, OPN, and MMP-7/KLK10) and increased the diagnostic accuracy of CA-125 (91%) by 7.5% with the addition of OPN (98.5%). Further modifications, using a penalty for false-positive classifications, yielded sensitivity and specificity values of 95.7% and 100%, respectively, with all five markers combined.
Although the numbers of patients with early-stage disease used in this preliminary study were low (seven stage I and two stage II), it is nonetheless noteworthy that our classifiers could readily distinguish these from control serum protein levels. However, the distinction between these early-stage cases and those with benign cystadenofibromas (n = 15) was not particularly good. Nevertheless, detection of these lesions by the diagnostic combination described here may be advantageous because patients with this condition will normally undergo surgery for differential histologic diagnosis or to remove bulky benign disease.
Several studies have previously shown potential usefulness for OPN as a diagnostic marker in ovarian cancer. In their original report, Mok et al. showed a specificity of 80.4% and early-stage (stage I/II) and late-stage (stage III/IV) sensitivities of 80.4% and 85.7%, respectively, noting significantly higher levels of serum OPN in ovarian carcinomas compared with other gynecologic malignancies and benign pelvic disease (7). Subsequently, Schorge et al. showed that OPN, while diagnostically inferior to CA-125 in predicting response to therapy, increased earlier in 90% of patients developing recurrent disease (20). In a study of 296 ovarian cancers, of which 65 were found to have weak or absent CA-125 immunoreactivity, OPN was immunoreactive in 60 of 65 cases, second to human KLK6 and KLK10 (98% and 100% positive, respectively; ref. 21). Thus, OPN seems to provide additional diagnostic information to CA-125. In the current study, OPN showed the highest sensitivity and specificity with 10-fold cross-validation (88% and 100%, respectively) and was notably different in benign and malignant cases, along with CA-125.
We found that SLPI performed poorly in terms of differentiating serum protein levels in normal, benign, and malignant cases. This contrasts with a recent report in which the levels of protein in malignancy (mean = 63 and 67 ng/mL for stage I/II and III/IV disease, respectively) were almost twice those in benign (39 ng/mL) and normal patients (33 ng/mL; ref. 22). In the current study, the equivalent numbers were 45, 39, and 36 ng/mL.
KLK10 has been proposed as a potential marker for ovarian cancer, although the diagnostic sensitivity at 90% specificity is 54% (23). However, the combination of KLK10 with CA-125 increases the specificity by 21% at the same sensitivity. The same report also showed that KLK10 was elevated in 35% of CA-125-negative cases at 90% specificity. KLK10 was independently reported as elevated in 100% of 65 CA-125-negative cases (21); however, the latter measured KLK10 by immunohistochemistry, whereas the former was measured by ELISA at a defined specificity. Nonetheless, like OPN, these data point toward a complementary role for KLK10 in ovarian cancer diagnosis.
To our knowledge, MMP-7 has not been reported as elevated in the serum of ovarian cancer patients, although overexpression in ovarian carcinoma is well documented (24), and it has been shown to mechanistically influence ovarian cancer cell invasion (25). Our data suggest that MMP-7 is a useful adjunct to CA-125, with independent specificity and sensitivity of 79% and 82%, respectively (10-fold cross-validation). The combination of MMP-7, OPN, and CA-125 was shown to be sufficient for a decision tree model, leading to 96.3% diagnostic accuracy in the current study (10-fold cross-validation).
The algorithm used to identify secreted proteins yielded a significant number of candidate proteins for which assays are not yet available but may nonetheless prove diagnostically useful. In combination with other markers that have been recently described (e.g., HE4, insulin-like growth factor-2, prolactin, prostasin, and leptin; see ref. 5 for a recent review), a concerted effort to develop new assays, most likely based upon solid-surface support methodologies (e.g., protein or antibody arrays; ref. 26), is warranted. Large retrospective studies should be capable of identifying the subset of markers with the highest diagnostic accuracy and discrimination of benign and early-stage disease (stage I/II), which can then be validated in equally large multicenter prospective studies. With improving technologies, a multi-analyte test that incorporates 10 to 15 biomarkers should become financially viable and would have a profound effect on the current ovarian carcinoma mortality rates in the United States and elsewhere.
| Acknowledgments |
|---|
| Footnotes |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
I. Meinhold-Heerlein, D. Bauerschlag, and Y. Zhou contributed equally to this study.
Current address for G.M. Hampton: Celgene Corp., San Diego, CA.
8 http://www.ncbi.nlm.nih.gov/geoc/.
9 http://www.affymetrix.com/analysis/index.affx.
10 http://www.godatabase.org/dev/database/.
11 http://www.cbs.dtu.dk/services/SignalP/.
13 http://www.statsci.org/matlab/contents.html.
14 http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Received 3/21/06; revised 11/3/06; accepted 11/8/06.
| References |
|---|