
Perspectives
Departments of Urology and Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York
Introduction
Numerous analyses of new markers are published routinely. Discovery of a new marker may be important for a variety of reasons; for example, the new marker might yield insight into a disease process. However, the most common application of a new marker is to stratify patients with respect to outcome. The typical scenario is the collection of a dataset containing the new marker, established markers, and patient outcome, from which the scientist asks the biostatistician for an analysis of the empirical value of the new marker. This report addresses such an analysis. Although a particular marker may have other value, the central issue here is often prediction of outcome. When a new marker is identified, its discoverer would like others to measure and use it in the belief that it will predict patient outcome better and, thereby, improve patient counseling and help patients make better treatment decisions (1). However, the methods often used to evaluate and demonstrate the usefulness of a new marker need improvement. In this article, the typical approach to evaluating a new marker is discussed, and an alternative is suggested. In the typical approach, the association of the new marker with established markers is examined, and univariable and multivariable analyses of the new marker are performed. It is argued that a better approach is to compare the predictive ability of the multivariable model that contains the marker with that of the model that lacks it (2).
Association with Established Marker(s)
One often begins the analysis of a new marker by presenting its association or correlation with an established marker (e.g., tumor grade). For example, one might find higher expression of the new marker in patients with high-grade tumors. However, the value of such an analysis is not clear, because its results are not conclusive regarding the value of the new marker. Indeed, one would not want a new marker to correlate perfectly with an existing marker, as this would imply that the new marker was redundant: equivalent predictions could be obtained by using the established marker. Unless the new marker is cheaper to measure than the established marker, or allows the patient to avoid a painful procedure (e.g., biopsy), correlation analysis provides little insight into the potential value of the new marker.
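As an illustration of the association analysis described above, the sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman's coefficient, the function names, and the example values are assumptions for illustration; the article does not specify which correlation measure is used. A coefficient of 1 means the new marker orders patients exactly as the established marker does, which is the redundant case described above.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: tumor grade and new-marker expression for six patients.
grade = [1, 1, 2, 2, 3, 3]
expression = [0.8, 1.1, 2.5, 2.9, 4.0, 5.2]
print(round(spearman(grade, expression), 3))  # prints 0.956

# A marker that is a monotone function of an established one has rho = 1.
established = [4.1, 6.0, 7.5, 10.2, 15.0]
redundant = [math.log(v) for v in established]
print(spearman(established, redundant))  # prints 1.0
```

A strong correlation says only that the two markers rank patients similarly; it says nothing about whether the new marker adds predictive information.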
Univariable Analysis
The next analysis often provided is a plot of Kaplan-Meier curves for the new marker. An example can be seen in Fig. 1, where curves illustrate survival for high and low expression levels of a new marker. These curves are indeed informative about the time to failure for groups of patients. However, the typical concern of the marker's discoverer is simply whether the curves are distinct. Again, this analysis contributes very little and does not answer our central question of whether the new marker is of value. Its major weakness is that established markers are not considered. That the new marker shows distinct survival differences does not mean that equivalent separation cannot already be achieved by using an established marker or a combination of established markers. This limitation often prompts a plot like that of Fig. 2, in which the separation achieved by the new marker is compared with that of an established marker (e.g., stage) to show that the new marker provides a wider, more significant separation of prognostic groups.
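To show what such curves summarize, the sketch below computes the Kaplan-Meier (product-limit) estimate separately for each marker group. The function names and toy data are hypothetical; at a tied time, failures are counted before censorings, which is the usual convention. As argued above, nothing in this computation involves the established markers.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time for each patient
    events: 1 if the patient failed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = censored = 0
        # Count everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                failures += 1
            else:
                censored += 1
            i += 1
        if failures:
            # Patients censored at t are still counted as at risk at t.
            survival *= (at_risk - failures) / at_risk
            curve.append((t, survival))
        at_risk -= failures + censored
    return curve


def km_by_group(times, events, groups):
    """One Kaplan-Meier curve per group label (e.g., 'high' / 'low' expression)."""
    curves = {}
    for g in sorted(set(groups)):
        idx = [k for k, label in enumerate(groups) if label == g]
        curves[g] = kaplan_meier([times[k] for k in idx], [events[k] for k in idx])
    return curves


# Hypothetical follow-up (months), event indicators, and marker groups.
times = [5, 8, 12, 12, 20, 3, 6, 6, 9, 15]
events = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
groups = ["low"] * 5 + ["high"] * 5
for label, curve in km_by_group(times, events, groups).items():
    print(label, [(t, round(s, 3)) for t, s in curve])
```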
Multivariable Analysis
In general, multivariable analysis is regarded as the definitive marker assessment. For example, consider Table 1. Unfortunately, results such as those in Table 1 are also plagued with limitations and do not directly test the value of the new marker. For example, the P value essentially tests whether the hazard ratio is 1, not whether prediction is improved by the new marker (3, 4, 5). This is a problem because several issues affect the numerical value of the hazard ratio: (1) how the new marker was coded, for example, categorized or continuous; (2) how the existing markers were coded; (3) which existing markers were included in the analysis; (4) whether stepwise variable selection was performed or only the variables significant in univariable analysis were included; and (5) how the variables were modeled (e.g., transforms, splines). In short, many judgment calls can affect the hazard ratio and render it somewhat subjective. Because the hazard ratio of a continuous variable is expressed per unit change, it is generally smaller than that of a categorized variable, which makes continuous markers look weak. This encourages categorization of continuous variables, a process often fraught with difficulty (6). Yet another concern with the analysis in Table 1 is that it assumes that a Cox regression (which is necessary to provide the hazard ratios) is the best prediction model. This might not be the case. An alternative (e.g., a classification and regression tree) might provide the most accurate predictions presently available with the standard markers and/or the standard plus novel markers. If so, this alternative should be incorporated in the marker evaluation. Thus, the standard multivariable analysis does not address the central question of whether the new marker permits us to predict patient outcome more accurately than we presently can.
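The dependence of the hazard ratio on coding can be made concrete for a continuous marker $x$ with Cox regression coefficient $\beta$ (notation introduced here for illustration). The hazard ratio reported for a $\Delta$-unit increase in $x$ is

$$
\mathrm{HR}(\Delta) = \frac{h(t \mid x + \Delta)}{h(t \mid x)} = e^{\beta \Delta} = \mathrm{HR}(1)^{\Delta}.
$$

With illustrative numbers that are not from the article, a per-unit hazard ratio of 1.02 corresponds to $1.02^{50} \approx 2.69$ for a 50-unit increase. Rescaling the marker to $x/c$ changes the coefficient to $c\beta$ but leaves the linear predictor $\beta x$ unchanged, so every patient's predicted risk, and hence the model's ranking of patients, stays the same: the reported hazard ratio moves while predictive accuracy does not.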
How, then, can one show that patient outcome is predicted more accurately when the new marker is known? An attractive solution is to show the improvement in predictive accuracy obtained when the new marker is added to a model containing the established markers. To do this, however, one must first choose a metric for predictive accuracy. For example, predictive accuracy can be measured by the concordance index: the probability that, given two randomly selected patients, the patient with the worse outcome is, in fact, predicted to have the worse outcome (7). This measure, similar to the area under the receiver operating characteristic curve, ranges from 0.5 (i.e., chance, or a coin flip) to 1 (perfect ability to rank patients). As a measure of a model's predictive ability, the concordance index admittedly might not be the perfect metric, and methods of comparing concordance indices need further development, but it is perhaps the best measure presently available (8). It is particularly attractive because, unlike simple classification accuracy, it does not require specifying a cutpoint on the predicted value.
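For censored survival data, the concordance index is usually computed with Harrell's pairwise definition. The sketch below is a minimal plain-Python version under stated assumptions: a higher `risk` score means a worse predicted outcome (e.g., a Cox linear predictor), ties in predicted risk count as one half, and pairs with tied follow-up times are simply skipped, a simplification that full implementations handle more carefully. The function name and data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell-style concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i failed (events[i] == 1) before
    patient j's follow-up ended (times[i] < times[j]). The pair is concordant
    when the patient who failed first was given the higher predicted risk.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions: a coin flip
    if usable == 0:
        raise ValueError("no usable pairs: need at least one observed failure")
    return concordant / usable


times = [2, 4, 6, 8]   # months of follow-up
events = [1, 1, 1, 0]  # the last patient is censored
print(concordance_index(times, events, [4, 3, 2, 1]))  # perfect ranking: 1.0
print(concordance_index(times, events, [4, 3, 1, 2]))  # 5 of 6 usable pairs concordant
print(concordance_index(times, events, [1, 1, 1, 1]))  # no information: 0.5
```

No cutpoint appears anywhere: only the ordering of the risk scores matters, which is the property the text highlights.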
With a measure of predictive accuracy in place, one now needs to show how it is affected by the inclusion of the new marker. Consider Table 2, in which three models are compared with the full model containing all variables. Each of the three models lacks one variable. The model lacking established marker 1 is compared with the full model, and the degree to which predictive accuracy is reduced (drop in concordance index) is shown. Here, not knowing established marker 1 would reduce our predictive accuracy, as measured by the concordance index, by 0.1. The critical row of Table 2 is the third, which shows that predictive accuracy is improved by 0.15 when the new marker is measured. Thus, the incremental value of the new marker has been established, within a framework that does not presume a particular form of prediction model (e.g., a Cox regression). Because it focuses on the predictive accuracy measure, the framework allows any form of prediction model to be used if it is shown to provide superior predictive ability.
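A Table 2 style analysis can be sketched as a leave-one-variable-out loop that is agnostic to the model: `fit` is any function that trains a model and returns a prediction function, in keeping with the point that the framework does not presume a Cox regression. To stay short and self-contained, this sketch makes assumptions that are not in the article: the outcome is binary (for which the concordance index equals the area under the ROC curve), the stand-in model is a hypothetical ordinary least-squares risk score, and the toy data are constructed so the answer is known. Concordance is computed on the same data used for fitting, which is optimistic; in practice one would use independent validation data or resampling.

```python
def concordance_binary(scores, outcome):
    """Concordance for a binary outcome (equals the ROC AUC): share of
    (event, non-event) pairs where the event has the higher score; ties = 1/2."""
    pos = [s for s, y in zip(scores, outcome) if y == 1]
    neg = [s for s, y in zip(scores, outcome) if y == 0]
    hits = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return hits / (len(pos) * len(neg))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(X, y):
    """Hypothetical stand-in model: least-squares linear score with intercept.
    Returns a function mapping one row of markers to a risk score."""
    Z = [[1.0] + [float(v) for v in row] for row in X]
    k = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    beta = solve(ztz, zty)
    return lambda row: beta[0] + sum(bj * v for bj, v in zip(beta[1:], row))


def drop_in_concordance(fit, X, y, names):
    """Concordance of the full model, and the drop when each variable is removed."""
    full = fit(X, y)
    c_full = concordance_binary([full(row) for row in X], y)
    drops = {}
    for j, name in enumerate(names):
        X_without = [[v for i, v in enumerate(row) if i != j] for row in X]
        reduced = fit(X_without, y)
        c_reduced = concordance_binary([reduced(row) for row in X_without], y)
        drops[name] = c_full - c_reduced
    return c_full, drops


# Toy data: two uninformative established markers and an informative new one.
names = ["established_1", "established_2", "new_marker"]
X = [[1, 1, 5], [2, 1, 5], [1, 2, 5], [2, 2, 5],
     [1, 1, 1], [2, 1, 1], [1, 2, 1], [2, 2, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
c_full, drops = drop_in_concordance(fit_least_squares, X, y, names)
print(c_full, drops)
```

With these toy data the full model has concordance 1.0; removing either established marker costs nothing, while removing `new_marker` lowers the concordance to 0.5, a drop of 0.5. Swapping `fit_least_squares` for any other fitting function leaves `drop_in_concordance` unchanged.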
One additional question regarding Table 2 is which established markers should be included. The simple recommendation is to include all other markers that are available and believed to be prognostic, ideally based on biologic rationale, at the time the new marker is measured (9). There may not be universal agreement on which variables are prognostic or on the biologic rationale for some of the markers. Therefore, it is important at least to include what is generally felt to be the least common denominator with regard to the list of variables and their measurement scales; going beyond this only makes a stronger statement regarding the new marker. Previous analyses of other, completely separate data sets, together with clinical judgment, should determine the list of established markers. In particular, univariable analyses and stepwise variable selection on the same data set (or a subset of it) clearly should not be used to determine which variables to include in this table, because these methods are biased (10), and this bias will make the new marker look better than it really is (4, 8). The markers included should comprise all of the markers that would routinely be used to predict patient outcome. Remember the question: does the new marker contribute to our ability to predict patient outcome beyond what we can already achieve based on everything we know about the patient? We need to maximize the value and potential contribution of everything we know before we assess the new marker. Conceivably, for a more direct test of improvement in predictive accuracy, one could use the predicted probability from the established model as the only "established marker" to compare against.
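This last suggestion can be written out as a two-model comparison. The notation is introduced here for illustration, and the linear form of Model 1 is only one possible choice. Let $\hat{p}_{\mathrm{est}}$ be the predicted probability from the most accurate model built on the established markers, $x_{\mathrm{new}}$ the new marker, $\alpha$ and $\gamma$ coefficients estimated from the data, and $C(\cdot)$ the concordance index of a set of predictions:

$$
\begin{aligned}
\text{Model 0:} \quad & \eta_0 = \hat{p}_{\mathrm{est}} \\
\text{Model 1:} \quad & \eta_1 = \alpha\,\hat{p}_{\mathrm{est}} + \gamma\,x_{\mathrm{new}} \\
\text{Gain:} \quad & \Delta C = C(\eta_1) - C(\eta_0)
\end{aligned}
$$

Because $\eta_0$ is the established model's own prediction, $C(\eta_0)$ is simply that model's concordance index, so $\Delta C$ isolates the gain attributable to the new marker.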
Another attraction of the proposed methodology is that some of the subjective modeling choices that might affect the hazard ratio estimate become more objective when predictive accuracy is the criterion. Modeling choices should be made with the goal of maximizing predictive accuracy; in particular, the most accurate model lacking the marker of interest should be compared with the most accurate model containing it. Whatever cutpoints, transforms, and other choices produce the most accurately predicting model should be used, which makes these typically subjective decisions more objective.
Other Implications
This article has dealt primarily with the question of whether a new marker is truly a prognostic or predictive factor. However, this question leads to several related questions.
Is the New Marker Better than Established Marker X?
The answer to this question does not really matter. Again, the suggested question is whether the new marker contributes beyond what is already known, and one typically knows more than established marker X. The goal is not to replace established marker X but to improve on the performance achieved by using all established markers.
What Is the Most Important Prognostic Factor? Is It the New Marker?
Similarly, the answer to this question does not really matter. Whether the new marker is the most important factor is not the issue; the real question is whether it contributes to our ability to predict patient outcome. If it does, we should consider routinely measuring and using it, regardless of its rank in importance. Having said this, the drop in concordance index would be a good measure of importance, as it is a bottom-line assessment of predictive accuracy. Granted, the concordance index may be affected by anything that affects the models, such as sample size, selection of variables, measurement error, and modeling methods. Nonetheless, any approach that improves the concordance index is valuable, because this translates into a better ability to predict individual patient outcomes.
What Are the Prognostic Factors for this Disease?
The answer to this question does not directly help the individual patient. The prognostic factors, themselves, do not allow a prediction of patient outcome. Rather, they must be combined or organized in some fashion to form a model, and the model predicts patient outcome. Thus, a more proper question would be: "What is the most accurate prediction model for this disease?" After this, our interest is whether this model contains the new marker or, if not, whether the model would be improved by its addition. If neither, the new marker is not a prognostic factor and, thus, not important.
Conclusion
The methods commonly used for marker evaluation are problematic. Instead, an analysis of a marker's impact on the concordance index of the prediction model is recommended.
FOOTNOTES
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Requests for reprints: Michael W. Kattan, Departments of Urology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Mailbox 27, New York, NY 10021. Phone: (646) 422-4386; Fax: (630) 604-3605; E-mail: kattanm{at}mskcc.org
Received 8/19/03; revised 10/8/03; accepted 10/14/03.
REFERENCES