Abstract
Competing risks occur commonly in medical research. For example, both treatment-related mortality and disease recurrence are important outcomes of interest and well-known competing risks in cancer research. In the analysis of competing risks data, methods of standard survival analysis such as the Kaplan-Meier method for estimation of cumulative incidence, the log-rank test for comparison of cumulative incidence curves, and the standard Cox model for the assessment of covariates lead to incorrect and biased results. In this article, we discuss competing risks data analysis, which includes methods to calculate the cumulative incidence of an event of interest in the presence of competing risks, to compare cumulative incidence curves in the presence of competing risks, and to perform competing risks regression analysis. A hypothetical numeric example and real data are used to compare those three methods in the competing risks data analysis to their respective counterparts in the standard survival analysis. The source and magnitude of bias from the Kaplan-Meier estimate are also detailed.
 cumulative incidence
 competing risk data
Cumulative incidence of an event is often of interest in medical research and is frequently presented in medical articles. The graphical display of the cumulative incidence function (i.e., failure probabilities) over time is intuitive and appealing. However, methods for estimating the cumulative incidence function must be clearly understood. The Kaplan-Meier (KM) method has been a widely used tool for estimating the survival function and the cumulative incidence function, the complement of the survival function (1). This method is conceptually easy to understand and easy to calculate. However, if there is more than one type of event (or failure), and if these events are dependent, KM estimates are biased. This bias arises because the KM method assumes that all events are independent, and thus, censors events other than the event of interest. For example, disease relapse is an event of interest in studies of allogeneic hematopoietic stem cell transplantation (HSCT), as is mortality related to complications of transplantation (transplant-related mortality or TRM). Relapse and TRM are not independent in this setting because these two events are likely related to immunologic effector mechanisms following HSCT, whereby efforts to reduce TRM may adversely affect the risk of relapse; moreover, patients who die from TRM cannot be at further risk of relapse. Therefore, the KM method is inappropriate for estimating the cumulative incidence rate of relapse in the presence of TRM because it censors TRM.
In addition to estimating the cumulative incidence of an event, it is often of interest to determine whether there is a difference in the cumulative incidence rates among different groups. In standard survival analysis, this is done using the log-rank test to compare curves generated with the KM method (2). In the presence of competing risks, however, this is inappropriate, for the same reason given above. Instead, Gray investigated this issue and proposed a class of tests for comparing the cumulative incidence curves of a particular type of failure among different groups in the presence of competing risks (3).
Finally, when there is a difference in the cumulative incidence curves among different treatment groups, it is also important to determine whether this difference is solely due to the treatment or to confounding factors, such as age or baseline disease stage. In standard survival analyses, this question is usually addressed by fitting a Cox proportional hazards model (4, 5). In fact, one may attempt to construct a cause-specific standard Cox model for a particular failure, treating other competing risks as censored. However, the effect of a covariate on an event from either a cause-specific (e.g., relapse) model or a cause-nonspecific (e.g., relapse and TRM combined) model may be very different from the effect of the covariate on the event (e.g., relapse) in the presence of competing risks (e.g., TRM). Fine and Gray (6) and Klein and Andersen (7) proposed methods for direct regression modeling of the effect of covariates on the cumulative incidence function for competing risks data. As in any other regression analysis, modeling cumulative incidence functions for competing risks can be used to identify potential prognostic factors for a particular failure in the presence of competing risks, or to assess a prognostic factor of interest after adjusting for other potential risk factors in the model.
In this article, we will present the three aforementioned methods in the analysis of competing risks data in more detail. We will compare each method to its counterpart in standard survival analysis, and show why the latter are inadequate in the presence of competing risks. Using a hypothetical numeric example and real data, we will demonstrate the use of these three methods, compare the results to the results obtained from standard survival analysis, and discuss the source and magnitude of bias that arises from standard methods.
Estimating Cumulative Incidence in the Presence of Competing Risks
KM method. Survival probability at a certain time is a conditional probability of surviving beyond that time, given that an individual has survived just prior to that time. This conditional probability can be estimated in a study as the number of patients who are alive or event-free without loss to follow-up at that time, divided by the number of patients who were alive just prior to that time. The KM estimate of survival probability is then the product of these conditional probabilities up until that time. Similarly, the probability of failure at a certain time is a conditional probability of having an event at that time, given that an individual has not had an event just prior to that time. This conditional probability can be estimated in a study as the probability of surviving just prior to that time multiplied by the number of patients with the event at that time, divided by the number of patients at risk. Then cumulative incidence of a failure is the sum of these conditional probabilities over time. Of note, “event” and “failure” are used interchangeably in the literature, and the event of interest could be death from any cause, relapse, treatment-related mortality, or stroke in cardiovascular disease. Individuals without an event are “censored,” meaning that their observed failure time is incomplete because these individuals have not failed yet, whereas the failure time for individuals with an event is fully observed.
Suppose that 10 patients enter a clinical trial at different time points (i.e., staggered entry), are followed until a specified time, and have independent and distinct failure times. Suppose that the event of interest is relapse. Patients who fail from other causes or who are still alive at the specified time point are censored. Patients who are lost to follow-up during the study period are considered alive and relapse-free at the last time seen alive and thus censored. In order to calculate the KM estimate of cumulative relapse-free survival probability, the observed failure times for all patients need to be ordered from the smallest to the largest, irrespective of the censoring status, and each failure time is paired with the information of censoring status. For example, let t_{0} denote the time zero or the time of study entry, t_{1} denote the smallest observed failure time, and t_{10} denote the largest observed failure time to the event. Then the KM estimate of relapse-free survival probability, S_{KM}^{rel}(t), is
S_{KM}^{rel}(t) = \prod_{j: t_{j} \le t} (1 - r_{j} / n_{j}),
where r_{j} is the number of relapses at t_{j} and n_{j} is the number of patients at risk (that is, alive, relapse-free, and not lost to follow-up) just prior to t_{j}.
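The KM estimate of incidence of relapse at a specified time point is then the probability of relapse-free survival just prior to that time, multiplied by the number of relapses at that time, divided by the number of patients at risk (that is, alive, relapse-free, and not lost to follow-up) just prior to that time. Cumulative incidence is then a sum of these conditional probabilities over time. More specifically, the cumulative incidence using the KM method, denoted as CI_{KM}^{rel}, is calculated as follows:
CI_{KM}^{rel}(t) = \sum_{j: t_{j} \le t} S_{KM}^{rel}(t_{j-1}) r_{j} / n_{j},
where r_{j} is the number of relapses at the observed time t_{j}, n_{j} is the number of patients at risk just prior to t_{j}, and S_{KM}^{rel}(t_{j-1}) is the KM relapse-free survival just prior to t_{j}.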
The KM estimate of cumulative incidence function is simple and useful for a single end point such as relapse. However, if deaths due to other causes exclusive of relapse (i.e., TRM) are also of interest, the KM method does not capture the dependency of competing risks. For example, in allogeneic HSCT for patients with hematologic malignancies, both relapse and TRM are of equal importance to patients as well as physicians. As previously mentioned, efforts to modify the relapse rate through immune effector mechanisms may adversely affect TRM rates (vice versa is also true), and therefore, relapse and TRM are not independent events. If the KM method is used to estimate the cumulative incidence of relapse in the presence of TRM, patients dying of TRM are censored. However, unlike truly censored observations, patients who die of TRM cannot then relapse, and hence, their risk for relapse is 0. Therefore, the survival probability is overestimated and thus cumulative incidence of relapse is also overestimated (as shown in the example below) in the KM method.
Competing Risks Method
The cumulative incidence of relapse in the presence of TRM as a competing risk (CR) can be calculated similarly as in the KM method. The difference lies in the probability of event-free survival just prior to a certain time. In the KM method, this probability is a relapse-free probability [S_{KM}^{rel}(t)], whereas it is a relapse and TRM-free probability in the CR method [S_{KM}^{rel,TRM}(t)]. More specifically, the cumulative incidence of relapse in the presence of TRM as a competing risk, denoted as CI_{CR}^{rel}, is calculated as
CI_{CR}^{rel}(t) = \sum_{j: t_{j} \le t} S_{KM}^{rel,TRM}(t_{j-1}) r_{j} / n_{j},
where r_{j} is the number of relapses at the observed time t_{j}, n_{j} is the number of patients at risk just prior to t_{j}, and S_{KM}^{rel,TRM}(t) denotes the KM estimate of survival probability for the joint events of relapse and TRM. In the CR method, unlike the KM method, patients dying from TRM are counted as events when calculating the event-free survival, S_{KM}^{rel,TRM}(t), and only patients who are truly alive are considered at risk for relapse. Because the relapse-free survival [S_{KM}^{rel}(t)] in the KM method is always greater than or equal to the relapse and TRM-free survival [S_{KM}^{rel,TRM}(t)] in the CR method, the cumulative incidence using the KM method is always greater than or equal to the cumulative incidence using the CR method. The cumulative incidence of TRM in the presence of relapse as a competing risk can be calculated similarly. The magnitude of overestimation in the KM method depends on the incidence rate levels of competing events.
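The ordering of the two estimates can be checked term by term. The following is a sketch in which r_j, d_j, and n_j are notation assumed here for the number of relapses at t_j, the number of TRM deaths at t_j, and the number of patients at risk just prior to t_j. The risk set is the same in both methods because a patient dying of TRM leaves it whether that death is treated as censored or as an event.

```latex
\begin{align*}
S_{KM}^{rel}(t) &= \prod_{j:\, t_j \le t}\Bigl(1 - \frac{r_j}{n_j}\Bigr)
  \;\ge\; \prod_{j:\, t_j \le t}\Bigl(1 - \frac{r_j + d_j}{n_j}\Bigr) = S_{KM}^{rel,TRM}(t), \\
CI_{KM}^{rel}(t) - CI_{CR}^{rel}(t)
  &= \sum_{j:\, t_j \le t}\bigl[S_{KM}^{rel}(t_{j-1}) - S_{KM}^{rel,TRM}(t_{j-1})\bigr]\frac{r_j}{n_j} \;\ge\; 0 .
\end{align*}
```

Under these definitions the two estimates coincide until the first relapse that follows a TRM death, and every earlier TRM death enlarges the bracketed term at each later relapse.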
Example 1: a hypothetical numeric example. We shall show the computation of the cumulative incidence function of relapse below using the KM and CR methods. Suppose that there are 10 patients with the ordered failure or censoring times shown below, where + denotes a censored time, (R) denotes relapse, and (T) denotes death due to treatment-related complications (TRM). The detailed calculation of the KM estimate for cumulative incidence of relapse (CI_{KM}^{rel} or KM CIR, for the purpose of simplicity) is presented in Table 1. At the time of study entry, all patients were alive, and thus, the KM estimate of relapse-free survival (S_{KM}^{rel} or RFS, for the purpose of simplicity) at t = 0 was 1 and the KM CIR at t = 0 was 0. If one relapse occurs at t = 10, then the RFS at t = 10 is 0.9 and the KM CIR at t = 10 is 0.1 (10 patients were alive and relapse-free just prior to t = 10, but because one relapse occurred at t = 10, the RFS dropped to 0.9 at t = 10). Because the patient with the second smallest observed failure time was alive (thus censored), the RFS at t = 20 was 0.9 and the KM CIR at t = 20 was 0.1. The patient with the third smallest observed time relapsed at t = 35, and thus the RFS at t = 35 was 0.9 × 7/8 = 0.79; the second patient, who was alive and thus censored at t = 20, was excluded from the denominator of the number of patients at risk just prior to t = 35 because this patient's survival information beyond t = 20 is unknown. The KM CIR at t = 35 was thus 0.1 + 0.9 × 1/8 = 0.21. The RFS at t = 20 (0.9) was multiplied by the incidence rate of relapse at t = 35 (1/8) because only those patients who were alive and relapse-free just prior to t = 35 were subject to relapse at t = 35. If the patient with the fourth smallest observed time, t = 40, dies from TRM, then the KM method treats this as no event, and thus censors the observation. Therefore, the RFS at t = 40 was 0.79, and the corresponding KM CIR was 0.21.
Notably, the RFS and KM CIR at t = 40 were the same as those at t = 35, even though the calculation is slightly different. In fact, the RFS and KM CIR remain the same until there is another incidence of relapse at t = 55. Because of this, RFS and KM CIR are often calculated at failure times only. At t = 55, the RFS drops to 0.63, and the KM CIR increases to 0.37. At all time points, the sum of RFS and KM CIR was equal to 1 [i.e., S_{KM}^{rel}(t) + CI_{KM}^{rel}(t) = 1].
This can be contrasted with the detailed calculation of the cumulative incidence function using the CR method presented in Table 1. In the CR method, death due to TRM is counted as an event in the calculation of event-free survival (EFS) at each time point. At t = 40, the relapse and TRM-free survival (S_{KM}^{rel,TRM} or EFS) in the CR method is 0.68, whereas the RFS in the KM method is 0.79 in Table 1. However, this difference does not immediately affect the CIR because there is no incidence of relapse at t = 40. At t = 55, the cumulative incidence of relapse using the CR method (CI_{CR}^{rel} or CR CIR) is 0.35, which is lower than the KM CIR of 0.37. This is because the KM method treats one death due to TRM at t = 40 as censored, which results in overestimation of the RFS at t = 50 (0.79). At t = 80, the difference between KM CIR (0.69) and CR CIR (0.48) is more apparent (0.21). This is because the RFS (0.63) in the KM method is substantially overestimated, compared with the EFS (0.27) in the CR method at t = 71, and consequently, the KM CIR is also overestimated. Following similar steps, the cumulative incidence of TRM in the presence of relapse as a competing risk (CI_{CR}^{TRM} or CIT) is presented in Table 1. In the CR method, at all time points, the sum of EFS, CR CIR, and CR CIT is equal to 1 [S_{KM}^{rel,TRM}(t) + CI_{CR}^{rel}(t) + CI_{CR}^{TRM}(t) = 1]. Figure 1 shows the cumulative incidence curves using the KM and CR methods in the presence of TRM as a competing risk. The two curves jump at the same time whenever relapse occurs, but the magnitude of each jump is smaller in the CR method.
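The calculations described in this example can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used to produce Table 1: the function and variable names are hypothetical, and the observed times 70 and 90 are placeholders chosen here, because the text fixes only the other times (10, 20, 35, 40, 50, 55, 71, and 80) and the order and type of the events. The placeholder times do not affect the estimates, which depend only on the ordering.

```python
# Example 1 data as (time, status): "R" = relapse, "T" = TRM death, "C" = censored.
# ASSUMPTION: the times 70 and 90 are placeholders; the text states only the
# other times and the order and type of the events.
PATIENTS = [(10, "R"), (20, "C"), (35, "R"), (40, "T"), (50, "C"),
            (55, "R"), (70, "T"), (71, "T"), (80, "R"), (90, "C")]


def incidence_table(patients):
    """Return one row per observed time with RFS, KM CIR, EFS, CR CIR, CR CIT.

    Assumes distinct observed times, as in the hypothetical example.
    """
    rows = []
    at_risk = len(patients)
    rfs = efs = 1.0                 # KM relapse-free and event-free survival
    km_cir = cr_cir = cr_cit = 0.0
    for time, status in sorted(patients):
        rate = 1.0 / at_risk        # one patient out of those still at risk
        if status == "R":
            km_cir += rfs * rate    # KM method weights by RFS just prior to t
            cr_cir += efs * rate    # CR method weights by EFS just prior to t
            rfs *= 1.0 - rate
            efs *= 1.0 - rate
        elif status == "T":
            cr_cit += efs * rate    # TRM is an event in the CR method only
            efs *= 1.0 - rate       # RFS is unchanged: KM censors this death
        at_risk -= 1                # events and censored patients leave the risk set
        rows.append({"t": time, "RFS": rfs, "KM_CIR": km_cir,
                     "EFS": efs, "CR_CIR": cr_cir, "CR_CIT": cr_cit})
    return rows


TABLE = incidence_table(PATIENTS)
for row in TABLE:
    print("  ".join(f"t={v}" if k == "t" else f"{k}={v:.3f}" for k, v in row.items()))
```

With these inputs the unrounded estimates at t = 55 are 0.37 (KM CIR) and 0.3475 (CR CIR), and at t = 80 they are 0.685 and 0.4825, which agree with the rounded values quoted in the text.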
Example 2a: myeloablative versus nonmyeloablative allogeneic HSCT for patients >50 years of age: a real data example. To illustrate the KM and CR methods with real data, we considered a typical competing risks data set of relapse and TRM presented in Alyea et al. (8) as an example. One hundred and fifty-two patients over the age of 50 who underwent T cell–replete HLA-matched allogeneic transplantation from 1997 to 2002 at our institution were included. Of these 152 patients, 81 underwent myeloablative and 71 underwent nonmyeloablative transplantation. In the study, as in many HSCT studies, both relapse and TRM were equally important. Figure 2A compares the CIR curves between the KM and CR methods for the 81 patients who underwent myeloablative transplantation. The 4-year CIR was 30% using the CR method but 50% using the KM method. This rather large difference in the CIR suggests that there is a high incidence of a competing risk of TRM. In fact, approximately two-thirds of the deaths in that study were due to TRM and not relapse-related. Figure 2B presents the cumulative incidence of TRM (CIT) using both methods. The 4-year CIT was 50% using the CR method and 59% using the KM method. This difference is smaller compared with the difference in CIR. This is because the competing risk of relapse is lower than the incidence of TRM after myeloablative allogeneic transplantation. As a consequence, if the KM method is used, one could falsely claim that the CIR is high (almost as high as CIT) after myeloablative allogeneic transplantation. As in our hypothetical example above (Fig. 1), the jumps in Fig. 2A and B occur at the same time in both methods because, irrespective of the method used, the occurrence time of an event does not change.
Comparison of Cumulative Incidence Curves in the Presence of Competing Risks
In addition to estimating the cumulative incidence of an event, comparing cumulative incidence curves among different treatment groups is useful when selecting the appropriate treatment for a particular patient. When there are no competing risks, a Mantel-Haenszel log-rank test (2) is used to compare KM cumulative incidence curves. This test is obtained by constructing a 2 × 2 table at each distinct failure time, comparing the failure rates between two groups, and then combining tables over time. In CR data, this test is inappropriate. Gray (3) proposed a class of tests for comparing the cumulative incidence curves of a particular type of failure among different groups in the presence of competing risks. A detailed description of the method is beyond the scope of this review. Simply stated, the Gray test compares weighted averages of the hazards of the cumulative incidence function using the cumulative incidence estimation equation [CI_{CR}^{rel}(t)] described in the previous section, and the test statistic has a χ^{2} distribution.
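The 2 × 2 table construction described above for the setting without competing risks can be sketched in Python as follows. This is a minimal illustration of a two-group log-rank statistic, with a hypothetical function name and data layout; it is not the Gray test, and, as stated above, it should not be applied to one type of failure when competing risks are present.

```python
def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times:  observed times
    events: 1 if the event occurred at that time, 0 if censored
    groups: 0 or 1, the group label of each patient
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Margins of the 2 x 2 table (group x event/no event) at time t.
        n = sum(1 for ti in times if ti >= t)                            # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)  # at risk, group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)   # events, both groups
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)                       # events, group 1
        # Observed minus expected events in group 1, and its hypergeometric variance.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance


# Hypothetical data: group 1 fails early, group 0 fails late, no censoring.
stat = logrank_statistic(times=[1, 2, 3, 10, 11, 12],
                         events=[1, 1, 1, 1, 1, 1],
                         groups=[1, 1, 1, 0, 0, 0])
print(f"log-rank chi-square = {stat:.2f}")
```

The statistic is referred to a χ^{2} distribution with 1 degree of freedom, so values above 3.84 correspond to P < 0.05.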
Example 2b. Using the same data set presented in example 2a, Fig. 3 [Fig. 1 in Alyea et al. (8)] shows cumulative incidences of relapse and TRM using the CR method after myeloablative and nonmyeloablative transplantation. The 3-year cumulative incidence of TRM (CIT) was 50% after myeloablative transplantation, but was 32% after nonmyeloablative transplantation (P = 0.01). The CIT was calculated in the presence of relapse as a competing risk and this difference was tested using the Gray method. Similarly, the 3-year CIR was 30% after myeloablative transplantation and 46% after nonmyeloablative transplantation (P = 0.052). In comparison with the CR method, the 3-year CIR using the KM method was 50% after myeloablative and 61% after nonmyeloablative transplantation (P = 0.35). These results suggest that even though the cumulative incidence rates of combined events (relapse and TRM) are similar between the two types of transplantation (80% versus 78%), the immunologic effects are different.
CR Regression Analysis
In standard survival analyses, when testing a group (or treatment) difference, the usual steps are to present KM survival or cumulative incidence curves, test the difference of these survival curves using a log-rank test, and perform a proportional hazards regression analysis. This last step can be used to identify important potential prognostic factors, and/or to assess the prognostic significance of a given factor after adjusting for other risk factors. In a competing risks situation, the equivalent steps are to generate cumulative incidence curves using the CR method described above, test the difference between cumulative incidence curves using the Gray method, and perform a CR regression analysis. A standard Cox proportional hazards model analysis is not adequate in the presence of competing risks because the cause-specific Cox model treats competing risks of the event of interest as censored observations, and the cause-specific hazard function does not have a direct interpretation in terms of survival probability.
Because the simple relationship between a single end point and a single crude hazard does not hold in the presence of competing risks, Fine and Gray (6) and Klein and Andersen (7) proposed direct regression modeling of the effect of covariates on the cumulative incidence function for CR data. These models distinguish between patients who are still alive and those who have already failed from competing causes and allow direct inference regarding the effects of covariates on the cumulative incidence function. The Fine and Gray method is based on a proportional hazards model, whereas the Klein and Andersen method is based on pseudo-values from a jackknife statistic constructed from the cumulative incidence curve. When the two methods were compared in a real data example, results from both approaches were in close agreement (7).
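In symbols, the two approaches can be summarized as follows. The notation is assumed here for illustration: F_{rel}(t | Z) is the cumulative incidence of relapse for covariates Z, \lambda_{rel,0} and \Lambda_{rel,0} are a baseline hazard of the cumulative incidence function and its integral, \hat{F}_{rel} is the CR estimate from all n patients, and \hat{F}_{rel}^{(-i)} is the same estimate with patient i left out.

```latex
\begin{align*}
\lambda_{rel}(t \mid Z) &= -\frac{d}{dt}\log\bigl\{1 - F_{rel}(t \mid Z)\bigr\}
   = \lambda_{rel,0}(t)\exp(\beta^{\top} Z), \\
F_{rel}(t \mid Z) &= 1 - \exp\bigl\{-\Lambda_{rel,0}(t)\exp(\beta^{\top} Z)\bigr\}, \\
\hat{\theta}_i(t) &= n\,\hat{F}_{rel}(t) - (n-1)\,\hat{F}_{rel}^{(-i)}(t), \qquad i = 1, \dots, n .
\end{align*}
```

The first two lines describe the Fine and Gray model: the proportional hazards assumption is placed on the hazard of the cumulative incidence function, so a positive β corresponds to a higher cumulative incidence at every time point. The third line is the jackknife pseudo-value for patient i used in the Klein and Andersen approach, in which the pseudo-values at selected time points are regressed on the covariates with generalized estimating equations.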
Example 2c. Returning to our example of myeloablative versus nonmyeloablative allogeneic HSCT for patients >50 years of age, the cumulative incidence curves in Fig. 3 indicate that myeloablative transplantation is associated with an increased risk for TRM (P = 0.01) and nonmyeloablative transplantation is associated with an increased risk for relapse (P = 0.052). To investigate these differences further, we fitted regression models using the standard Cox and the Fine and Gray approach for relapse and TRM. In the Cox model, relapse and TRM are considered jointly in the outcome; in the Fine and Gray model, they are considered individually. Both models used the type of transplantation (myeloablative versus nonmyeloablative) as a covariate, along with patient age, unrelated versus related donor, bone marrow versus peripheral blood progenitor cells, unfavorable versus favorable prognosis, second versus first transplant, FK506- versus cyclosporine-based acute graft-versus-host disease (GVHD) prophylaxis, and donor-recipient sex mismatch. The results are shown in Table 2. In both models, the type of transplantation was not a significant factor for outcome after adjusting for pretransplant characteristics. This can be explained by confounding. Of the 40 patients with myeloablative transplantation who died of TRM, 35 received bone marrow stem cells. Of five patients with nonmyeloablative transplantation, four received bone marrow stem cells and died of TRM. Therefore, the difference in the cumulative incidence of TRM (Fig. 3) was confounded in part by the source of progenitor stem cells (i.e., bone marrow versus peripheral blood). This is apparent in the CR regression model for TRM in Table 2. In that model, the hazard ratio (HR) for TRM of bone marrow use was 2.24 (P = 0.057). In contrast, in the Cox model for relapse and TRM combined, the HR for bone marrow as compared with peripheral blood stem cell use was 1.13 (P = 0.71).
Similarly, in Fig. 3, it seems that nonmyeloablative transplantation is associated with an increased risk for relapse. However, this is also confounded by unfavorable risk status at the time of transplantation. There were 51 patients with relapse: 23 after myeloablative and 28 after nonmyeloablative transplantation. Of these 51 patients, 41 (24 nonmyeloablative, 17 myeloablative patients) had unfavorable risk characteristics at the time of transplantation. When unfavorable prognosis is controlled for in the multivariate model, nonmyeloablative transplantation is no longer associated with an increased risk for relapse (HR = 0.57, P = 0.33). Note that results from the CR regression analysis were not presented in the published article (8).
CR regression analysis is also useful for identifying prognostic factors other than type of transplantation for each type of failure. In Table 2, donor-recipient sex mismatch was associated with a decreased risk for relapse (β = −0.66, HR = 0.52, P = 0.04). Although this is indicated in the Cox model of relapse and TRM combined events in Table 2 (β = −0.39, HR = 0.68, P = 0.053), it is unknown from this model whether this is due to relapse or TRM prevention. In fact, the hazards of relapse and TRM for sex-mismatched patients compared with sex-matched patients point in opposite directions: β = −0.66 with HR = 0.52 for relapse versus β = 0.179 with HR = 1.20 for TRM, even though the latter is not statistically significant. Again, the Cox model is not appropriate for identifying risk factors for cumulative incidence of a specific event in the presence of competing risks.
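The coefficient estimates and hazard ratios quoted above are linked by HR = exp(β), so a negative β gives HR < 1 and a positive β gives HR > 1. A short Python check of the sex-mismatch values quoted from Table 2 is sketched below; the function name and labels are illustrative only.

```python
import math


def hazard_ratio(beta):
    """Convert a regression coefficient into a hazard ratio, HR = exp(beta)."""
    return math.exp(beta)


# Coefficients for donor-recipient sex mismatch as quoted in the text.
quoted = {
    "CR model, relapse": -0.66,
    "CR model, TRM": 0.179,
    "Cox model, relapse and TRM combined": -0.39,
}
for label, beta in quoted.items():
    print(f"{label}: beta = {beta:+.3f}, HR = {hazard_ratio(beta):.2f}")
```

The combined-event hazard ratio lies between the two event-specific values, which is why the joint Cox model cannot show that the effects on relapse and TRM run in opposite directions.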
Discussion
Competing risks occur frequently in cancer research even though their presence may not always be recognized at the time of analysis. In many cancer treatments, efforts have been made to decrease disease recurrence but often at the cost of increased toxicity due to the treatment. Examples of competing risks in HSCT studies include: incidence of acute GVHD and 100-day death without development of acute GVHD, death due to disease and due to transplant-related complications, and relapse and chronic GVHD for long-term survivors. Similarly, in umbilical cord blood transplantation, a reduced incidence of acute GVHD with an increased incidence of infection-related morbidity and mortality has been reported (9). In breast cancer studies, competing risks include death due to local/regional recurrence of disease versus distant metastases and distant metastases versus non–cancer death in the study of radiation therapy (3).
When competing risks are present, there are three ways to analyze the data: (a) analysis of the event of interest ignoring CR, (b) analysis of joint events as a single end point, and (c) analysis of CR. The first approach is incorrect and will lead to erroneous results as shown in the previous sections. The magnitude of the error could be substantial if the incidence of CR is high, or could be minimal if the incidence of CR is low. However, one could not know the effect of competing risks a priori unless CR analysis is done. The second approach is correct, but is too limited to address various important research questions. The combination of the second and third approaches is a comprehensive approach that addresses general as well as specific study questions. In the example of the myeloablative versus nonmyeloablative HSCT study, when the events of relapse and TRM are combined and analyzed as a single event (progression), there is no difference between the two types of transplantation [see Fig. 4 in Alyea et al. (8)]. This is because this type of analysis is limited to answering only one question: whether the progression-free survival of nonmyeloablative HSCT is better than that of myeloablative HSCT. Although this approach is informative, this analysis is not sufficient to answer whether the CIR or CIT differs between myeloablative and nonmyeloablative transplantation. The CR analysis of relapse and TRM is an appropriate method to answer these more advanced questions.
In the analysis of CR data, it is important to present both the results of the event of interest and the results of competing risks. In the example of the myeloablative versus nonmyeloablative HSCT study, presentation of CIR without CIT would be misleading because even though the CIR is higher in the nonmyeloablative transplantation compared with the myeloablative transplantation, the CIT is lower. Similarly, in a study of umbilical cord blood transplantation, comparison of cumulative incidence of acute GVHD between umbilical cord blood and other types of transplantation would be erroneous if the early infection rates are not considered simultaneously.
Fitting a CR regression model is also important. Just as in the standard survival analysis, analysis of competing risks is incomplete without CR regression analysis. CR regression analysis is used to identify risk factors for each competing risk. In example 2c, donor-recipient sex mismatch was associated with a decreased risk of relapse (HR = 0.52, P = 0.04), but not of TRM. Also, the effects of bone marrow stem cells on relapse (β = −0.78, HR = 0.46) and on TRM (β = 0.81, HR = 2.24) were in opposite directions, as shown by the coefficient estimate, β, and the HR, even though these effects are not statistically significant. This opposite effect of a covariate would not be detected by fitting a standard Cox model, because the standard Cox model is not designed to answer what risk factors contribute to relapse in the presence of the competing risk of TRM. Using a cause-specific Cox regression model is incorrect because it ignores competing risks and treats them as censored. Fitting a CR regression model is also important to confirm whether the difference seen in the cumulative incidence curves is true or confounded by other risk factors. This is illustrated in example 2c. The difference in the cumulative incidence of TRM was confounded by the bone marrow stem cell use. Similarly, the difference in the cumulative incidence of relapse was confounded by unfavorable prognosis at stem cell infusion.
In summary, the important first step for the analysis of CR data is the recognition that competing risks are present. Following this, the analysis should include a calculation of cumulative incidence of an event of interest in the presence of competing risks, a proper test for cumulative incidence curves of an event, and CR regression analyses. Software packages for the KM method and Cox proportional hazards regression model are available in S-Plus, SAS, and SPSS. R and S programs for the Gray test (3) and the Fine and Gray CR regression model (6) can be obtained from the web page of Robert Gray^{1} or by contacting him at gray.robert{at}jimmy.harvard.edu. Future research should include sample size and/or power calculation in CR data.
Acknowledgments
We gratefully acknowledge helpful feedback from Drs. Philippe Armand, Corey Cutler, Stephanie Lee, Robert Gray, David Harrington, and Jerome Ritz.
Footnotes

Grant support: National Heart, Lung, and Blood Institute grant HL070149 and National Institute of Allergy and Infectious Diseases grant AI029530.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
 Accepted September 26, 2006.
 Received May 18, 2006.
 Revision received August 8, 2006.