Calculates the Integrated Graf Score, aka integrated Brier score or squared loss.

For an individual who dies at time \(t\), with predicted survival function \(S\), the Graf Score at time \(t^*\) is given by $$L(S,t|t^*) = [(S(t^*)^2)I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*))^2)I(t > t^*)(1/G(t^*))]$$ where \(G\) is the Kaplan-Meier estimate of the censoring distribution and \(\delta\) is the event indicator (\(\delta = 1\) if the event was observed).

The re-weighted IGS, IGS*, is given by $$L(S,t|t^*) = [(S(t^*)^2)I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*))^2)I(t > t^*)(1/G(t))]$$ where \(G\) is the Kaplan-Meier estimate of the censoring distribution, i.e. the score is always weighted by \(G(t)\). IGS* is strictly proper when the censoring distribution is independent of the survival distribution and when \(G\) is fit on a sufficiently large dataset. IGS is never proper. Use proper = FALSE for IGS and proper = TRUE for IGS*; in the future the default will be changed to proper = TRUE. Results may be very different if many observations are censored at the last observed time, since the weights \(1/G(t)\) can then be as large as 1/eps when proper = TRUE.
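The difference between the two weightings can be illustrated with a minimal base R sketch of the per-observation loss. The function graf_loss, its argument names, the toy censoring distribution and the use of eps as a floor on the weights are all assumptions made for this illustration and are not part of the package.

```r
# Per-observation Graf loss at a single time point t_star (illustrative only).
# s_tstar: predicted survival probability S(t_star)
# t, delta: observed time and event indicator (1 = event, 0 = censored)
# G: function returning the estimated censoring survival probability
# proper: FALSE weights survivors by G(t_star) (IGS), TRUE by G(t) (IGS*)
# eps: floor applied to the weights to avoid 1/0 (assumed handling)
graf_loss <- function(s_tstar, t, delta, t_star, G, proper = FALSE, eps = 1e-3) {
  g_t <- max(G(t), eps)
  g_star <- max(G(t_star), eps)
  if (t <= t_star && delta == 1) {
    # Event observed by t_star: penalise predicted survival
    s_tstar^2 / g_t
  } else if (t > t_star) {
    # Still at risk at t_star: penalise predicted death
    w <- if (proper) g_t else g_star
    (1 - s_tstar)^2 / w
  } else {
    # Censored before t_star: both indicators are zero
    0
  }
}

# Toy censoring distribution, invented for this example
G <- function(u) ifelse(u < 3, 1, ifelse(u < 7, 0.75, 0.375))

died_early <- graf_loss(0.8, t = 2, delta = 1, t_star = 5, G = G)
alive_igs <- graf_loss(0.8, t = 8, delta = 1, t_star = 5, G = G)
alive_igs_star <- graf_loss(0.8, t = 8, delta = 1, t_star = 5, G = G, proper = TRUE)
censored <- graf_loss(0.8, t = 3, delta = 0, t_star = 5, G = G)
```

In this sketch the two weightings agree for the individual who dies before \(t^*\) and differ only for the individual still at risk at \(t^*\), whose loss is divided by \(G(t^*)\) under IGS and by \(G(t)\) under IGS*.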

Note: If comparing the integrated Graf score to other packages, e.g. pec, then method = 2 should be used. However, the results may still differ very slightly, as this package uses survfit to estimate the censoring distribution, in line with the Graf 1999 paper, whereas some other packages use prodlim with reverse = TRUE (meaning Kaplan-Meier is not used).
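As a rough sketch of what estimating the censoring distribution with survfit can look like, the example below flips the event indicator so that censoring is treated as the event of interest. The toy data and the variable names are invented for illustration, and the package's internal code may differ in its details.

```r
library(survival)

# Toy data: observed times and event indicators (1 = event, 0 = censored)
time <- c(2, 3, 5, 7, 8)
status <- c(1, 0, 1, 0, 1)

# Kaplan-Meier estimate of the censoring distribution:
# 1 - status marks censored observations as the "events"
fit <- survfit(Surv(time, 1 - status) ~ 1)

# Right-continuous step function G(u), equal to 1 before the first time
G <- stepfun(fit$time, c(1, fit$surv))

G(c(1, 3, 6, 7))
```

The resulting G can then be evaluated at any observed time or evaluation time point when forming the weights \(1/G(t)\) or \(1/G(t^*)\).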

If integrated == FALSE then the sample mean is taken at the single specified time, \(t^*\), and the returned score is given by $$L(S,t|t^*) = \frac{1}{N} \sum_{i=1}^N L(S_i,t_i|t^*)$$ where \(N\) is the number of observations, \(S_i\) is the predicted survival function for individual \(i\) and \(t_i\) is their true survival time.

If integrated == TRUE then an approximation to integration is made by either taking the sample mean over all \(T\) unique time-points (method == 1), or by taking a mean weighted by the difference between time-points (method == 2). Then the sample mean is taken over all \(N\) observations. For method == 1 this gives $$L(S) = \frac{1}{NT} \sum_{i=1}^N \sum_{j=1}^T L(S_i,t_i|t^*_j)$$
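The two integration methods can be contrasted on a small matrix of per-observation scores. This is a sketch only: the text does not spell out the exact weights used for method == 2, so the weighting below (the gap to the next time point, normalised by the span of the grid) is an assumption, and all values are invented.

```r
# Rows are observations, columns are evaluation time points (toy values)
times <- c(1, 2, 4)
scores <- rbind(
  c(0.1, 0.2, 0.4),
  c(0.3, 0.4, 0.4)
)

# method == 1: every time point counts equally, so this is the plain mean
# over all N * T entries
igs_equal <- mean(scores)

# method == 2 (assumed weighting): average over observations first, then
# weight each time point by the gap to the next one and divide by the
# total span of the grid
per_time <- colMeans(scores)
gaps <- diff(times)
igs_weighted <- sum(gaps * per_time[-length(per_time)]) / (max(times) - min(times))
```

With an evenly spaced grid the two approaches give similar answers; they diverge when the time points are unevenly spaced, as in this toy grid.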


If task and train_set are passed to $score then \(G\) is fit on the training data, otherwise on the testing data. The former is likely to reduce any bias caused by calculating parts of the measure on the test data it is evaluating. The training data is automatically used when scoring resamplings.


This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():

MeasureSurvGraf$new()
mlr_measures$get("surv.graf")
msr("surv.graf")


Meta Information

  • Type: "surv"

  • Range: \([0, \infty)\)

  • Minimize: TRUE

  • Required prediction: distr


Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). “Assessment and comparison of prognostic classification schemes for survival data.” Statistics in Medicine, 18(17-18), 2529-2545. doi: 10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.

See also

Super classes

mlr3::Measure -> mlr3proba::MeasureSurv -> mlr3proba::MeasureSurvIntegrated -> MeasureSurvGraf

Active bindings


se: If TRUE, returns the standard error of the measure.


Public methods

Inherited methods

Method new()

Creates a new instance of this R6 class.


MeasureSurvGraf$new(
  integrated = TRUE,
  times,
  method = 2,
  se = FALSE,
  proper = FALSE,
  eps = 0.001
)



integrated: If TRUE (default), returns the integrated score; otherwise returns the score at a single time point.


times: If integrated == TRUE then a vector of time-points over which to integrate the score. If integrated == FALSE then a single time point at which to return the score.


method: If integrated == TRUE, selects the integration weighting method. method == 1 corresponds to weighting each time-point equally and taking the mean score over discrete time-points. method == 2 corresponds to calculating a mean weighted by the difference between time-points. method == 2 is the default, to be in line with other packages.


se: If TRUE, returns the standard error of the measure.


proper: If TRUE then scores are weighted by the censoring distribution at the observed event time, which results in a strictly proper scoring rule if the censoring and survival time distributions are independent and a sufficiently large dataset is used to weight the measure. If FALSE then scores are weighted by the Graf method, which is the more common usage, but the loss is not proper. In v0.5.0, the default will be changed to TRUE.


eps: Very small number substituted for zero-valued predicted probabilities, to prevent errors in log(0) and 1/0 calculations.

Method clone()

The objects of this class are cloneable with this method.


MeasureSurvGraf$clone(deep = FALSE)



Whether to make a deep clone.