
Calculates the Integrated Graf Score, aka integrated Brier score or squared loss.

For an individual who dies at time \(t\), with predicted survival function \(S\), the Graf Score at time \(t^*\) is given by $$L(S,t|t^*) = [(S(t^*)^2)I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*))^2)I(t > t^*)(1/G(t^*))]$$ where \(G\) is the Kaplan-Meier estimate of the censoring distribution and \(\delta = 1\) indicates that the event was observed.
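As a rough illustration of the formula above, the following base-R sketch evaluates the pointwise loss for a single observation. The helper `graf_loss()` and the censoring curve `G_toy()` are hypothetical names made up for this example; they are not part of the package, and in practice \(G\) would be a Kaplan-Meier estimate rather than a closed-form curve.

```r
# Pointwise Graf score for one observation (illustrative helper, not package code).
# S_tstar: predicted survival probability S(t*)
# t:       observed time; delta: 1 if the event was observed, 0 if censored
# G:       function returning the estimated censoring survival probability
graf_loss <- function(S_tstar, t, delta, t_star, G) {
  if (t <= t_star && delta == 1) {
    S_tstar^2 / G(t)             # event before t*: weighted by 1/G(t)
  } else if (t > t_star) {
    (1 - S_tstar)^2 / G(t_star)  # still at risk at t*: weighted by 1/G(t*)
  } else {
    0                            # censored before t*: neither indicator is 1
  }
}

# Assumed censoring survival curve, for illustration only
G_toy <- function(u) exp(-0.1 * u)

died_early <- graf_loss(S_tstar = 0.8, t = 2, delta = 1, t_star = 5, G = G_toy)
survived   <- graf_loss(S_tstar = 0.8, t = 7, delta = 1, t_star = 5, G = G_toy)
censored   <- graf_loss(S_tstar = 0.8, t = 3, delta = 0, t_star = 5, G = G_toy)
```

An observation censored before \(t^*\) contributes zero directly; its information enters only through the inverse-probability weights applied to the other observations.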

The re-weighted IGS, IGS*, is given by $$L(S,t|t^*) = [(S(t^*)^2)I(t \le t^*, \delta = 1)(1/G(t))] + [((1 - S(t^*))^2)I(t > t^*)(1/G(t))]$$ where \(G\) is the Kaplan-Meier estimate of the censoring distribution, i.e. the loss is always weighted by \(1/G(t)\).

IGS* is strictly proper when the censoring distribution is independent of the survival distribution and when \(G\) is fit on a sufficiently large dataset. IGS is never proper. Use proper = FALSE for IGS and proper = TRUE for IGS*; in the future the default will be changed to proper = TRUE. Results may be very different if many observations are censored at the last observed time, because of the resulting weights of 1/eps when proper = TRUE.
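To see why the two weightings can diverge, the sketch below compares the weight given to an observation that is still at risk at \(t^*\) under each variant. Everything here is an assumption for illustration: `G_toy()` is a made-up censoring curve that falls to a small floor at the end of follow-up (the floor stands in for eps), and `weight_at_risk()` is a hypothetical helper.

```r
# Toy censoring survival curve that drops towards zero at u = 10.
# Assumption: the floor of 1e-3 plays the role of eps.
G_toy <- function(u) pmax(1 - u / 10, 1e-3)

# Weight for an observation with t > t*:
#   proper = FALSE (IGS):  1 / G(t*)
#   proper = TRUE  (IGS*): 1 / G(t)
weight_at_risk <- function(t, t_star, proper) {
  if (proper) 1 / G_toy(t) else 1 / G_toy(t_star)
}

t_star <- 2
t_obs  <- c(3, 6, 9.9, 10)

w_igs      <- sapply(t_obs, weight_at_risk, t_star = t_star, proper = FALSE)
w_igs_star <- sapply(t_obs, weight_at_risk, t_star = t_star, proper = TRUE)
```

With proper = FALSE every at-risk observation gets the same weight, whereas with proper = TRUE the weight grows as the observed time approaches the end of follow-up, where the censoring estimate is close to zero.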

Note: If comparing the Integrated Graf Score to other packages, e.g. pec, then method = 2 should be used. However, the results may still be very slightly different, as this package uses survfit to estimate the censoring distribution, in line with the Graf 1999 paper, whereas some other packages use prodlim with reverse = TRUE (meaning Kaplan-Meier is not used).

If integrated == FALSE then the sample mean is taken at the single specified time, \(t^*\), and the returned score is given by $$L(S,t|t^*) = \frac{1}{N} \sum_{i=1}^N L(S_i,t_i|t^*)$$ where \(N\) is the number of observations, \(S_i\) is the predicted survival function for individual \(i\) and \(t_i\) is their true survival time.

If integrated == TRUE then an approximation to integration is made by either taking the sample mean over all \(T\) unique time-points (method == 1), or by taking a mean weighted by the difference between time-points (method == 2). Then the sample mean is taken over all \(N\) observations. For method == 1 this gives $$L(S) = \frac{1}{NT} \sum_{i=1}^N \sum_{j=1}^T L(S_i,t_i|t^*_j)$$
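The two approximations can be sketched in base R as follows. The numbers are made up, and the method == 2 line shows only one plausible reading of "weighted by the difference between time-points" (a left Riemann sum divided by the time range); the exact weighting used by the package may differ.

```r
# losses[j]: Graf score at time-point times[j], already averaged over observations
# (hypothetical values, for illustration only)
times  <- c(1, 2, 4, 8)
losses <- c(0.05, 0.10, 0.18, 0.22)

# method == 1 style: plain mean over the T time-points
igs_mean <- mean(losses)

# method == 2 style (assumed form): each loss is weighted by the gap to the
# next time-point, then normalised by the total time range
gaps         <- diff(times)
igs_weighted <- sum(gaps * losses[-length(losses)]) / (max(times) - min(times))
```

When the time-points are unevenly spaced, as here, the two approximations generally give different values, which is one reason results can differ between packages.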


If task and train_set are passed to $score then G is fit on the training data, otherwise on the testing data. The former is likely to reduce any bias caused by calculating parts of the measure on the test data it is evaluating. The training data is automatically used in scoring resamplings.


This Measure can be instantiated via the dictionary mlr_measures, e.g. mlr_measures$get("surv.graf"), or with the associated sugar function, msr("surv.graf").


Meta Information

  • Type: "surv"

  • Range: \([0, \infty)\)

  • Minimize: TRUE

  • Required prediction: distr


Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). “Assessment and comparison of prognostic classification schemes for survival data.” Statistics in Medicine, 18(17-18), 2529-2545. doi:10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.

Super classes

mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvGraf


Inherited methods

Method new()

Creates a new instance of this R6 class.


MeasureSurvGraf$new(ERV = FALSE)



ERV: Whether to standardize the measure against a Kaplan-Meier baseline (Explained Residual Variation).
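As a hedged illustration of what such a standardization usually means, the sketch below uses the common definition of explained residual variation, one minus the ratio of the model's score to the baseline's score. Both the definition and the numbers are assumptions for this example and are not taken from the package source.

```r
# Hypothetical scores, for illustration only
score_model    <- 0.15  # integrated Graf score of the fitted model
score_baseline <- 0.20  # same score for a Kaplan-Meier baseline

# Assumed definition: proportion of the baseline's loss removed by the model
erv <- 1 - score_model / score_baseline
```

Under this definition a positive value indicates that the model improves on the Kaplan-Meier baseline, and a negative value indicates that it performs worse.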

Method clone()

The objects of this class are cloneable with this method.


MeasureSurvGraf$clone(deep = FALSE)



deep: Whether to make a deep clone.