Calculates the Integrated Survival Brier Score (ISBS), also known as the Integrated Graf Score or squared survival loss.
Details
This measure has two dimensions: (test set) observations and time points. For a specific individual \(i\) from the test set, with observed survival outcome \((t_i, \delta_i)\) (time and censoring indicator) and predicted survival function \(S_i(t)\), the observation-wise loss integrated across the time dimension up to the time cutoff \(\tau^*\), is:
$$L_{ISBS}(S_i, t_i, \delta_i) = \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \left[ \frac{S_i^2(\tau) \text{I}(t_i \leq \tau, \delta_i=1)}{G(t_i)} + \frac{(1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(\tau)} \right] d\tau$$
where \(G\) is the Kaplan-Meier estimate of the censoring distribution.
The re-weighted ISBS (RISBS) is:
$$L_{RISBS}(S_i, t_i, \delta_i) = \delta_i \text{I}(t_i \leq \tau^*) \int^{\tau^*}_0 \frac{S_i^2(\tau) \text{I}(t_i \leq \tau) + (1-S_i(\tau))^2 \text{I}(t_i > \tau)}{G(t_i)} \ d\tau$$
which is always weighted by \(G(t_i)\) and is equal to zero for a censored subject.
To get a single score across all \(N\) observations of the test set, we return the average of the time-integrated observation-wise scores: $$\sum_{i=1}^N L(S_i, t_i, \delta_i) / N$$
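The integrands of the two losses above can be sketched in a few lines of base R. This is only an illustration of the formulas at a single time point \(\tau\): the function name pointwise_loss and the toy censoring function G_toy are hypothetical and stand in for the package internals and for a real Kaplan-Meier estimate.

```r
# Pointwise integrand of the ISBS / RISBS loss for one observation.
# All names here are illustrative, not part of the package API.
pointwise_loss <- function(S_tau, tau, t_i, delta_i, G, proper = FALSE) {
  event_term <- S_tau^2 * (t_i <= tau)        # S_i(tau)^2 * I(t_i <= tau)
  alive_term <- (1 - S_tau)^2 * (t_i > tau)   # (1 - S_i(tau))^2 * I(t_i > tau)
  if (proper) {
    # RISBS: both terms weighted by G(t_i); censored observations score zero
    delta_i * (event_term + alive_term) / G(t_i)
  } else {
    # ISBS (Graf): event term weighted by G(t_i), at-risk term by G(tau)
    event_term * (delta_i == 1) / G(t_i) + alive_term / G(tau)
  }
}

# Toy censoring survival function, standing in for a Kaplan-Meier estimate
G_toy <- function(t) 1 - t / 10

# Event at t_i = 2, evaluated after the event (tau = 4) with S_i(4) = 0.5
loss_after <- pointwise_loss(0.5, tau = 4, t_i = 2, delta_i = 1, G = G_toy)
# Same observation evaluated before the event (tau = 1) with S_i(1) = 0.9
loss_before <- pointwise_loss(0.9, tau = 1, t_i = 2, delta_i = 1, G = G_toy)
# Censored at t_i = 2: no contribution once tau has passed the censoring time
loss_cens <- pointwise_loss(0.5, tau = 4, t_i = 2, delta_i = 0, G = G_toy)
```

Integrating these pointwise values over \(\tau\) and averaging over observations gives the single score described above.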
Dictionary
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr().
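A minimal sketch of both routes follows. It assumes the mlr3 and mlr3proba packages are installed and that the dictionary key is "surv.graf", which is inferred from the class name MeasureSurvGraf rather than stated on this page. The t_max value is an arbitrary toy choice.

```r
# Assumption: the dictionary key for this measure is "surv.graf".
if (requireNamespace("mlr3proba", quietly = TRUE)) {
  library(mlr3)
  library(mlr3proba)

  measure <- msr("surv.graf")                  # sugar function
  same    <- mlr_measures$get("surv.graf")     # dictionary lookup

  # Parameters from the table below can be set at construction time
  proper_measure <- msr("surv.graf", proper = TRUE, t_max = 100)
}
```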
Parameters
Id | Type | Default | Levels | Range |
integrated | logical | TRUE | TRUE, FALSE | - |
times | untyped | - | - | - |
t_max | numeric | - | - | \([0, \infty)\) |
p_max | numeric | - | - | \([0, 1]\) |
method | integer | 2 | - | \([1, 2]\) |
se | logical | FALSE | TRUE, FALSE | - |
proper | logical | FALSE | TRUE, FALSE | - |
eps | numeric | 0.001 | - | \([0, 1]\) |
ERV | logical | FALSE | TRUE, FALSE | - |
Parameter details
integrated
(logical(1))
If TRUE (default), returns the integrated score (e.g. across time points); otherwise returns the non-integrated score (e.g. at a single time point).
times
(numeric())
If integrated == TRUE, a vector of time points over which to integrate the score. If integrated == FALSE, a single time point at which to return the score.
t_max
(numeric(1))
Cutoff time \(\tau^*\) (i.e. time horizon) up to which the measure is evaluated. Mutually exclusive with p_max or times. This effectively removes test observations whose observed time (event or censoring) is strictly greater than t_max. Setting t_max is recommended to avoid division by eps, see Details. If t_max is not specified, an Inf time horizon is assumed.
p_max
(numeric(1))
The proportion of censoring to integrate up to in the given dataset. Mutually exclusive with times or t_max.
method
(integer(1))
If integrated == TRUE, this selects the integration weighting method. method == 1 weights each time point equally and takes the mean score over the discrete time points. method == 2 calculates a mean weighted by the difference between time points. method == 2 is the default, in line with other packages.
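The difference between the two weighting methods can be illustrated with toy numbers in base R. The method 2 line is one plausible reading of "a mean weighted by the difference between time points" (a left-endpoint step integration normalised by the time range); the package's exact implementation may differ, and the times and scores below are made up.

```r
times  <- c(1, 2, 4, 8)               # evaluation time points (toy values)
scores <- c(0.10, 0.20, 0.30, 0.40)   # mean score at each time point (toy values)

# method 1: every time point counts equally
score_m1 <- mean(scores)

# method 2 (assumed form): weight each score by the gap to the next time point
gaps     <- diff(times)               # 1, 2, 4
score_m2 <- sum(gaps * head(scores, -1)) / (max(times) - min(times))
```

With unevenly spaced time points the two methods generally disagree, since method 1 ignores how much time each score covers.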
se
(logical(1))
If TRUE, returns the standard error of the measure; otherwise returns the mean of the per-observation scores. Default is FALSE (returns the mean).
proper
(logical(1))
If TRUE, weights scores by the censoring distribution at the observed event time, which results in a strictly proper scoring rule if the censoring and survival time distributions are independent and a sufficiently large dataset is used. If FALSE, weights scores by the Graf method, which is the more common usage but the loss is not proper.
eps
(numeric(1))
Very small number to substitute zero values in order to prevent errors in e.g. log(0) and/or division-by-zero calculations. Default value is 0.001.
ERV
(logical(1))
If TRUE, the Explained Residual Variation method is applied, which means the score is standardized against a Kaplan-Meier baseline. Default is FALSE.
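As a rough illustration of what standardizing against a baseline can look like: explained residual variation is commonly computed as one minus the ratio of the model's score to the baseline's score. That this is the exact formula applied here is an assumption, and the two scores below are made-up values.

```r
score_model <- 0.15   # toy score of the learner being evaluated
score_km    <- 0.20   # toy score of a Kaplan-Meier baseline on the same data

# Assumed ERV form: relative improvement over the baseline
erv <- 1 - score_model / score_km
```

Under this form, a value above zero means the model scores better (lower loss) than the baseline, and a value below zero means it scores worse.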
Properness
RISBS is strictly proper when the censoring distribution is independent
of the survival distribution and when \(G(t)\) is fit on a sufficiently large dataset.
ISBS is never proper. Use proper = FALSE for ISBS and proper = TRUE for RISBS.
Results may be very different if many observations are censored at the last
observed time, due to division by eps (i.e. a weight of \(1/eps\)) when proper = TRUE.
Time points used for evaluation
If the times argument is not specified (NULL), then the unique (and
sorted) time points from the test set are used for evaluation of the
time-integrated score.
This was a design decision due to the fact that different predicted survival
distributions \(S(t)\) usually have a discretized time domain which may
differ, i.e. in the case the survival predictions come from different survival
learners.
Essentially, using the same set of time points for the calculation of the score
minimizes the bias that would come from using different time points.
We note that \(S(t)\) is by default constantly interpolated for time points that fall
outside its discretized time domain.
Naturally, if the times argument is specified, then exactly these time
points are used for evaluation.
A warning is given to the user in case some of the specified times fall outside
of the time point range of the test set.
The assumption here is that if the test set is large enough, it should have a
time domain/range similar to the one from the train set, and therefore time
points outside that domain might lead to interpolation or extrapolation of \(S(t)\).
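The default grid and the out-of-range check described in this section can be mimicked in base R. This is a sketch with made-up times, not the package's internal code.

```r
# Observed times in a toy test set (ties and arbitrary order on purpose)
test_times <- c(5, 2, 9, 2, 7)

# Default evaluation grid when `times` is NULL: unique, sorted test-set times
default_times <- sort(unique(test_times))

# User-supplied grid: flag the points outside the test-set time range,
# which is the situation that triggers the warning described above
user_times   <- c(1, 4, 8, 12)
outside      <- user_times < min(test_times) | user_times > max(test_times)
out_of_range <- user_times[outside]
```

Here the requested points 1 and 12 lie outside the observed range of 2 to 9, so \(S(t)\) would have to be extrapolated for them.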
Implementation differences
If comparing the integrated Graf score to other packages, e.g. pec, then
method = 2 should be used. However, the results may still be very slightly
different as this package uses survfit to estimate the censoring distribution,
in line with the Graf 1999 paper, whereas some other packages use prodlim with
reverse = TRUE (meaning Kaplan-Meier is not used).
Data used for Estimating Censoring Distribution
If task and train_set are passed to $score then \(G(t)\) is fit on the training data,
otherwise on the test data. The former is likely to reduce any bias caused by calculating
parts of the measure on the test data it is evaluating. The training data is automatically
used when scoring resamplings.
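A hedged usage sketch of the two scoring routes follows. It assumes mlr3 and mlr3proba are installed and that the measure key is "surv.graf" (inferred from the class name); the task, learner and split ratio are arbitrary toy choices.

```r
# Assumes mlr3 and mlr3proba are installed; task, learner and split are toy choices.
if (requireNamespace("mlr3proba", quietly = TRUE)) {
  library(mlr3)
  library(mlr3proba)

  task    <- tsk("rats")
  split   <- partition(task, ratio = 0.7)
  learner <- lrn("surv.kaplan")
  learner$train(task, row_ids = split$train)
  pred <- learner$predict(task, row_ids = split$test)

  measure <- msr("surv.graf")
  # G(t) fitted on the training rows
  score_train_G <- pred$score(measure, task = task, train_set = split$train)
  # G(t) fitted on the test rows (no task/train_set supplied)
  score_test_G <- pred$score(measure)
}
```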
Time Cutoff Details
If t_max or p_max is given, then \(G(t)\) will be fitted using all observations from the
train set (or test set) and only then the cutoff time will be applied.
This is to ensure that more data is used for fitting the censoring distribution via the
Kaplan-Meier.
Setting the t_max can help alleviate inflation of the score when proper is TRUE,
in cases where an observation is censored at the last observed time point.
This results in \(G(t_{max}) = 0\) and the use of eps instead (when t_max is NULL).
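The zero at the last time point is easy to reproduce with survfit from the survival package, assuming it is installed. The data are made up, and the eps substitution at the end is a simplified stand-in for what the measure does internally.

```r
library(survival)

time   <- c(1, 2, 3, 4, 5)
status <- c(1, 1, 0, 1, 0)   # 1 = event, 0 = censored; last observation is censored

# Kaplan-Meier estimate of the censoring distribution:
# flip the status so that censoring is treated as the "event"
G_fit      <- survfit(Surv(time, 1 - status) ~ 1)
G_at_times <- summary(G_fit, times = time)$surv

# Simplified stand-in for the eps substitution: replace zeros before dividing
eps        <- 0.001
G_safe     <- ifelse(G_at_times == 0, eps, G_at_times)
max_weight <- max(1 / G_safe)
```

Because the last observation is censored, the estimate drops to zero at time 5 and the corresponding weight becomes 1/eps. A t_max below 5 would keep that time point out of the evaluation.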
References
Graf E, Schmoor C, Sauerbrei W, Schumacher M (1999). “Assessment and comparison of prognostic classification schemes for survival data.” Statistics in Medicine, 18(17-18), 2529–2545. doi:10.1002/(sici)1097-0258(19990915/30)18:17/18<2529::aid-sim274>3.0.co;2-5.
See also
Other survival measures:
mlr_measures_surv.calib_alpha, mlr_measures_surv.calib_beta, mlr_measures_surv.chambless_auc, mlr_measures_surv.cindex, mlr_measures_surv.dcalib, mlr_measures_surv.hung_auc, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.mae, mlr_measures_surv.mse, mlr_measures_surv.nagelk_r2, mlr_measures_surv.oquigley_r2, mlr_measures_surv.rcll, mlr_measures_surv.rmse, mlr_measures_surv.schmid, mlr_measures_surv.song_auc, mlr_measures_surv.song_tnr, mlr_measures_surv.song_tpr, mlr_measures_surv.uno_auc, mlr_measures_surv.uno_tnr, mlr_measures_surv.uno_tpr, mlr_measures_surv.xu_r2
Other Probabilistic survival measures:
mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.rcll, mlr_measures_surv.schmid
Other distr survival measures:
mlr_measures_surv.calib_alpha, mlr_measures_surv.dcalib, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.rcll, mlr_measures_surv.schmid
Super classes
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvGraf