This calibration method is defined by calculating $$s = \frac{B}{n} \sum_{i=1}^{B} \left(P_i - \frac{n}{B}\right)^2$$ where \(B\) is the number of 'buckets', \(n\) is the number of predictions, and \(P_i\) is the predicted number of deaths in the \(i\)th interval of \([0, 1/B), [1/B, 2/B), \ldots, [(B - 1)/B, 1]\).
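As a minimal sketch in base R, the statistic can be computed directly from a vector of bucket counts. The helper name dcalib_stat is hypothetical and is not part of mlr3proba, and the sketch assumes that \(n\) equals the total of the counts. The result coincides with Pearson's chi-squared statistic for an expected count of \(n/B\) in every bucket, which is why chisq.test can be used to test it.

```r
# Hypothetical helper (not an mlr3proba function): D-calibration statistic s
# computed from bucket counts P, assuming n = sum(P).
dcalib_stat <- function(P) {
  B <- length(P)  # number of buckets
  n <- sum(P)     # number of predictions
  (B / n) * sum((P - n / B)^2)
}

# Made-up counts for B = 5 buckets and n = 100 predictions
P <- c(18, 22, 20, 25, 15)
dcalib_stat(P)                   # 2.9
unname(chisq.test(P)$statistic)  # 2.9, Pearson statistic with uniform expected counts
```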
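A model is well-calibrated if the counts \(P_i\) are approximately uniform over the \(B\) buckets (each close to \(n/B\)), in which case \(s\) approximately follows a chi-squared distribution with \(B - 1\) degrees of freedom. This is tested with chisq.test (p > 0.05 if well-calibrated).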
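The bucketing step can be sketched for uncensored observations as below. This is an illustration under stated assumptions, not the mlr3proba implementation: the helper dcalib_counts is hypothetical, its input is assumed to be each observation's predicted survival probability at its observed event time, and the additional handling of censored observations described by Haider et al. (2020) is omitted. For a well-calibrated model these probabilities are uniform on [0, 1] (probability integral transform), so the bucket counts should be roughly equal.

```r
# Hypothetical helper: count uncensored observations per bucket.
# surv_at_event: predicted survival probability S_i(t_i) at each observed event time.
dcalib_counts <- function(surv_at_event, B = 10L) {
  breaks <- seq(0, 1, length.out = B + 1)  # equal-width buckets on [0, 1]
  idx <- findInterval(surv_at_event, breaks, rightmost.closed = TRUE)
  tabulate(idx, nbins = B)                 # keeps empty buckets as zeros
}

n <- 1000

# Perfectly calibrated case: probabilities spread evenly over [0, 1]
u_even <- (seq_len(n) - 0.5) / n
counts_even <- dcalib_counts(u_even)
chisq.test(counts_even)$p.value  # 1: every bucket holds n / B = 100

# Miscalibrated case: true times are Exp(1), but the model predicts S(t) = exp(-0.2 t)
set.seed(1)
t_obs <- rexp(n, rate = 1)
counts_bad <- dcalib_counts(exp(-0.2 * t_obs))
chisq.test(counts_bad)$p.value   # far below 0.05
```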
Model \(i\) is better calibrated than model \(j\) if \(s_i < s_j\).
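A sketch of this comparison with made-up bucket counts for two models (the counts and the helper dcalib_stat are illustrative only, not mlr3proba output):

```r
# Hypothetical helper (not an mlr3proba function): D-calibration statistic s
dcalib_stat <- function(P) {
  B <- length(P)
  n <- sum(P)
  (B / n) * sum((P - n / B)^2)
}

# Made-up bucket counts (B = 4, n = 80) for two models scored on the same data
P_model_i <- c(19, 21, 20, 20)
P_model_j <- c(10, 15, 25, 30)
s <- c(model_i = dcalib_stat(P_model_i), model_j = dcalib_stat(P_model_j))
s                   # model_i = 0.1, model_j = 12.5
names(which.min(s)) # "model_i": the smaller s is better calibrated
```

Because \(s\) grows with \(n\) when the bucket proportions are held fixed, such comparisons are most meaningful when both models are scored on the same data with the same \(B\).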
Details
This measure can return either the test statistic or the p-value from chisq.test.
The former is useful for model comparison, whereas the latter is useful for determining whether a model is well-calibrated. If chisq = FALSE and m is the returned score, the p-value can be computed manually with pchisq(m, B - 1, lower.tail = FALSE).
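The manual p-value computation can be checked against chisq.test on a vector of bucket counts (base R only; the counts are made up for illustration):

```r
# Made-up bucket counts for B = 10 buckets (n = 200 predictions)
P <- c(25, 18, 22, 17, 20, 21, 19, 23, 16, 19)
B <- length(P)
n <- sum(P)

# m: the score returned when chisq = FALSE
m <- (B / n) * sum((P - n / B)^2)  # 3.5

p_manual  <- pchisq(m, B - 1, lower.tail = FALSE)
p_builtin <- chisq.test(P)$p.value
all.equal(p_manual, p_builtin)  # TRUE
p_manual > 0.05                 # TRUE: treated as well-calibrated by this measure
```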
NOTE: This measure is still experimental both theoretically and in implementation. Results should therefore only be taken as an indicator of performance and not for conclusive judgements about model calibration.
Dictionary
This Measure can be instantiated via the dictionary mlr_measures or with the associated sugar function msr():
MeasureSurvDCalibration$new()
mlr_measures$get("surv.dcalib")
msr("surv.dcalib")
References
Haider, Humza, Hoehn, Bret, Davis, Sarah, Greiner, Russell (2020). “Effective Ways to Build and Evaluate Individual Survival Distributions.” Journal of Machine Learning Research, 21(85), 1–63. https://jmlr.org/papers/v21/18-772.html.
See also
Other survival measures: mlr_measures_surv.calib_alpha, mlr_measures_surv.calib_beta, mlr_measures_surv.chambless_auc, mlr_measures_surv.cindex, mlr_measures_surv.graf, mlr_measures_surv.hung_auc, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.mae, mlr_measures_surv.mse, mlr_measures_surv.nagelk_r2, mlr_measures_surv.oquigley_r2, mlr_measures_surv.rcll, mlr_measures_surv.rmse, mlr_measures_surv.schmid, mlr_measures_surv.song_auc, mlr_measures_surv.song_tnr, mlr_measures_surv.song_tpr, mlr_measures_surv.uno_auc, mlr_measures_surv.uno_tnr, mlr_measures_surv.uno_tpr, mlr_measures_surv.xu_r2
Other calibration survival measures: mlr_measures_surv.calib_alpha, mlr_measures_surv.calib_beta
Other distr survival measures: mlr_measures_surv.calib_alpha, mlr_measures_surv.graf, mlr_measures_surv.intlogloss, mlr_measures_surv.logloss, mlr_measures_surv.rcll, mlr_measures_surv.schmid
Super classes
mlr3::Measure -> mlr3proba::MeasureSurv -> MeasureSurvDCalibration
Methods
Method new()
Creates a new instance of this R6 class.
Usage
MeasureSurvDCalibration$new()
Arguments
B (integer(1))
Number of buckets to test for uniform predictions over. The default of 10 is recommended by Haider et al. (2020).
chisq (logical(1))
If TRUE, returns the p-value of the corresponding chisq.test instead of the measure. Otherwise, the p-value can be computed manually with pchisq(m, B - 1, lower.tail = FALSE), where m is the measure value. p > 0.05 indicates that the model is well-calibrated.