Survival probabilities using Breslow's estimator

Helper function to compose a survival distribution (or cumulative hazard) from the relative risk predictions (linear predictors, lp) of a proportional hazards model (e.g. a Cox-type model).

Usage

breslow(times, status, lp_train, lp_test, eval_times = NULL, type = "surv")

Arguments

times: (numeric())
Vector of times (train set).
status: (numeric())
Vector of status indicators (train set). For each observation in the train set, this should be 0 (alive/censored) or 1 (dead).
lp_train: (numeric())
Vector of linear predictors (train set). These are the relative score predictions ($lp = \hat{\beta}X_{train}$) from a proportional hazards model on the train set.
lp_test: (numeric())
Vector of linear predictors (test set). These are the relative score predictions ($lp = \hat{\beta}X_{test}$) from a proportional hazards model on the test set.
eval_times: (numeric())
Vector of times to compute survival probabilities. If NULL (default), the unique and sorted times from the train set will be used, otherwise the unique and sorted eval_times.
type: (character())
Type of prediction estimates. Default is surv which returns the survival probabilities $S_i(t)$ for each test observation $i$. If cumhaz, the function returns the estimated cumulative hazards $H_i(t)$.

Value

A matrix (obs x times). The number of columns equals the number of unique evaluation times (eval_times, or the unique train set times if eval_times is NULL) and the number of rows equals the number of test observations (i.e. the length of the lp_test vector). Depending on the type argument, the matrix holds either survival probabilities (0-1) or cumulative hazard estimates (0-Inf).

Details

We estimate the survival probability of individual $i$ (from the test set), at time point $t$ as follows: $$S_i(t) = e^{-H_i(t)} = e^{-\hat{H}_0(t) \times e^{lp_i}}$$

where:

$H_i(t)$ is the cumulative hazard function for individual $i$
$\hat{H}_0(t)$ is Breslow's estimator for the cumulative baseline hazard. Estimation requires the training set's times and status as well as the risk predictions (lp_train).
$lp_i$ is the risk prediction (linear predictor) of individual $i$ on the test set.
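
The relation above can be illustrated with a minimal base R sketch. The baseline cumulative hazard values and linear predictors below are made-up numbers chosen for illustration, not output of breslow():

```r
# Hypothetical inputs: baseline cumulative hazard at three time points
# and linear predictors for two test observations.
H0 = c(0.00, 0.10, 0.25)
names(H0) = c("1", "2", "3")
lp_test = c(-0.5, 1.0)

# H_i(t) = H0(t) * exp(lp_i): rows are test observations, columns are times
cumhaz = outer(exp(lp_test), H0)
colnames(cumhaz) = names(H0)

# S_i(t) = exp(-H_i(t))
surv = exp(-cumhaz)
print(round(surv, 4))
```

A larger linear predictor scales the cumulative hazard up, so the second observation's survival curve drops faster than the first one's.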

Breslow's approach uses a non-parametric maximum likelihood estimation of the cumulative baseline hazard function:

$$\hat{H}_0(t) = \sum_{i=1}^n{\frac{I(T_i \le t)\delta_i} {\sum\nolimits_{j \in R_i}e^{lp_j}}}$$

where:

$t$ is the vector of time points (unique and sorted, from the train set)
$n$ is the number of observations (train set)
$T$ is the vector of observed times (event or censoring, train set)
$\delta$ is the status indicator (1 = event or 0 = censored)
$R_i$ is the risk set (the individuals still at risk just before time $T_i$)
$lp_j$ is the risk prediction (linear predictor) of individual $j$ (who is part of the risk set $R_i$) on the train set.
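
As a rough illustration of the formula above, the following base R sketch computes the estimator on toy data. The helper name breslow_h0 and the data are hypothetical, and the package's internal implementation may differ:

```r
# Sketch of Breslow's cumulative baseline hazard (hypothetical helper,
# not part of the package API).
breslow_h0 = function(times, status, lp_train) {
  t_unique = sort(unique(times))
  # hazard increment at each unique time:
  # number of events at t divided by the sum of exp(lp) over the risk set
  increments = vapply(t_unique, function(t) {
    d = sum(status[times == t])            # events observed at time t
    risk = sum(exp(lp_train[times >= t]))  # individuals still at risk at t
    d / risk
  }, numeric(1))
  H0 = cumsum(increments)
  names(H0) = t_unique
  H0
}

# Toy train set; with all linear predictors equal to 0 the estimator
# reduces to the Nelson-Aalen estimator.
times = c(2, 3, 3, 5, 7)
status = c(1, 0, 1, 1, 0)
lp_train = c(0, 0, 0, 0, 0)
H0 = breslow_h0(times, status, lp_train)
print(H0)
```

Tied event times share the same risk set here, which is what the sum over $i$ in the formula implies.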

We employ constant interpolation to estimate the cumulative baseline hazards, extending from the observed unique event times to the specified evaluation times (eval_times). Any values falling outside the range of the estimated times are assigned as follows: $$\hat{H}_0(eval\_times < min(t)) = 0$$ and $$\hat{H}_0(eval\_times > max(t)) = \hat{H}_0(max(t))$$
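
One way to sketch this constant (step function) interpolation in base R is with findInterval(). The time points and hazard values below are illustrative assumptions, and the package may implement the step differently:

```r
# Hypothetical baseline cumulative hazard at the unique train times
t = c(2, 3, 5)
H0 = c(0.20, 0.45, 0.95)

# Evaluation times, some outside the range of t
eval_times = c(1, 2, 4, 5, 10)

# Index of the last train time <= each evaluation time (0 if none)
idx = findInterval(eval_times, t)

# Prepend 0 so that times before min(t) map to a cumulative hazard of 0;
# times after max(t) keep the last estimated value
H0_eval = c(0, H0)[idx + 1]
names(H0_eval) = eval_times
print(H0_eval)
```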

Note that in the rare event of lp predictions being Inf or -Inf, the resulting cumulative hazard values become NaN, which we substitute with Inf (and corresponding survival probabilities take the value of $0$).
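
The substitution described above can be sketched as follows for an infinite test set predictor. The values are made up for illustration and the code is an assumption about the general idea, not the package's source:

```r
# Hypothetical baseline cumulative hazard (0 at the first time point)
H0 = c(0, 0.2, 0.45)
# Second test observation has an infinite linear predictor
lp_test = c(0.3, Inf)

# Element-wise H_i(t) = H0(t) * exp(lp_i); Inf * 0 yields NaN
H0_mat = matrix(H0, nrow = length(lp_test), ncol = length(H0), byrow = TRUE)
cumhaz = H0_mat * exp(lp_test)

# Replace NaN with Inf, so the survival probability becomes 0
cumhaz[is.nan(cumhaz)] = Inf
surv = exp(-cumhaz)
print(surv)
```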

For similar implementations, see gbm::basehaz.gbm(), C060::basesurv() and xgboost.surv::sgb_bhaz().

References

Cox DR (1972). “Regression Models and Life-Tables.” Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-202. doi:10.1111/j.2517-6161.1972.tb00899.x.

Lin DY (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. doi:10.1007/s10985-007-9048-y.

Examples

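library(mlr3)
library(mlr3proba)

# split the rats survival task into train and test sets
task = tsk("rats")
part = partition(task, ratio = 0.8)

# fit a Cox proportional hazards model and get the linear predictors
learner = lrn("surv.coxph")
learner$train(task, part$train)
p_train = learner$predict(task, part$train)
p_test  = learner$predict(task, part$test)

# survival matrix: one row per test observation, one column per unique train time
surv = breslow(times = task$times(part$train), status = task$status(part$train),
               lp_train = p_train$lp, lp_test = p_test$lp)
head(surv)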
#>      23        34        39        40        45        49        51        53
#> [1,]  1 0.9909036 0.9817658 0.9726135 0.9634920 0.9542693 0.9542693 0.9542693
#> [2,]  1 0.9906404 0.9812407 0.9718286 0.9624508 0.9529715 0.9529715 0.9529715
#> [3,]  1 0.9955515 0.9910616 0.9865431 0.9820181 0.9774206 0.9774206 0.9774206
#> [4,]  1 0.9995310 0.9990558 0.9985755 0.9980926 0.9975999 0.9975999 0.9975999
#> [5,]  1 0.9954225 0.9908030 0.9861546 0.9815002 0.9767718 0.9767718 0.9767718
#> [6,]  1 0.9994962 0.9989857 0.9984698 0.9979511 0.9974219 0.9974219 0.9974219
#>             54        55        61        62        63        64        65
#> [1,] 0.9449191 0.9355920 0.9355920 0.9355920 0.9355920 0.9262214 0.9262214
#> [2,] 0.9433639 0.9337828 0.9337828 0.9337828 0.9337828 0.9241598 0.9241598
#> [3,] 0.9727361 0.9680396 0.9680396 0.9680396 0.9680396 0.9632969 0.9632969
#> [4,] 0.9970958 0.9965882 0.9965882 0.9965882 0.9965882 0.9960733 0.9960733
#> [5,] 0.9719547 0.9671258 0.9671258 0.9671258 0.9671258 0.9622502 0.9622502
#> [6,] 0.9968804 0.9963352 0.9963352 0.9963352 0.9963352 0.9957823 0.9957823
#>             66        67        68        69        70        71        72
#> [1,] 0.9168704 0.9168704 0.9075182 0.9075182 0.8980900 0.8885952 0.8791947
#> [2,] 0.9145597 0.9145597 0.9049613 0.9049613 0.8952878 0.8855489 0.8759098
#> [3,] 0.9585396 0.9585396 0.9537568 0.9537568 0.9489095 0.9440015 0.9391158
#> [4,] 0.9955546 0.9955546 0.9950308 0.9950308 0.9944975 0.9939550 0.9934125
#> [5,] 0.9573602 0.9573602 0.9524447 0.9524447 0.9474637 0.9424212 0.9374022
#> [6,] 0.9952252 0.9952252 0.9946627 0.9946627 0.9940900 0.9935075 0.9929249
#>             73        74        75        76        77        78        79
#> [1,] 0.8697183 0.8697183 0.8599592 0.8599592 0.8498406 0.8396009 0.8292044
#> [2,] 0.8661958 0.8661958 0.8561953 0.8561953 0.8458298 0.8353440 0.8247014
#> [3,] 0.9341634 0.9341634 0.9290344 0.9290344 0.9236848 0.9182379 0.9126726
#> [4,] 0.9928600 0.9928600 0.9922850 0.9922850 0.9916823 0.9910654 0.9904317
#> [5,] 0.9323155 0.9323155 0.9270482 0.9270482 0.9215553 0.9159634 0.9102511
#> [6,] 0.9923317 0.9923317 0.9917143 0.9917143 0.9910672 0.9904049 0.9897246
#>             80        81        82        83        84        85        86
#> [1,] 0.8079698 0.7970856 0.7970856 0.7970856 0.7749746 0.7749746 0.7749746
#> [2,] 0.8029762 0.7918470 0.7918470 0.7918470 0.7692519 0.7692519 0.7692519
#> [3,] 0.9011937 0.8952500 0.8952500 0.8952500 0.8830462 0.8830462 0.8830462
#> [4,] 0.9891136 0.9884251 0.9884251 0.9884251 0.9869987 0.9869987 0.9869987
#> [5,] 0.8984719 0.8923744 0.8923744 0.8923744 0.8798586 0.8798586 0.8798586
#> [6,] 0.9883097 0.9875707 0.9875707 0.9875707 0.9860398 0.9860398 0.9860398
#>             87        88        89        90        91        92        93
#> [1,] 0.7749746 0.7635446 0.7401226 0.7401226 0.7401226 0.7269991 0.7269991
#> [2,] 0.7692519 0.7575789 0.7336750 0.7336750 0.7336750 0.7202911 0.7202911
#> [3,] 0.8830462 0.8766676 0.8634423 0.8634423 0.8634423 0.8559383 0.8559383
#> [4,] 0.9869987 0.9862462 0.9846701 0.9846701 0.9846701 0.9837662 0.9837662
#> [5,] 0.8798586 0.8733190 0.8597641 0.8597641 0.8597641 0.8520758 0.8520758
#> [6,] 0.9860398 0.9852321 0.9835408 0.9835408 0.9835408 0.9825709 0.9825709
#>             94        95        96        98        99       101       102
#> [1,] 0.7269991 0.7269991 0.6988066 0.6988066 0.6988066 0.6840668 0.6690662
#> [2,] 0.7202911 0.7202911 0.6915630 0.6915630 0.6915630 0.6765565 0.6612941
#> [3,] 0.8559383 0.8559383 0.8395795 0.8395795 0.8395795 0.8308921 0.8219520
#> [4,] 0.9837662 0.9837662 0.9817708 0.9817708 0.9817708 0.9806970 0.9795813
#> [5,] 0.8520758 0.8520758 0.8353221 0.8353221 0.8353221 0.8264288 0.8172795
#> [6,] 0.9825709 0.9825709 0.9804301 0.9804301 0.9804301 0.9792781 0.9780814
#>            103       104
#> [1,] 0.6386860 0.6230380
#> [2,] 0.6304145 0.6145258
#> [3,] 0.8035257 0.7938596
#> [4,] 0.9772473 0.9760037
#> [5,] 0.7984316 0.7885492
#> [6,] 0.9755780 0.9742444