
Helper function to compose a survival distribution (or cumulative hazard) from the relative risk predictions (linear predictors, lp) of a proportional hazards model (e.g. a Cox-type model).

Usage

breslow(times, status, lp_train, lp_test, eval_times = NULL, type = "surv")

Arguments

times

(numeric())
Vector of times (train set).

status

(numeric())
Vector of status indicators (train set). For each observation in the train set, this should be 0 (alive/censored) or 1 (dead).

lp_train

(numeric())
Vector of linear predictors (train set). These are the relative score predictions (\(lp = \hat{\beta}X_{train}\)) from a proportional hazards model on the train set.

lp_test

(numeric())
Vector of linear predictors (test set). These are the relative score predictions (\(lp = \hat{\beta}X_{test}\)) from a proportional hazards model on the test set.

eval_times

(numeric())
Vector of times at which to compute the survival probabilities. If NULL (default), the unique, sorted times from the train set are used; otherwise, the unique, sorted eval_times are used.

type

(character())
Type of prediction estimates. The default, surv, returns the survival probabilities \(S_i(t)\) for each test observation \(i\); cumhaz returns the estimated cumulative hazards \(H_i(t)\).

Value

A matrix (observations x times). The number of columns equals the number of eval_times and the number of rows equals the number of test observations (i.e. the length of the lp_test vector). Depending on the type argument, the matrix contains either survival probabilities (0-1) or cumulative hazard estimates (0-Inf).

Details

We estimate the survival probability of individual \(i\) (from the test set) at time point \(t\) as follows: $$S_i(t) = e^{-H_i(t)} = e^{-\hat{H}_0(t) \times e^{lp_i}}$$

where:

  • \(H_i(t)\) is the cumulative hazard function for individual \(i\)

  • \(\hat{H}_0(t)\) is Breslow's estimator for the cumulative baseline hazard. Estimation requires the training set's times and status, as well as the risk predictions (lp_train).

  • \(lp_i\) is the risk prediction (linear predictor) of individual \(i\) on the test set.
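
For intuition, this composition just scales a baseline cumulative hazard by \(e^{lp_i}\) and exponentiates. A minimal base R sketch, where H0 and lp_test hold made-up, illustrative values:

H0 = c(0.01, 0.03, 0.07, 0.12)  # hypothetical baseline cumulative hazard at 4 sorted time points
lp_test = c(-0.5, 0.2, 1.1)     # hypothetical linear predictors for 3 test observations

H = exp(lp_test) %o% H0  # H_i(t) = H0(t) * exp(lp_i): one row per test observation
S = exp(-H)              # S_i(t) = exp(-H_i(t))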

Breslow's approach uses a non-parametric maximum likelihood estimator of the cumulative baseline hazard function:

$$\hat{H}_0(t) = \sum_{i=1}^n{\frac{I(T_i \le t)\delta_i} {\sum\nolimits_{j \in R_i}e^{lp_j}}}$$

where:

  • \(t\) is the vector of time points (unique and sorted, from the train set)

  • \(n\) is the number of observations (train set)

  • \(T\) is the vector of observed times (train set)

  • \(\delta\) is the status indicator (1 = event or 0 = censored)

  • \(R_i\) is the risk set, i.e. the individuals still at risk just before time \(T_i\)

  • \(lp_j\) is the risk prediction (linear predictor) of individual \(j\) (who is part of the risk set \(R_i\)) on the train set.
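
As a rough, illustrative sketch (not the package's internal code), the estimator can be written in a few lines of base R, where times, status and lp_train are the train-set vectors described in the Arguments section:

breslow_H0 = function(times, status, lp_train) {
  ut = sort(unique(times))
  # hazard increment at each unique time: number of events at that time,
  # divided by the sum of exp(lp) over the risk set (observations with time >= t)
  h0 = vapply(ut, function(t) {
    sum(status[times == t]) / sum(exp(lp_train[times >= t]))
  }, numeric(1))
  cumsum(h0)  # cumulative baseline hazard at the unique sorted times
}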

We employ constant interpolation to estimate the cumulative baseline hazards, extending from the observed unique event times to the specified evaluation times (eval_times). Any values falling outside the range of the estimated times are assigned as follows: $$\hat{H}_0(eval\_times < min(t)) = 0$$ and $$\hat{H}_0(eval\_times > max(t)) = \hat{H}_0(max(t))$$
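
A short sketch of this step-function extrapolation using stats::stepfun() (the values below are hypothetical and the actual implementation may differ):

ut = c(39, 45, 55)             # hypothetical unique event times (train set)
H0 = c(0.003, 0.010, 0.021)    # hypothetical cumulative baseline hazard at ut
H0_fun = stepfun(ut, c(0, H0)) # right-continuous step function, 0 before min(ut)
eval_times = c(10, 40, 50, 100)
H0_fun(eval_times)             # 0.000 0.003 0.010 0.021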

Note that in the rare event of lp predictions being Inf or -Inf, the resulting cumulative hazard values become NaN, which we substitute with Inf (and corresponding survival probabilities take the value of \(0\)).
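
For example, a minimal sketch of that substitution (the matrix H below is made up):

H = matrix(c(0.1, NaN, 0.3, NaN), nrow = 2)  # cumulative hazards with NaN entries
H[is.nan(H)] = Inf                           # substitute NaN with Inf
exp(-H)                                      # affected survival probabilities become 0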

For similar implementations, see gbm::basehaz.gbm(), C060::basesurv() and xgboost.surv::sgb_bhaz().

References

Breslow N (1972). “Discussion of 'Regression Models and Life-Tables' by D.R. Cox.” Journal of the Royal Statistical Society: Series B, 34(2), 216-217.

Lin DY (2007). “On the Breslow estimator.” Lifetime Data Analysis, 13(4), 471-480. doi:10.1007/s10985-007-9048-y.

Examples

task = tsk("rats")
part = partition(task, ratio = 0.8)

learner = lrn("surv.coxph")
learner$train(task, part$train)
p_train = learner$predict(task, part$train)
p_test = learner$predict(task, part$test)

surv = breslow(times = task$times(part$train), status = task$status(part$train),
               lp_train = p_train$lp, lp_test = p_test$lp)
head(surv)
#>      32        39        40        45        49        50        51        53
#> [1,]  1 0.9966693 0.9933135 0.9899512 0.9899512 0.9864522 0.9864522 0.9864522
#> [2,]  1 0.9966000 0.9931747 0.9897430 0.9897430 0.9861720 0.9861720 0.9861720
#> [3,]  1 0.9965293 0.9930331 0.9895305 0.9895305 0.9858861 0.9858861 0.9858861
#> [4,]  1 0.9997487 0.9994948 0.9992396 0.9992396 0.9989731 0.9989731 0.9989731
#> [5,]  1 0.9909026 0.9817902 0.9727136 0.9727136 0.9633246 0.9633246 0.9633246
#> [6,]  1 0.9962316 0.9924366 0.9886360 0.9886360 0.9846825 0.9846825 0.9846825
#>             54        55        61        62        63        64        65
#> [1,] 0.9829460 0.9794278 0.9794278 0.9794278 0.9794278 0.9794278 0.9794278
#> [2,] 0.9825939 0.9790039 0.9790039 0.9790039 0.9790039 0.9790039 0.9790039
#> [3,] 0.9822347 0.9785714 0.9785714 0.9785714 0.9785714 0.9785714 0.9785714
#> [4,] 0.9987052 0.9984356 0.9984356 0.9984356 0.9984356 0.9984356 0.9984356
#> [5,] 0.9539744 0.9446504 0.9446504 0.9446504 0.9446504 0.9446504 0.9446504
#> [6,] 0.9807228 0.9767515 0.9767515 0.9767515 0.9767515 0.9767515 0.9767515
#>             66        67        69        70        71        72        73
#> [1,] 0.9758931 0.9723413 0.9723413 0.9687126 0.9687126 0.9650564 0.9575886
#> [2,] 0.9753973 0.9717735 0.9717735 0.9680716 0.9680716 0.9643419 0.9567248
#> [3,] 0.9748915 0.9711943 0.9711943 0.9674176 0.9674176 0.9636130 0.9558438
#> [4,] 0.9981637 0.9978896 0.9978896 0.9976086 0.9976086 0.9973245 0.9967412
#> [5,] 0.9353409 0.9260454 0.9260454 0.9166093 0.9166093 0.9071639 0.8880637
#> [6,] 0.9727634 0.9687579 0.9687579 0.9646677 0.9646677 0.9605485 0.9521414
#>             74        75        76        77        78        79        80
#> [1,] 0.9575886 0.9536810 0.9536810 0.9496213 0.9496213 0.9454122 0.9454122
#> [2,] 0.9567248 0.9527396 0.9527396 0.9485997 0.9485997 0.9443077 0.9443077
#> [3,] 0.9558438 0.9517795 0.9517795 0.9475578 0.9475578 0.9431815 0.9431815
#> [4,] 0.9967412 0.9964342 0.9964342 0.9961141 0.9961141 0.9957809 0.9957809
#> [5,] 0.8880637 0.8781721 0.8781721 0.8679700 0.8679700 0.8574719 0.8574719
#> [6,] 0.9521414 0.9477457 0.9477457 0.9431816 0.9431816 0.9384520 0.9384520
#>             81        82        83        84        85        86        87
#> [1,] 0.9367216 0.9367216 0.9367216 0.9320879 0.9320879 0.9273429 0.9273429
#> [2,] 0.9354474 0.9354474 0.9354474 0.9307238 0.9307238 0.9258873 0.9258873
#> [3,] 0.9341483 0.9341483 0.9341483 0.9293333 0.9293333 0.9244038 0.9244038
#> [4,] 0.9950885 0.9950885 0.9950885 0.9947168 0.9947168 0.9943345 0.9943345
#> [5,] 0.8360528 0.8360528 0.8360528 0.8247724 0.8247724 0.8133220 0.8133220
#> [6,] 0.9286958 0.9286958 0.9286958 0.9234987 0.9234987 0.9181804 0.9181804
#>             88        89        91        92        93        94        95
#> [1,] 0.9273429 0.9170220 0.9170220 0.9170220 0.9170220 0.9109805 0.9109805
#> [2,] 0.9258873 0.9153693 0.9153693 0.9153693 0.9153693 0.9092135 0.9092135
#> [3,] 0.9244038 0.9136852 0.9136852 0.9136852 0.9136852 0.9074132 0.9074132
#> [4,] 0.9943345 0.9934967 0.9934967 0.9934967 0.9934967 0.9930022 0.9930022
#> [5,] 0.8133220 0.7887658 0.7887658 0.7887658 0.7887658 0.7746125 0.7746125
#> [6,] 0.9181804 0.9066248 0.9066248 0.9066248 0.9066248 0.8998685 0.8998685
#>             96        97        98        99       100       101       102
#> [1,] 0.8980142 0.8980142 0.8980142 0.8980142 0.8980142 0.8980142 0.8842046
#> [2,] 0.8960048 0.8960048 0.8960048 0.8960048 0.8960048 0.8960048 0.8819414
#> [3,] 0.8939582 0.8939582 0.8939582 0.8939582 0.8939582 0.8939582 0.8796371
#> [4,] 0.9919305 0.9919305 0.9919305 0.9919305 0.9919305 0.9919305 0.9907734
#> [5,] 0.7447836 0.7447836 0.7447836 0.7447836 0.7447836 0.7447836 0.7138282
#> [6,] 0.8853879 0.8853879 0.8853879 0.8853879 0.8853879 0.8853879 0.8699959
#>            103       104
#> [1,] 0.8695545 0.8614645
#> [2,] 0.8670272 0.8587935
#> [3,] 0.8644548 0.8560753
#> [4,] 0.9895274 0.9888309
#> [5,] 0.6818954 0.6646574
#> [6,] 0.8537017 0.8447191