Survival to Poisson Regression Reduction Pipeline
Source: R/pipelines.R
mlr_graphs_survtoregr_pem.Rd
Wrapper around multiple PipeOps to help in the creation of complex survival reduction methods.
Arguments

- learner
  (LearnerRegr)
  Regression learner to fit the transformed TaskRegr. The learner must be able to handle an offset and support optimization of a Poisson likelihood.

- cut
  (numeric())
  Split points used to partition the data into intervals. If unspecified, all unique event times will be used. If cut is a single integer, it will be interpreted as the number of equidistant intervals from 0 until the maximum event time.

- max_time
  (numeric(1))
  If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time.

- graph_learner
  (logical(1))
  If TRUE, the Graph is wrapped in and returned as a GraphLearner; otherwise (default) it is returned as a Graph.
Details
A brief mathematical summary of PEMs (see referenced article for more detail):
PED Transformation: Survival data is converted into piecewise exponential data (PED) format. Key elements are:

- Continuous time is divided into \(j = 1, \ldots, J\) intervals for each subject \(i = 1, \ldots, n\).
- A status variable in each entry indicates whether an event or censoring occurred during that interval.
- For any subject, data entries are created only up until the interval including the event time.
- An offset column is introduced, representing the logarithm of the time a subject spent in the given interval.

For more details, see pammtools::as_ped().

Hazard Estimation with PEM: The PED transformation combined with the working assumption $$\delta_{ij} \stackrel{\text{iid}}{\sim} Poisson \left( \mu_{ij} = \lambda_{ij} t_{ij} \right),$$ where \(\delta_{ij}\) denotes the event or censoring indicator, allows framing the problem of piecewise constant hazard estimation as a Poisson regression with offset. Specifically, we want to estimate $$\lambda(t \mid \mathbf{x}_i) := \exp(g(x_{i}, t_{j})), \quad \forall t \in [t_{j-1}, t_{j}), \quad i = 1, \dots, n.$$ Here \(g(x_{i}, t_{j})\) is a general function of the features \(x\) and time \(t\), i.e. a learner, and may include non-linearities and complex feature interactions. Two important prerequisites of the learner are its capacity to model a Poisson likelihood and to accommodate the offset.

From Piecewise Hazards to Survival Probabilities: Lastly, the computed hazards are back-transformed to survival probabilities via the identity $$S(t \mid \mathbf{x}) = \exp \left( - \int_{0}^{t} \lambda(s \mid \mathbf{x}) \, ds \right) = \exp \left( - \sum_{j = 1}^{J} \lambda(j \mid \mathbf{x}) \, d_j \right),$$ where \(d_j\) specifies the duration of interval \(j\).
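The two transformations above can be sketched in base R. The code below is an illustrative sketch only, not the pipeline's implementation: the interval grid, the hazard values, and the column names of the constructed data frame are invented for the example (the actual PED format is produced by pammtools::as_ped() / PipeOpTaskSurvRegrPEM).

```r
# Illustrative sketch (base R only): PED expansion for one subject, and
# back-transformation of piecewise constant hazards to survival probabilities.

# Interval cut points 0 = t_0 < t_1 < ... < t_J (invented grid)
cut <- c(0, 2, 5, 10)

# One subject: event observed at time 6 (status = 1)
time <- 6
status <- 1

# PED expansion: one row per interval up to (and including) the interval
# containing the event time; offset = log(time spent in that interval).
j_max <- findInterval(time, cut, left.open = TRUE)  # interval containing time
tstart <- cut[seq_len(j_max)]
tend <- cut[-1][seq_len(j_max)]
t_in_interval <- pmin(time, tend) - tstart
ped <- data.frame(
  tstart = tstart,
  tend = tend,
  ped_status = c(rep(0, j_max - 1), status),  # event indicator delta_ij
  offset = log(t_in_interval)
)

# Back-transformation: given piecewise constant hazards lambda_j (invented
# values standing in for learner output), S(t_j) = exp(-sum_{k<=j} lambda_k d_k)
lambda <- c(0.05, 0.10, 0.20)      # hazard per interval
d <- diff(cut)                     # interval durations d_j
surv <- exp(-cumsum(lambda * d))   # survival at the interval end points
```

Note that the offset column is what turns the Poisson fit into hazard estimation: the learner models counts per interval, and dividing by exposure time (via the log-offset) yields the piecewise constant hazard.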
The previous considerations are reflected in the pipeline, which consists of the following steps:

1. PipeOpTaskSurvRegrPEM converts the TaskSurv to a TaskRegr.
2. A LearnerRegr is fit and predicted on the new TaskRegr.
3. PipeOpPredRegrSurvPEM transforms the resulting PredictionRegr to a PredictionSurv.
References
Bender, Andreas, Groll, Andreas, Scheipl, Fabian (2018). “A generalized additive model approach to time-to-event analysis.” Statistical Modelling, 18(3-4), 299–321. https://doi.org/10.1177/1471082X17748083.
Examples
if (FALSE) { # \dontrun{
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
task = tsk("lung")
part = partition(task)
# typically the model formula and feature types are extracted from the task
learner = lrn("regr.gam", family = "poisson")
grlrn = ppl(
"survtoregr_pem",
learner = learner,
graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
# In some instances special formulas can be specified in the learner
learner = lrn("regr.gam", family = "poisson", formula = pem_status ~ s(tend) + s(age) + meal.cal)
grlrn = ppl(
"survtoregr_pem",
learner = learner,
graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
# if necessary encode data before passing to learner with e.g. po("encode"), po("modelmatrix"), etc.
# with po("modelmatrix") feature types and formula can be adjusted at the same time
cut = round(seq(0, max(task$data()$time), length.out = 20))
learner = as_learner(
po("modelmatrix", formula = ~ as.factor(tend) + .) %>>%
lrn("regr.glmnet", family = "poisson", lambda = 0)
)
grlrn = ppl(
"survtoregr_pem",
learner = learner,
cut = cut,
graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
# xgboost regression learner
learner = as_learner(
po("modelmatrix", formula = ~ .) %>>%
lrn("regr.xgboost", objective = "count:poisson", nrounds = 100, eta = 0.1)
)
grlrn = ppl(
"survtoregr_pem",
learner = learner,
graph_learner = TRUE
)
grlrn$train(task, row_ids = part$train)
grlrn$predict(task, row_ids = part$test)
} # }