Transform TaskSurv to TaskRegr by dividing continuous time into multiple time intervals for each observation.
This transformation creates a new target variable pem_status that indicates whether an event occurred within each time interval.
Dictionary
This PipeOp can be instantiated via the dictionary mlr3pipelines::mlr_pipeops or with the associated sugar function mlr3pipelines::po():
mlr_pipeops$get("trafotask_survregr_pem")
po("trafotask_survregr_pem")
Input and Output Channels
PipeOpTaskSurvRegrPEM has one input channel named "input", and two output channels, one named "output" and the other "transformed_data".
During training, the "output" is the "input" TaskSurv transformed to a TaskRegr.
The target column is named "pem_status" and indicates whether an event occurred in each time interval.
An additional numeric feature named "tend" contains the end time point of each interval.
Lastly, the "output" task has an offset column "offset".
The offset, also referred to as exposure, is the logarithm of time spent in interval \(j\), i.e. \(\log(t_j)\).
The "transformed_data" is an empty data.table.
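The training-time layout described above can be illustrated for a single observation. The following is a minimal base R sketch, not the package's implementation: the helper `pem_split()` is hypothetical, and it assumes that the first interval starts at 0 and that the event is recorded in the interval containing the observed time.

```r
# Hypothetical helper (not part of mlr3proba): split one observation into
# piecewise-exponential rows, given the interval end points `cut`.
pem_split = function(time, status, cut) {
  tstart = c(0, head(cut, -1))          # interval start points (assumes start at 0)
  keep = tstart < time                  # intervals the subject actually entered
  tstart = tstart[keep]
  tend = cut[keep]
  exposure = pmin(time, tend) - tstart  # time spent in each interval
  data.frame(
    tend = tend,
    # the event is recorded only in the interval containing the observed time
    pem_status = as.integer(status == 1 & time <= tend),
    offset = log(exposure)              # log of time spent in the interval
  )
}

# subject with an event at t = 2.5 and interval end points 1, 2 and 3
pem_rows = pem_split(time = 2.5, status = 1, cut = c(1, 2, 3))
print(pem_rows)
```

The subject contributes one row per interval entered: full exposure in the first two intervals with "pem_status" 0, and half a time unit in the last interval, where the event is recorded.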
During prediction, the "input" TaskSurv is transformed to the "output" TaskRegr with "pem_status" as target, "tend" included as a feature, and the "offset" column, which is assigned the offset "col_role".
The "transformed_data" is a data.table with the following columns: the "pem_status" target of the "output" task, the "id" (original observation ids), "obs_times" (observed times per "id") and "tend" (end time of each interval).
This "transformed_data" is only meant to be used with the PipeOpPredRegrSurvPEM.
Parameters
The parameters are:
cut :: numeric()
Split points, used to partition the data into intervals based on the time column. If unspecified, all unique event times will be used. If cut is a single integer, it will be interpreted as the number of equidistant intervals from 0 until the maximum event time.
max_time :: numeric(1)
If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time. Needs to be greater than the minimum event time in the given task.
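To make the two parameters concrete, here is a small base R sketch on made-up data. It only mirrors the behaviour described above and is not taken from the package source. The variable names are illustrative, and whether 0 itself is stored as a cut point internally is an assumption.

```r
# toy survival data, made up for illustration
time   = c(2, 5, 8, 10, 12)
status = c(1, 1, 0, 1, 0)   # 1 = event, 0 = censored

# `cut` given as a single integer: number of equidistant intervals
# from 0 until the maximum event time (here 10)
n_intervals = 4
max_event_time = max(time[status == 1])
cut_points = seq(0, max_event_time, length.out = n_intervals + 1)

# `max_time` with `cut` unspecified: observations after max_time are
# administratively censored at max_time ...
max_time = 6
status_cens = ifelse(time > max_time, 0, status)
time_cens = pmin(time, max_time)

# ... and the remaining unique event times serve as split points
event_cuts = sort(unique(time_cens[status_cens == 1]))
```

With these values, the four equidistant intervals have boundaries 0, 2.5, 5, 7.5 and 10, while choosing `max_time = 6` leaves only the event times 2 and 5 as split points.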
References
Bender, Andreas, Groll, Andreas, Scheipl, Fabian (2018). “A generalized additive model approach to time-to-event analysis.” Statistical Modelling, 18(3-4), 299–321. https://doi.org/10.1177/1471082X17748083.
See also
Other PipeOps: mlr_pipeops_survavg, mlr_pipeops_trafopred_regrsurv_pem
Other Transformation PipeOps: mlr_pipeops_trafopred_classifsurv_IPCW, mlr_pipeops_trafopred_classifsurv_disctime, mlr_pipeops_trafopred_regrsurv_pem, mlr_pipeops_trafotask_survclassif_IPCW, mlr_pipeops_trafotask_survclassif_disctime
Super class
mlr3pipelines::PipeOp -> PipeOpTaskSurvRegrPEM
Methods
Method new()
Creates a new instance of this R6 class.
Usage
PipeOpTaskSurvRegrPEM$new(id = "trafotask_survregr_pem")
Examples
# not run by default: requires mlr3proba, mlr3pipelines, mlr3learners and mlr3extralearners
library(mlr3)
library(mlr3proba)          # provides tsk("lung") and this PipeOp
library(mlr3learners)       # provides regr.xgboost
library(mlr3extralearners)  # provides regr.gam
library(mlr3pipelines)

task = tsk("lung")

# transform the survival task to a regression task
# all unique event times are used as cutpoints
po_pem = po("trafotask_survregr_pem")
task_regr = po_pem$train(list(task))[[1L]]

# the end time points of the discrete time intervals
unique(task_regr$data(cols = "tend")[[1L]])

# train a regression learner that supports poisson regression
# e.g. regr.gam
# won't run unless learner can accept offset column role
learner = lrn("regr.gam", formula = pem_status ~ s(age) + s(tend), family = "poisson")
learner$train(task_regr)

# e.g. regr.xgboost, note prior data processing steps
learner = as_learner(
  po("modelmatrix", formula = ~ as.factor(tend) + .) %>>%
    lrn("regr.xgboost", objective = "count:poisson", nrounds = 100, eta = 0.1)
)
learner$train(task_regr)