PipeOpTaskSurvClassifDiscTime
Transform TaskSurv to TaskClassif by dividing continuous time into multiple time intervals for each observation. This transformation creates a new target variable disc_status that indicates whether an event occurred within each time interval. This approach facilitates survival analysis within a classification framework using discrete time intervals (Tutz et al. 2016).
Note that this data transformation is compatible with learners that support the "validation" property and can track performance on holdout data during training, enabling early stopping and logging.
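The idea of the transformation can be sketched in base R on toy data. This is only an illustration of the discrete-time (person-period) format, not the package's implementation: the helper `discretize()`, the toy data and the cut points are all hypothetical, and the real PipeOp may differ in details such as column types and interval boundaries.

```r
# Toy survival data: observed time and event indicator (1 = event, 0 = censored).
surv = data.frame(id = 1:3, time = c(2.5, 4.0, 1.0), status = c(1, 0, 1))

# Hypothetical cut points defining the intervals (0, 2] and (2, 4].
cuts = c(0, 2, 4)

# Hypothetical helper: one row per observation and interval at risk.
discretize = function(surv, cuts) {
  rows = lapply(seq_len(nrow(surv)), function(i) {
    t_i = surv$time[i]
    # Keep every interval that starts before the observed time.
    keep = which(head(cuts, -1) < t_i)
    tend = cuts[keep + 1]
    # The event is recorded only in the interval containing the observed time.
    disc_status = as.integer(surv$status[i] == 1 & t_i <= tend)
    data.frame(id = surv$id[i], tend = tend, disc_status = disc_status)
  })
  do.call(rbind, rows)
}

long = discretize(surv, cuts)
print(long)
```

Each observation contributes one row per interval in which it was still at risk, so a classifier trained on `disc_status` with `tend` as a feature estimates the probability of an event in an interval given survival up to that interval.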
Dictionary
This PipeOp can be instantiated via the dictionary mlr3pipelines::mlr_pipeops or with the associated sugar function mlr3pipelines::po().
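The ways of constructing the PipeOp described above can be sketched as follows. The dictionary key is taken from the default id shown in the Usage section; that the class and its dictionary entry are made available by loading mlr3proba is an assumption here.

```r
library(mlr3pipelines)
library(mlr3proba)  # assumption: this package provides and registers the PipeOp

# Direct construction from the R6 class
po_a = PipeOpTaskSurvClassifDiscTime$new()

# Retrieval from the dictionary by key
po_b = mlr_pipeops$get("trafotask_survclassif_disctime")

# Sugar function
po_c = po("trafotask_survclassif_disctime")
```

All three calls should yield an equivalent, untrained PipeOp with default parameter values.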
Input and Output Channels
PipeOpTaskSurvClassifDiscTime has one input channel named "input", and two output channels, one named "output" and the other "transformed_data".
During training, the "output" is the "input" TaskSurv transformed to a TaskClassif. The target column is named "disc_status" and indicates whether an event occurred in each time interval. An additional numeric feature named "tend" contains the end time point of each interval. Lastly, the "output" task has a column with the original observation ids, under the role "original_ids". The "transformed_data" is an empty data.table.
During prediction, the "input" TaskSurv is transformed to the "output" TaskClassif with "disc_status" as target and the "tend" feature included. The "transformed_data" is a data.table with the following columns: "disc_status" (the target of the "output" task), "id" (original observation ids), "obs_times" (observed times per "id") and "tend" (end time of each interval). This "transformed_data" is only meant to be used with the PipeOpPredClassifSurvDiscTime.
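The channel behaviour described above can be inspected directly. The following is a minimal sketch, assuming that mlr3proba provides both the "lung" task and this PipeOp, and that the result of `$train()` and `$predict()` is a list named after the output channels; no expected output is shown because it depends on the installed package versions.

```r
library(mlr3)
library(mlr3proba)      # assumption: provides tsk("lung") and this PipeOp
library(mlr3pipelines)

task = tsk("lung")
po_disc = po("trafotask_survclassif_disctime")

# Training: one list element per output channel
train_out = po_disc$train(list(task))
names(train_out)

task_classif = train_out$output
task_classif$target_names                # target of the classification task
task_classif$col_roles$original_ids      # column holding the original observation ids
nrow(train_out$transformed_data)         # described above as empty during training

# Prediction: "transformed_data" now carries the columns needed downstream
pred_out = po_disc$predict(list(task))
colnames(pred_out$transformed_data)
```

In a full pipeline the "transformed_data" channel would be connected to PipeOpPredClassifSurvDiscTime rather than inspected by hand.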
Parameters
The parameters are
cut :: numeric()
Split points, used to partition the data into intervals based on the time column. If unspecified, all unique event times will be used. If cut is a single integer, it will be interpreted as the number of equidistant intervals from 0 until the maximum event time.
max_time :: numeric(1)
If cut is unspecified, this will be the last possible event time. All event times after max_time will be administratively censored at max_time. Needs to be greater than the minimum event time in the given task.
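How the two parameters interact can be illustrated with a small base R sketch. The helper `resolve_cuts()` is hypothetical and only mirrors the description above; in particular, appending max_time as the final split point is an assumption, and the package may resolve the split points differently.

```r
# Hypothetical helper, not part of any package: turns `cut` and `max_time`
# into a vector of split points, following the parameter descriptions.
resolve_cuts = function(time, status, cut = NULL, max_time = NULL) {
  event_times = time[status == 1]
  if (is.null(cut)) {
    # Default: all unique event times
    cuts = sort(unique(event_times))
    if (!is.null(max_time)) {
      # max_time must exceed the minimum event time
      stopifnot(max_time > min(event_times))
      # Assumption: later event times are dropped and max_time closes the grid
      cuts = union(cuts[cuts <= max_time], max_time)
    }
    return(cuts)
  }
  if (length(cut) == 1) {
    # Single integer: that many equidistant intervals from 0 to the max event time
    return(seq(0, max(event_times), length.out = cut + 1))
  }
  # Otherwise the split points are used as given
  sort(cut)
}

time = c(5, 8, 12, 20, 30)
status = c(1, 0, 1, 1, 0)

cuts_default = resolve_cuts(time, status)               # unique event times
cuts_four = resolve_cuts(time, status, cut = 4)         # 4 equidistant intervals
cuts_capped = resolve_cuts(time, status, max_time = 15) # truncated at max_time
```

With the actual PipeOp, the corresponding values would be passed at construction, for example `po("trafotask_survclassif_disctime", cut = 4)`.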
References
Tutz, Gerhard, Schmid, Matthias (2016). Modeling Discrete Time-to-Event Data, series Springer Series in Statistics. Springer International Publishing. ISBN 978-3-319-28156-8 978-3-319-28158-2, http://link.springer.com/10.1007/978-3-319-28158-2.
Super class
mlr3pipelines::PipeOp -> PipeOpTaskSurvClassifDiscTime
Methods
Method new()
Creates a new instance of this R6 class.
Usage
PipeOpTaskSurvClassifDiscTime$new(id = "trafotask_survclassif_disctime")
Examples
library(mlr3)
library(mlr3proba)     # provides tsk("lung") and the PipeOp
library(mlr3learners)  # provides lrn("classif.log_reg")
library(mlr3pipelines)

task = tsk("lung")

# transform the survival task to a classification task
# all unique event times are used as cutpoints
po_disc = po("trafotask_survclassif_disctime")
task_classif = po_disc$train(list(task))[[1L]]

# the end time points of the discrete time intervals
unique(task_classif$data(cols = "tend"))[[1L]]

# train a classification learner
learner = lrn("classif.log_reg", predict_type = "prob")
learner$train(task_classif)