Probabilistic Supervised Learning for mlr3.
mlr3proba is a probabilistic supervised learning (PSL) toolkit for machine learning in R utilising the mlr3 package. Probabilistic supervised learning is a field of supervised machine learning in which probability distributions are predicted. Regression and classification tasks can be shown to be sub-fields of PSL; for example, reducing a predicted probability distribution from probabilistic regression by taking its mean gives a regression prediction. PSL is therefore a powerful tool that provides more information about predictions than classical regression or classification. Probably the best-known variant of PSL is survival analysis, where the task of interest is to predict an individual’s survival curve. Other forms of PSL include density estimation and probabilistic regression. To date, PSL toolkits in R have been limited to Bayesian simulation packages, but mlr3proba hopes to change this by allowing domain-agnostic (Bayesian or Frequentist) fit/predict and evaluation workflows.
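As a rough illustration of how a distribution prediction reduces to a regression prediction, the base R sketch below uses made-up Normal predictive distributions. The numbers and variable names are hypothetical and are not part of the mlr3proba API.

```r
# Hypothetical probabilistic regression output: one Normal(mean, sd)
# distribution per observation (values invented for illustration).
pred <- data.frame(mean = c(2.1, 3.4, 0.7), sd = c(0.5, 1.2, 0.3))

# Reducing each predicted distribution to its mean gives an ordinary
# regression (point) prediction.
point_pred <- pred$mean

# The full distribution carries extra information, for example a
# central 95% prediction interval for each observation.
lower <- qnorm(0.025, mean = pred$mean, sd = pred$sd)
upper <- qnorm(0.975, mean = pred$mean, sd = pred$sd)

result <- data.frame(point_pred, lower, upper)
print(result)
```

The point prediction discards the spread, which is exactly the information a probabilistic prediction keeps.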
Install the latest release from CRAN:
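The command itself is missing from this copy of the page. Assuming the package is available on CRAN under the name mlr3proba, the standard call would be:

```r
# Assumes mlr3proba is available on CRAN under this name
install.packages("mlr3proba")
```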
Install the development version from GitHub:
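The command is likewise missing here. A likely form is sketched below, assuming the repository lives at mlr-org/mlr3proba on GitHub (not stated on this page) and that the remotes package is installed:

```r
# Assumption: the repository path is mlr-org/mlr3proba
# install.packages("remotes")  # uncomment if remotes is not yet installed
remotes::install_github("mlr-org/mlr3proba")
```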

Survival learners:

| ID | Learner | Package |
| --- | --- | --- |
| surv.blackboost | Gradient Boosting with Regression Trees | mboost |
| surv.coxph | Cox Proportional Hazards | survival |
| surv.cvglmnet | Cross-Validated GLM with Elastic Net Regularization | glmnet |
| surv.flexible | Flexible Parametric Spline Models | flexsurv |
| surv.gamboost | Gradient Boosting for Additive Models | mboost |
| surv.gbm | Generalized Boosting Regression Modeling | gbm |
| surv.glmboost | Gradient Boosting with Component-wise Linear Models | mboost |
| surv.glmnet | GLM with Elastic Net Regularization | glmnet |
| surv.mboost | Gradient Boosting for Generalized Additive Models | mboost |
| surv.parametric | Fully Parametric Survival Models | survival |
| surv.penalized | L1 and L2 Penalized Estimation in GLMs | penalized |
| surv.randomForestSRC | RandomForestSRC Survival Forest | randomForestSRC |
| surv.ranger | Ranger Survival Forest | ranger |
| surv.rpart | Rpart Survival Tree | rpart |
| surv.svm | Regression, Ranking and Hybrid Support Vector Machines | survivalsvm |

Survival measures:

| ID | Measure | Package |
| --- | --- | --- |
| surv.chamblessAUC | Chambless and Diao’s AUC | survAUC |
| surv.gonenC | Gonen and Heller’s C-Index | survAUC |
| surv.graf | Integrated Graf Score | mlr3proba |
| surv.grafSE | Standard Error of Integrated Graf Score | mlr3proba |
| surv.hungAUC | Hung and Chiang’s AUC | survAUC |
| surv.intlogloss | Integrated Log Loss | mlr3proba |
| surv.intloglossSE | Standard Error of Integrated Log Loss | mlr3proba |
| surv.loglossSE | Standard Error of Log Loss | mlr3proba |
| surv.oquigleyR2 | O’Quigley, Xu, and Stare’s R2 | survAUC |
| surv.songAUC | Song and Zhou’s AUC | survAUC |
| surv.songTNR | Song and Zhou’s TNR | survAUC |
| surv.songTPR | Song and Zhou’s TPR | survAUC |
| surv.xuR2 | Xu and O’Quigley’s R2 | survAUC |
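
The learner and measure IDs listed above are keys that can be passed to the mlr3 dictionary helpers. The sketch below shows one possible fit/predict/evaluate workflow. It assumes that mlr3 and mlr3proba are installed, that the example task key "rats" ships with the installed version, and that the 80/20 split and seed are arbitrary choices made for illustration. Key names may differ between package versions.

```r
library(mlr3)
library(mlr3proba)

# Assumption: "rats" is an example survival task provided by mlr3proba
task <- tsk("rats")

# Cox proportional hazards learner, ID taken from the learner list
learner <- lrn("surv.coxph")

# Arbitrary 80/20 train/test split (illustrative only)
set.seed(1)
train_ids <- sample(task$row_ids, round(0.8 * task$nrow))
test_ids <- setdiff(task$row_ids, train_ids)

# Fit on the training rows, predict on the held-out rows
learner$train(task, row_ids = train_ids)
prediction <- learner$predict(task, row_ids = test_ids)

# Evaluate with the integrated Graf score, ID taken from the measure list
score <- prediction$score(msr("surv.graf"))
print(score)
```

Swapping in another learner or measure should only require changing the key passed to `lrn()` or `msr()`, provided the backing package is installed and the measure supports the learner's predict type.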
The vision of mlr3proba is to be the first complete probabilistic machine learning package in R. This encompasses survival analysis, probabilistic regression, and unsupervised density estimation. The first release of mlr3proba is focused entirely on survival analysis and introduces TaskSurv. Later releases will include TaskDensity and will extend TaskRegr to have probabilistic predict types. The lifecycle of the survival task and its features is considered maturing and any major changes are unlikely. The density and probabilistic regression tasks are currently in the early stages of development. The current main features of mlr3proba are:
PredictionSurv for survival analysis
LearnerDensity, and associated learners/measures
prob predict type to TaskRegr, and associated learners/measures
MeasureSurv to return measures at multiple time-points simultaneously
mlr3proba is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!
In case of problems or bugs, it is often helpful if you provide a “minimal working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).
Please understand that the resources of the project are limited: responses may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.
A predecessor to this package is mlr, using the survival task. Several packages exist for pure Bayesian probabilistic modelling, including jags and stan. For implementation of a few survival models and measures, the largest package is survival. There does not appear to be a package that implements many different variants of density estimation, but see this list for the biggest density estimation packages in R.