Title: | Clustering Analysis Using Survival Tree and Forest Algorithms |
---|---|
Description: | An outcome-guided algorithm is developed to identify clusters of samples with similar characteristics and survival rate. The algorithm first builds a random forest and then defines distances between samples based on the fitted random forest. Given the distances, we can apply hierarchical clustering algorithms to define clusters. |
Authors: | Lu You [aut, cre] (Created the package. Maintains the package.), Lauric Ferrat [aut] (Added functionality. Revised the package. Wrote the vignette.), Hemang Parikh [aut] (Checked and revised the package.), Yanan Huo [aut] (Revised plotting functions of the package.), Jeffrey Krischer [ctb] (Supervised the medical research. Coauthor of the medical manuscript.), Maria Redondo [ctb] (Principal investigator of the medical research. Coauthor of the medical manuscript.), Richard Oram [ctb] (Coauthor of the medical manuscript.), Andrea Steck [ctb] (Coauthor of the medical manuscript.) |
Maintainer: | Lu You <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2025-02-03 03:54:31 UTC |
Source: | https://github.com/luyouepiusf/survivalclusteringtree |
Index of help topics:
SurvivalClusteringTree-package | Clustering Analysis Using Survival Tree and Forest Algorithms |
plot_survival_tree | Visualize the Fitted Survival Tree |
predict_distance_forest | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe) |
predict_distance_forest_matrix | Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) |
predict_distance_tree | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
predict_distance_tree_matrix | Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) |
predict_weights | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe) |
predict_weights_matrix | Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices) |
survival_forest | Build a Survival Forest (Data Supplied as a Dataframe) |
survival_forest_matrix | Build a Survival Forest (Data Supplied as Matrices) |
survival_tree | Build a Survival Tree (Data Supplied as a Dataframe) |
survival_tree_matrix | Build a Survival Tree (Data Supplied as Matrices) |
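The last step of the workflow, applying hierarchical clustering to a matrix of pairwise distances, can be sketched in base R alone. The distance matrix below is a hypothetical stand-in with made-up values, not output from this package; in practice it would come from one of the `predict_distance_*` functions, whose return structure should be inspected before use.

```r
# Stand-in symmetric distance matrix for six samples (hypothetical values,
# not produced by the package): two well-separated groups of three.
coords <- c(0, 0.1, 0.2, 5, 5.1, 5.2)
dist_matrix <- abs(outer(coords, coords, "-"))
rownames(dist_matrix) <- colnames(dist_matrix) <- paste0("s", 1:6)

# Convert to a "dist" object and run agglomerative hierarchical clustering.
fit <- hclust(as.dist(dist_matrix), method = "complete")

# Cut the dendrogram into two clusters and show the assignment per sample.
clusters <- cutree(fit, k = 2)
print(clusters)
```

The linkage method and the number of clusters are illustrative choices; neither is prescribed by the package.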
Visualize the Fitted Survival Tree
plot_survival_tree(survival_tree, cex = 0.75)
survival_tree |
a fitted survival tree object. |
cex |
numeric character expansion factor. |
The function predict_distance_forest predicts distances between samples based on a survival forest fit.
predict_distance_forest(survival_forest, numeric_predictor, factor_predictor, data, missing = "omit")
survival_forest |
a fitted survival forest |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as a Dataframe)
The function predict_distance_forest_matrix predicts distances between samples based on a survival forest fit.
predict_distance_forest_matrix(survival_forest, matrix_numeric, matrix_factor, missing = "omit")
survival_forest |
a fitted survival forest |
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Forest Fit (Data Supplied as Matrices) (Works for raw matrices)
The function predict_distance_tree predicts distances between samples based on a survival tree fit.
predict_distance_tree(survival_tree, numeric_predictor, factor_predictor, data, missing = "omit")
survival_tree |
a fitted survival tree |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as a Dataframe)
The function predict_distance_tree_matrix predicts distances between samples based on a survival tree fit.
predict_distance_tree_matrix(survival_tree, matrix_numeric, matrix_factor, missing = "omit")
survival_tree |
a fitted survival tree |
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
missing |
a character value that specifies the handling of missing data.
If |
Predict Distances Between Samples Based on a Survival Tree Fit (Data Supplied as Matrices) (Works for raw matrices)
The function predict_weights predicts weights of samples in terminal nodes based on a survival tree fit.
predict_weights(survival_tree, numeric_predictor, factor_predictor, data, missing = "omit")
survival_tree |
a fitted survival tree |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
data |
the dataframe (test data) that stores the outcome and predictor variables.
Variables in the global environment will be used if |
missing |
a character value that specifies the handling of missing data.
If |
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as a Dataframe)
The function predict_weights_matrix predicts weights of samples in terminal nodes based on a survival tree fit.
predict_weights_matrix(survival_tree, matrix_numeric, matrix_factor, missing = "majority")
survival_tree |
a fitted survival tree |
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
missing |
a character value that specifies the handling of missing data.
If |
Predict Weights of Samples in Terminal Nodes Based on a Survival Tree Fit (Data Supplied as Matrices)
The function survival_forest builds a survival forest given the survival outcomes and predictors of numeric and factor variables.
survival_forest(survival_outcome, numeric_predictor, factor_predictor, weights = NULL, data, significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate", nboot = 100, seed = 0)
survival_outcome |
a |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
weights |
sample weights, a numeric vector.
|
data |
the dataframe that stores the outcome and predictor variables.
Variables in the global environment will be used if |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
nboot |
an integer value that specifies the number of bootstrap replications. |
seed |
an integer value that specifies the seed. |
Build a Survival Forest (Data Supplied as a Dataframe)
The function survival_forest_matrix builds a survival forest given the survival outcomes and predictors of numeric and factor variables.
survival_forest_matrix(time, event, matrix_numeric, matrix_factor, weights = rep(1, length(time)), significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate", nboot = 100, seed = 0)
time |
survival times, a numeric vector.
|
event |
survival events, a logical vector.
|
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
weights |
sample weights, a numeric vector.
|
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
nboot |
an integer value that specifies the number of bootstrap replications. |
seed |
an integer value that specifies the seed. |
Build a Survival Forest (Data Supplied as Matrices)
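A minimal sketch of the matrix interface, assuming the package is installed and using simulated data (all variable names and values below are hypothetical). It fits a forest with `survival_forest_matrix` and passes the fit to `predict_distance_forest_matrix` on the same samples. The structure of the returned object is not described in this section, so the sketch only inspects it with `str()` rather than assuming element names.

```r
library(SurvivalClusteringTree)

set.seed(1)
n <- 200

# Simulated predictors (hypothetical data, for illustration only).
matrix_numeric <- cbind(x1 = rnorm(n), x2 = rnorm(n))                # numeric matrix
matrix_factor <- cbind(z1 = sample(c("a", "b"), n, replace = TRUE))  # character matrix

# Simulated survival times whose hazard increases with x1.
time <- rexp(n, rate = exp(0.8 * matrix_numeric[, "x1"]))
# Event indicator drawn at random: TRUE for roughly 70% of samples.
event <- runif(n) < 0.7

# Fit the forest; nboot is lowered from the default of 100 to keep the run short.
forest_fit <- survival_forest_matrix(
  time = time,
  event = event,
  matrix_numeric = matrix_numeric,
  matrix_factor = matrix_factor,
  nboot = 20,
  seed = 1
)

# Predict pairwise distances for the same samples and inspect the result.
dist_fit <- predict_distance_forest_matrix(
  survival_forest = forest_fit,
  matrix_numeric = matrix_numeric,
  matrix_factor = matrix_factor
)
str(dist_fit, max.level = 1)
```

The arguments left unset (`weights`, `significance`, `min_weights`, `missing`, `test_type`) take the defaults shown in the usage line above.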
The function survival_tree builds a survival tree given the survival outcomes and predictors of numeric and factor variables.
survival_tree(survival_outcome, numeric_predictor, factor_predictor, weights = NULL, data, significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate")
survival_outcome |
a |
numeric_predictor |
a formula specifying the numeric predictors.
As in |
factor_predictor |
a formula specifying the factor predictors.
As in |
weights |
sample weights, a numeric vector.
|
data |
the dataframe that stores the outcome and predictor variables.
Variables in the global environment will be used if |
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
Build a Survival Tree (Data Supplied as a Dataframe)
The function survival_tree_matrix builds a survival tree given the survival outcomes and predictors of numeric and factor variables.
survival_tree_matrix(time, event, matrix_numeric, matrix_factor, weights = rep(1, length(time)), significance = 0.05, min_weights = 50, missing = "omit", test_type = "univariate")
time |
survival times, a numeric vector.
|
event |
survival events, a logical vector.
|
matrix_numeric |
numeric predictors, a numeric matrix.
|
matrix_factor |
factor predictors, a character matrix.
|
weights |
sample weights, a numeric vector.
|
significance |
significance threshold, a numeric value.
Stop the splitting algorithm when no splits give a p-value smaller than |
min_weights |
minimum weight threshold, a numeric value.
The weights in a node are greater than |
missing |
a character value that specifies the handling of missing data.
If |
test_type |
a character value that specifies the type of statistical tests.
If |
Build a Survival Tree (Data Supplied as Matrices)
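The single-tree functions can be chained in the same way. The sketch below assumes the package is installed, uses simulated data with hypothetical names, and assumes that the object returned by `survival_tree_matrix` can be passed directly to `plot_survival_tree` and `predict_weights_matrix`, which matches their `survival_tree` argument but is not stated explicitly here. Whether the simulated data produce any split depends on the `significance` and `min_weights` thresholds.

```r
library(SurvivalClusteringTree)

set.seed(2)
n <- 300

# Simulated predictors (hypothetical data, for illustration only).
matrix_numeric <- cbind(x1 = rnorm(n), x2 = rnorm(n))                # numeric matrix
matrix_factor <- cbind(z1 = sample(c("a", "b"), n, replace = TRUE))  # character matrix

# Simulated survival times whose hazard increases with x1.
time <- rexp(n, rate = exp(matrix_numeric[, "x1"]))
# Event indicator drawn at random: TRUE for roughly 70% of samples.
event <- runif(n) < 0.7

# Fit a single survival tree with the default thresholds.
tree_fit <- survival_tree_matrix(
  time = time,
  event = event,
  matrix_numeric = matrix_numeric,
  matrix_factor = matrix_factor
)

# Draw the fitted tree.
plot_survival_tree(tree_fit, cex = 0.75)

# Predict terminal-node weights for the same samples and inspect the result.
weights_fit <- predict_weights_matrix(
  survival_tree = tree_fit,
  matrix_numeric = matrix_numeric,
  matrix_factor = matrix_factor
)
str(weights_fit, max.level = 1)
```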