survival:cox and panel data setup in XGBoost

I am looking for some help on how to set up my data to use XGBoost properly with panel data.

I find it difficult to see how to set up my data in a way analogous to https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html or https://github.com/dmlc/xgboost/blob/master/demo/aft_survival/aft_survival_demo.py, as every given example is a cross-sectional data set.

Attached is an example of a panel survival data set that has been set up for traditional Cox proportional hazards models: it contains the event indicator y (demend) and the start and end times that the survival functions expect:

# R: survival object in counting-process (start, stop] form
library(survival)
km.curve <- Surv(time = df$start_time, time2 = df$end_time, event = df$demend)

# Python: load the same data
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/afogarty85/replications/master/Maeda/maeda2.csv')

df[['demend', 'start_time', 'end_time']].head(10)

   demend  start_time  end_time
0       0           0         1
1       0           1         2
2       0           2         3
3       0           3         4
4       0           4         5
5       0           5         6
6       0           6         7
7       0           7         8
8       0           8         9
9       0           9        10

I am wondering how to set up my lower and upper bounds and DMatrix objects. Thanks for your time and consideration!

I’m not aware of any use case of XGBoost for time series (panel) data. XGBoost optimizes a likelihood function over the training data, and the training rows are assumed to be i.i.d. Hence the tutorial shows only cross-sectional examples.
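
For the cross-sectional case that the tutorial covers, the label bounds can be sketched roughly as below. This assumes one row per subject, with a hypothetical `duration` array and an `event` indicator (1 = event observed, 0 = right-censored). `make_aft_bounds` and `build_dmatrix` are illustrative helper names, not part of XGBoost, and the sketch does not address the panel structure itself.

```python
import numpy as np

def make_aft_bounds(duration, event):
    """Return (lower, upper) label bounds for the survival:aft objective.

    Assumes one row per subject: `duration` is the observed time and
    `event` is 1 if the event was observed, 0 if right-censored.
    """
    duration = np.asarray(duration, dtype=float)
    event = np.asarray(event).astype(bool)
    lower = duration.copy()
    # Uncensored rows: upper == lower. Right-censored rows: upper is +inf.
    upper = np.where(event, duration, np.inf)
    return lower, upper

def build_dmatrix(X, lower, upper):
    """Attach the bounds to a DMatrix (xgboost is imported lazily here)."""
    import xgboost as xgb
    dtrain = xgb.DMatrix(X)
    dtrain.set_float_info("label_lower_bound", lower)
    dtrain.set_float_info("label_upper_bound", upper)
    return dtrain

# Hypothetical toy data: the second subject is right-censored at t = 3.
lower, upper = make_aft_bounds([2.0, 3.0, 5.0], [1, 0, 1])
print(lower, upper)
```

Both bound arrays have one entry per row of the feature matrix passed to `build_dmatrix`.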

One solution is to use XGBoost to fit a transition function f such that f(x_{i-1}) = x_i, where x_i denotes the covariate values and the survival time at step i. (Multi-output regression is not supported, so you’d need to fit multiple models.) Time is treated as discrete steps, and you would be assuming that the data has reached a steady state.
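
A minimal sketch of the transition-function idea, assuming the panel is sorted by subject and then by time with one row per discrete step. The helper names and the toy columns are hypothetical; one regressor is fitted per output column, since a single model predicts a single target here.

```python
import numpy as np

def make_transition_pairs(subject_id, X):
    """Build (x_{i-1}, x_i) pairs within each subject.

    Assumes rows are sorted by subject and then by time, with one row
    per discrete time step.
    """
    subject_id = np.asarray(subject_id)
    X = np.asarray(X, dtype=float)
    # Row i is a valid "next" step only if row i-1 is the same subject.
    same_subject = subject_id[1:] == subject_id[:-1]
    X_prev = X[:-1][same_subject]
    X_next = X[1:][same_subject]
    return X_prev, X_next

def fit_transition_models(X_prev, X_next, **params):
    """Fit one regressor per output column (xgboost is imported lazily)."""
    import xgboost as xgb
    models = []
    for j in range(X_next.shape[1]):
        model = xgb.XGBRegressor(**params)
        model.fit(X_prev, X_next[:, j])
        models.append(model)
    return models

# Toy panel: two subjects; columns are (hypothetical covariate, elapsed time).
ids = [1, 1, 1, 2, 2]
X = [[0.5, 0], [0.6, 1], [0.7, 2], [1.0, 0], [1.1, 1]]
X_prev, X_next = make_transition_pairs(ids, X)
```

Pairs never cross a subject boundary, so the last row of one subject is not used to predict the first row of the next.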

Thanks for the information!

Generally, panel data for survival modeling is synonymous with survival modeling with time-varying covariates, and time-varying covariates are handled differently. For more information, see https://www.jstatsoft.org/article/view/v061c01/v61c01.pdf. I don’t think we can model time-varying covariates (panel survival data) using the current framework.


I am wondering if anyone can provide recommendations on how to generate predictions for survival, like the one here: https://github.com/dmlc/xgboost/blob/master/demo/aft_survival/aft_survival_viz_demo.py (it would be great if this could be incorporated into something more general for use)

I am having trouble editing the code to get it to work for a matrix whose size differs from the one given:

X = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))

This line in particular seems to be the right metric to generate, but I keep getting length mismatch errors:

acc = np.sum(np.logical_and(y_pred >= y_lower, y_pred <= y_upper)/len(X) * 100)

@xgboost_fan We have a tutorial available: https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html#how-to-use. The lengths of y_lower and y_upper need to match the number of rows in the data matrix X.
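
As a rough illustration of the length requirement, the accuracy metric can be wrapped in a small helper that checks the lengths first. `interval_accuracy` is a hypothetical name and the arrays below are made-up values. In practice `y_pred` would come from the model's predictions on X.

```python
import numpy as np

def interval_accuracy(y_pred, y_lower, y_upper):
    """Percentage of predictions that fall inside [y_lower, y_upper]."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    y_lower = np.asarray(y_lower, dtype=float).ravel()
    y_upper = np.asarray(y_upper, dtype=float).ravel()
    # All three arrays need one entry per row of X.
    if not (len(y_pred) == len(y_lower) == len(y_upper)):
        raise ValueError(
            f"length mismatch: y_pred={len(y_pred)}, "
            f"y_lower={len(y_lower)}, y_upper={len(y_upper)}"
        )
    inside = np.logical_and(y_pred >= y_lower, y_pred <= y_upper)
    return inside.mean() * 100

# Hypothetical values: X has 5 rows, so there are 5 lower and 5 upper bounds.
X = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y_lower = np.array([10, 15, -np.inf, 30, 100])
y_upper = np.array([np.inf, np.inf, 20, 50, np.inf])
y_pred = np.array([12.0, 14.0, 18.0, 40.0, 120.0])
acc = interval_accuracy(y_pred, y_lower, y_upper)
```

If X is replaced with a matrix of a different size, y_lower and y_upper have to be rebuilt with the new number of rows. Otherwise the helper raises a ValueError rather than failing inside NumPy.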