I am looking for help on how to set up my data so that I can use xgboost properly with panel data.
I find it difficult to see how to set up my data in a way analogous to https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html or https://github.com/dmlc/xgboost/blob/master/demo/aft_survival/aft_survival_demo.py,
as every example given uses a cross-sectional data set.
Attached is an example of a panel survival data set that has been set up for traditional Cox proportional hazards models: it contains the event indicator y
(demend) and the start and end times that the R survival functions expect:
km.curve <- Surv(time = df$start_time, time2 = df$end_time, event = df$demend)
# load data
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/afogarty85/replications/master/Maeda/maeda2.csv')
df[['demend', 'start_time', 'end_time']].head(10)
   demend  start_time  end_time
0       0           0         1
1       0           1         2
2       0           2         3
3       0           3         4
4       0           4         5
5       0           5         6
6       0           6         7
7       0           7         8
8       0           8         9
9       0           9        10
I am wondering how to set up my lower and upper bounds and my DMatrix objects. Thanks for your time and consideration!
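
To make the question concrete, here is a minimal sketch of what I understand the AFT tutorial to expect: one row per unit, with a lower and an upper bound on the survival time. It is only a starting point under several assumptions. The toy frame, the `unit_id` column, the covariate `x`, and the helper `build_aft_dmatrix` are all hypothetical and not taken from the data set above. Collapsing the panel to one row per unit keeps only the first observed covariate value, so any time-varying information is lost, which may or may not be acceptable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel in start/stop format (not the real data set).
panel = pd.DataFrame({
    "unit_id":    [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "start_time": [0, 1, 2, 0, 1, 0, 1, 2, 3],
    "end_time":   [1, 2, 3, 1, 2, 1, 2, 3, 4],
    "demend":     [0, 0, 1, 0, 0, 0, 0, 0, 1],
    "x":          [0.2, 0.3, 0.5, 1.0, 1.1, -0.4, -0.2, 0.0, 0.1],
})

# Collapse to one row per unit: first entry time, last exit time,
# whether the event was ever observed, and the first covariate value
# (an assumption; time-varying covariates are dropped here).
per_unit = (
    panel.sort_values(["unit_id", "end_time"])
         .groupby("unit_id")
         .agg(entry=("start_time", "min"),
              exit=("end_time", "max"),
              event=("demend", "max"),
              x=("x", "first"))
)
duration = (per_unit["exit"] - per_unit["entry"]).to_numpy(dtype=float)

# Uncensored units: lower == upper == observed duration.
# Right-censored units: lower == last observed duration, upper == +inf.
y_lower = duration
y_upper = np.where(per_unit["event"].to_numpy() == 1, duration, np.inf)


def build_aft_dmatrix(features, lower, upper):
    """Hypothetical helper: wrap features and interval labels in a DMatrix."""
    import xgboost as xgb  # imported lazily so the bound logic runs without it

    dmat = xgb.DMatrix(features)
    dmat.set_float_info("label_lower_bound", lower)
    dmat.set_float_info("label_upper_bound", upper)
    return dmat
```

If this is the right direction, `build_aft_dmatrix(per_unit[["x"]], y_lower, y_upper)` would give an object that can be passed to `xgb.train` with `objective='survival:aft'`, as in the tutorial. What I cannot tell is whether the start/stop rows themselves can be used directly.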