Customized cox proportional hazard loss function in xgboost

johneinstein · May 26, 2019, 10:40am

I’m trying to model the survival time using the Cox proportional hazard model, i would like to use xgboost to do so. I know xgboost has a coxph loss implementation, however, my loss function is a bit modified, the survival times are grouped in different experimental groups, and im actually interested in the ranking among the groups and the permutation probabilities. Each of the groups can have different sizes and there may be right censored data.

For example, suppose we have the variables, time=[10,3,1,6,4,1,7,30,21,15,25,24], groups=[0,0,0,1,1,1,1,1,2,2,2,2], observed=[1,0,1,1,1,0,1,0,0,1,1,1], where time is the survival times, groups is the group id such that the survival times can be grouped according to it, and observed is the censoring status (1 is observed data, 0 denotes censored data).

I have implemented the model with neural network with Tensorflow, but now I would like to try it in xgboost, I’m not sure what the grad and hess as returned by my loss function are in my case.

I have tried the following code, but it doesnt work, the value of loss function isnt changing:

def custom_loss(y_pred, dataset):

  groups = dataset.get_group()
  label = dataset.get_label()
  
  y_pred =  np.split( y_pred, np.cumsum(groups) )[:-1] 
  label =  np.split( label, np.cumsum(groups) )[:-1] 
  
  grad = np.array([])
  hess = np.array([])
  
  for i in range(len(y_pred)):
    rank = np.argsort(label[i])
    scores = y_pred[i].reshape(-1,1)
    mask = rank.reshape(1,-1)>=rank.reshape(-1,1)
    risk_set = np.sum(mask*np.exp(scores), axis=1)
    
    mask = rank.reshape(1,-1)<rank.reshape(-1,1)
    p = np.exp(scores)/risk_set.reshape(1,-1)
    grad_grouped = 1 - np.sum(mask*p, axis=1)
    
    hess_grouped = np.sum( (-p*(1-p))*mask, axis=1)
      
    grad = np.hstack([grad, grad_grouped])
    hess = np.hstack([hess, hess_grouped])

  return grad, hess
   

def custom_eval(y_pred, dataset):
  groups = dataset.get_group()
  label = dataset.get_label()
  
  y_pred =  np.split( y_pred, np.cumsum(groups) )[:-1] 
  label =  np.split( label, np.cumsum(groups) )[:-1] 
  
  likelihoods = np.array([])

  
  for i in range(len(y_pred)):
    rank = np.argsort(label[i])
    scores = y_pred[i].reshape(-1,)
    mask = rank.reshape(1,-1)>=rank.reshape(-1,1)
    risk_set = np.sum(mask*np.exp(scores), axis=1)
    
    lik = np.log( np.exp(scores)/risk_set )
    likelihoods = np.hstack([likelihoods, lik])
    
  loss = -np.sum(likelihoods)/len(y_pred)

  return 'cox_loss', loss, False

i have also implemented in Tensorflow for the neural network version:

def compute_cox_loss(time, predict, observed, groups):

  '''predict is the score output by the model, 
  other param are the same as in the problem description
  '''

  #ensure the data have the correct shape
  time = tf.reshape(time, (-1,))
  predict = tf.reshape(predict, (-1,))
  observed = tf.reshape(observed, (-1,))

  #split the data into groups
  splitted_time = tf.split(time, groups, axis=0)
  splitted_predict = tf.split(predict, groups, axis=0)
  splitted_observed = tf.split(observed, groups, axis=0)

  #target is the total loss
  target = 0
  count = 1
  batch_size = len(splitted_time)

  #for each group, calculate the CoxPH loss and add it into target
  for i in range(batch_size):

    sorted_time = tf.sort(splitted_time[i], direction='ASCENDING')
    sort_arg = tf.argsort(splitted_time[i], direction='ASCENDING')

    sorted_predict = tf.exp( tf.gather(splitted_predict[i], sort_arg) )
    sorted_observed = tf.gather(splitted_observed[i], sort_arg)

    mask = tf.cast( sorted_observed, tf.bool )      
    L = tf.boolean_mask( tf.log(sorted_predict) - tf.log( tf.cumsum(sorted_predict, reverse=True) ), mask )
    loss = -tf.reduce_mean(L)

    target += loss
    count += 1

  return target/count

Also, I cannot find the code implementing the coxph loss in xgboost in the official github. Can anyone point out where the code is so that i can modify it according to my problem?

hcho3 · May 27, 2019, 6:33pm

Would it be possible to post an example data so that we can try to debug the custom evaluation function? I do acknowledge that we need to improve custom evaluation function for learning-to-rank tasks.

You can also try to modify the pairwise ranking loss in https://github.com/dmlc/xgboost/blob/master/src/objective/rank_obj.cc to implement your loss.

johneinstein · May 31, 2019, 4:40pm

So here is the data set:
https://drive.google.com/file/d/1Z9e3Mb-m4KVZXHJNRMHIeRj2b6sXY358/view?usp=sharing

‘group_id’ is the id of experimental groups, ‘time’ is the event time, ‘ranking’ is the rank ordered by time, ‘observed’ is the censoring variable (1 means observed, 0 is censored), x1 - x5 are the principal components of the features, the whole data set consist of 100 observations.

So the log likelihood of CoxPH model is

$ L = \sum_{i: C_i = 1} L_i = \sum_{i: C_i = 1} (s_i - log( \sum_{ T_j \leq T_i} exp( s_j ) )) $, where C_i is censoring variable, T_i and T_j are the times, s_i is the scores output by the model

For my problem, the times are grouped by group_id instead of comparing across the whole data set.

Actually, the main problem is that when i calculate the derivatives of the likelihood wrt the scores, im not sure whether it should be - \frac{d \sum_{i: C_i = 1} L_i}{d s_i} or - \frac{d L_i}{d s_i} or neither of them.

I have also tried the lambdarank and listwise ranking before but i also want to try CoxPH as i also want the ranking permutation probability apart from the predicted ranking. I also want to compare the performance of these methods.

For the improvements in the custom loss function, i think it is better to include an example of how the derivative wrt each data point is derived. Any help would be appreciated.

PS: btw, how should i display Latex properly in this forum?

hcho3 · September 27, 2021, 1:02am

@johneinstein See my master’s thesis (https://drive.google.com/file/d/0B0c0MbnP6Nn-eUNRRkVOOGpkbFk/view?usp=sharing&resourcekey=0-nVw3WhovKW5FPvPM5GPHfg) on the meaning of “gradient” and “hessian”. Basically, you’d like to compute the partial derivative of the loss function l(y,yhat) with respect to the second argument yhat.

It seems like you want to perform a mix of ranking and survival tasks. I don’t think XGBoost support this out of box. You will need to write a custom loss (objective) function.

I will be on vacation for next 2 weeks, so replies will be delayed from now.

johneinstein · June 17, 2019, 3:22am

Thanks for your reply, im able to optimize my custom loss function now