Has anyone been able to implement a conditional logistic?


I’m wondering if anyone has been able to implement a conditional logistic solution in XGBoost (even for 1:m matching) - either through transforming data to fit one of the existing objective functions, or creating a custom objective function?

I understand that there is the rank:pairwise objective function but I wanted to avoid this function for two reasons:

  • It doesn’t allow for cross-validation, because cross-validation cannot handle the ‘group’ information being populated

  • The output of rank:pairwise is open to interpretation. I need the output to be the conditional probability within a group, whereas I am led to understand that the output of this model is just a raw score used solely for ranking within a group.

Conditional logistic regression has similarities to both multi:softmax and survival:cox. From what I understand, though, multi:softmax deals with multiple possible outcomes for each record, whereas I need one successful case across several records in each group. I keep reading that conditional logistic regression is a special case of survival:cox, but I am unsure how to pass the ‘group’ index through to the survival:cox objective.
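For reference, the model being asked about can be written down as follows. This is a sketch in notation that is not part of the original post: $f_i$ is the raw score of record $i$, $y_i \in \{0, 1\}$ marks the successful record, and each group $g$ is assumed to contain exactly one success.

```latex
P(y_i = 1 \mid g) = \frac{\exp(f_i)}{\sum_{j \in g} \exp(f_j)},
\qquad
\ell(f) = \sum_{g} \Big[ \sum_{i \in g} y_i f_i \;-\; \log \sum_{j \in g} \exp(f_j) \Big]
```

Read this way, the softmax runs across the records of a group rather than across the classes of a single record, and each group's term has the same form as one risk set with a single event in the Cox partial likelihood.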

Is it possible to pass the group ID through to a custom objective function? I see that getinfo for ‘group’ doesn’t work on a DMatrix, so I am not sure of the most efficient way to do this.

I’d be extremely grateful if anyone has pointers on how best to approach this.

Kind regards,


There is a pull request to expose group information from DMatrix: https://github.com/dmlc/xgboost/pull/3566. Would this be helpful to your case?


Yes I imagine that would be a good place to start - if the group information can be passed into a custom objective function and eval function then it’s possible this could work. Many thanks!


Do you know whether exposing the group information will allow it to be passed through to cross-validation and the like? Currently, objectives such as rank:pairwise do not work with cross-validation because the group information cannot be passed through.


Hi, Is there any progress to attempt to replicate the conditional logistic model in XGBoost?
Kind Regards


Hi All, I am new to XGBoost, but have been able to get gbm3 to do this (clr) in R via (as you guessed) Cox proportional hazards (the coxph option in R/gbm3). I am not sure that R/gbm3 does what it is supposed to do, but there is a trick which, in theory, should get it to do the right thing, and which should work here if the Cox model implemented here allows delayed entry (left truncation) as well as right-censoring. All you have to do is define the ‘time’ variables as intervals which are the same for every record in a group but non-overlapping from group to group, and you are done. Each group then forms its own risk set. Does survival:cox allow both delayed entry and right-censoring? Or has anyone had any other ideas?
Another approach which worked for various versions of GB in R was to make the group variable a factor. This was not computationally efficient, though, and did not work for a large number of groups.
Has there been any progress on this lately?
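The interval trick can be checked outside of any boosting library. The sketch below uses the survival package rather than gbm3 or XGBoost, on simulated data with one case per group; the column names (t_start, t_stop, case) and the simulation itself are assumptions made for illustration only. It fits a Cox model on non-overlapping per-group intervals and compares it with clogit.

```r
library(survival)

set.seed(1)
n_groups <- 200   # number of matched groups (illustrative)
m <- 4            # records per group (illustrative)
d <- data.frame(group = rep(seq_len(n_groups), each = m),
                x = rnorm(n_groups * m))

# Draw exactly one case per group with within-group softmax probabilities
d$case <- 0L
for (g in seq_len(n_groups)) {
  idx <- which(d$group == g)
  p <- exp(d$x[idx]) / sum(exp(d$x[idx]))
  d$case[idx[sample.int(m, 1, prob = p)]] <- 1L
}

# Same interval for every record in a group, non-overlapping across groups:
# group g is at risk only on (g - 1, g]
d$t_start <- d$group - 1
d$t_stop <- d$group

fit_interval <- coxph(Surv(t_start, t_stop, case) ~ x, data = d)
fit_clogit <- clogit(case ~ x + strata(group), data = d)

print(c(interval = unname(coef(fit_interval)),
        clogit = unname(coef(fit_clogit))))
```

With one event per group, the risk set at each event time contains only that group's records, so the two fits should agree up to optimizer tolerance. Whether a given boosting implementation accepts start/stop times is a separate question that this sketch does not answer.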


PS to my previous post: survival:cox in XGBoost appears not to allow (at present) different entry times (left truncation), which I note is lamented by a number of biostatisticians online, and by me in this case. Perhaps a future update will include this? It would be a great enhancement.
It seems that categorical input variables are not allowed in XGBoost, but they seem to be a core feature in CatBoost. One could include Group as a categorical variable. Still not as clean as the survival analysis solution, but it could work.


For the record, I have created a solution using custom objective and error functions. I’m not sure if it’s mathematically 100% accurate, as I’ve just left the calculation of the gradient and Hessian as is from vanilla logistic regression. The error function, however, does use the correct log loss for conditional logistic, which differs slightly from the ordinary logistic log loss.
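On the question of mathematical accuracy, the derivatives can be worked out directly. The notation here is an assumption rather than the poster's: $f_i$ is the raw prediction, $y_i$ the label, $p_i = \exp(f_i) / \sum_{j \in g} \exp(f_j)$ the within-stratum probability, and each stratum $g$ is assumed to hold exactly one success.

```latex
L(f) = -\sum_{g} \Big[ \sum_{i \in g} y_i f_i - \log \sum_{j \in g} \exp(f_j) \Big]

\frac{\partial L}{\partial f_i} = p_i - y_i,
\qquad
\frac{\partial^2 L}{\partial f_i^2} = p_i (1 - p_i),
\qquad
\frac{\partial^2 L}{\partial f_i \, \partial f_k} = -\, p_i \, p_k
\quad (i \neq k,\; i, k \in g)
```

Under that assumption, the per-record gradient and the diagonal of the Hessian have the same form as in ordinary logistic regression, with the within-stratum softmax probability in place of the sigmoid. The off-diagonal terms within a stratum cannot be expressed through a per-record gradient and Hessian, so they are dropped.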

Also apologies for my poor coding technique. I’m a quant first and a coder very second. I’m sure it’s not the most efficient use of R, so if anyone has tips to make it faster please feel free to amend.

Strata attribute created

#strata identifier is passed to xgbMatrix from data file.  
#Records are in strata-order in the data

attr(dataMat, 'strata')<-data$strata

Custom Conditional Logistic Objective Function

logregobj <- function(preds, dataMat) {

  labels <- getinfo(dataMat, 'label')

  #sum of exponential of raw predictions for each strata, repeated for every record in that strata.
  #I use 44 here as each of my strata have 44 records.
  #If your data has variable records per strata then you will need to calculate differently.
  #rowsum() returns the strata in sorted order, so this assumes the sorted strata IDs follow the row order of the data.

  sumExp <- rep(rowsum(exp(preds), attr(dataMat, "strata")), each=44)

  #conditional probability of each record within its strata
  preds <- exp(preds) / sumExp

  grad <- preds - labels
  hess <- preds * (1 - preds)

  return(list(grad = grad, hess = hess))
}

Custom Conditional Logistic Error Function

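evalerror <- function(preds, dataMat) {
  labels <- getinfo(dataMat, "label")
  #log of the summed exponentiated raw predictions, one value per strata
  logSumExp <- log(rowsum(exp(preds), attr(dataMat, "strata")))[,1]
  #sum of the raw predictions of the successful record(s) in each strata
  sumFx <- (rowsum((labels * preds), attr(dataMat, "strata")))[,1]
  #conditional log-likelihood per strata
  ll <- sumFx - logSumExp
  #negative total conditional log-likelihood
  err <- -as.numeric(sum(ll))

  return(list(metric = "clogloss", value = err))
}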

So I just pass both of these objective and error functions when running XGBoost. It works in both training and cross-validation scenarios.
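For strata of varying size, the same quantities can be computed without the fixed 44. The following is a sketch in base R only, with hypothetical function names (strata_softmax, clogit_grad_hess, clogit_nll) that are not part of XGBoost. It works on plain vectors rather than a DMatrix, assumes one successful record per stratum, and subtracts each stratum's maximum before exponentiating to guard against overflow.

```r
# Within-stratum softmax for strata of any size (hypothetical helper).
strata_softmax <- function(preds, strata) {
  key <- as.character(strata)
  # Shift by the stratum maximum; the softmax is unchanged by this shift
  shifted <- preds - ave(preds, key, FUN = max)
  e <- exp(shifted)
  # rowsum() gives one total per stratum; index by key to expand back to rows
  denom <- rowsum(e, key)[key, 1]
  as.numeric(e / denom)
}

# Per-record gradient and diagonal Hessian of the negative log-likelihood
clogit_grad_hess <- function(preds, labels, strata) {
  p <- strata_softmax(preds, strata)
  list(grad = p - labels, hess = p * (1 - p))
}

# Negative conditional log-likelihood (one success per stratum assumed)
clogit_nll <- function(preds, labels, strata) {
  -sum(labels * log(strata_softmax(preds, strata)))
}

# Toy data: stratum 1 has three records, stratum 2 has two
preds <- c(0.2, -1.0, 0.5, 1.5, 0.0)
labels <- c(0, 0, 1, 1, 0)
strata <- c(1, 1, 1, 2, 2)

gh <- clogit_grad_hess(preds, labels, strata)
print(gh$grad)
print(clogit_nll(preds, labels, strata))
```

Inside a custom objective, preds would be the raw scores handed to the function, and labels and strata would come from the DMatrix and its attribute as in the code above.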


Hi Brebbles,
Did you see my comment above about gbm3? This allows the Cox model (coxph) with delayed entry (left truncation), which (in theory) should do the right thing if you follow my suggestion (above).
I would be interested to see how your results compare to those you would get if you applied gbm3 the way I suggested. As I said above, I am not sure their extension had been coded correctly, but a comparison would be interesting. BTW, I have a page of theory (comparison of likelihoods) which I would upload here if I could, but I am a newbie and don’t know how…


Hi compleathorseplayer,

I saw your comment above. Apologies for my ignorance but does GBM3 handle classification methods as well? If you can get CLR working via that method then that would be a great place to compare with what I’ve come up with using the custom functions above.


Hi Brebbles, Thank you for your code above. I will try to test it, but I am not sure the logic carries through from logistic to conditional logistic except in the case of linear learner functions (and I worry GBM3 may make similar approximating assumptions to yours). In the case of stumps, the strata-centered stumps are not the same as the stumps of the strata-centered variables, which would be a non-issue in the linear case but I think matters quite a lot for stumps. Consider this: if a variable is split, there may be some groups for which that variable lies entirely on one side or the other of the split, in which case those groups’ predictions would be completely unaffected by the split. This shows that the effect of a particular split (or candidate split) will differ entirely from group (stratum) to group. This behaviour is completely different from the logistic regression context, where a split nudges each prediction in one of only two possible ways.
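The point about groups lying wholly on one side of a split can be illustrated with a toy example. This is a sketch in base R with made-up numbers and a hypothetical helper, group_probs. A stump adds one constant to every record on each side of the split, and a constant added to a whole group cancels in the within-group softmax.

```r
# Within-stratum softmax (hypothetical helper, base R only)
group_probs <- function(scores, strata) {
  e <- exp(scores)
  as.numeric(e / ave(e, strata, FUN = sum))
}

strata <- c(1, 1, 1, 2, 2, 2)
x <- c(0.1, 0.2, 0.3, 0.4, 0.9, 1.5)   # a single feature (made-up values)
scores <- rep(0, 6)                     # current raw predictions

# A stump splitting at x < 0.5: -0.7 on the left, +0.7 on the right
stump <- ifelse(x < 0.5, -0.7, 0.7)

before <- group_probs(scores, strata)
after <- group_probs(scores + stump, strata)

# Stratum 1 lies entirely left of the split, so its probabilities do not move;
# stratum 2 straddles the split, so its probabilities change
print(rbind(before, after))
```

In ordinary logistic regression the same stump would move every record's probability, which is the contrast drawn in the paragraph above.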