Has anyone been able to implement conditional logistic regression?

I’m wondering whether anyone has been able to implement conditional logistic regression in XGBoost (even for 1:m matching), either by transforming the data to fit one of the existing objective functions or by creating a custom objective function?

I understand that there is the rank:pairwise objective function, but I wanted to avoid it for two reasons:

  • It doesn’t allow for cross-validation, as it cannot handle the ‘group’ information being populated.

  • The output of rank:pairwise is open to interpretation. I need the output to be the conditional probability within a group, whereas I am led to understand that the output of this model is just a raw score used solely for ranking within a group.

Conditional logistic regression has similarities to both multi:softmax and survival:cox. From what I understand, though, multi:softmax deals with multiple possible outcomes for each record, whereas I need one successful case across several records in each group. I keep reading that conditional logistic regression is essentially a special case of survival:cox, but I am unsure how to pass the ‘group’ index through to the survival:cox objective.
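For reference, here is a sketch of why the two are related, written for 1:m matching (exactly one case per stratum). With strata $s = 1, \dots, S$, $R_s$ the records in stratum $s$, $c(s)$ its case and $f_i$ the raw margin of record $i$, the conditional likelihood is

$$
L_{\text{CL}} = \prod_{s=1}^{S} \frac{\exp\big(f_{c(s)}\big)}{\sum_{j \in R_s} \exp(f_j)} .
$$

The Cox partial likelihood, with event times $t_k$, the record $i_k$ that has the event at $t_k$, and risk set $\mathcal{R}(t_k)$, is

$$
L_{\text{Cox}} = \prod_{k} \frac{\exp\big(f_{i_k}\big)}{\sum_{j \in \mathcal{R}(t_k)} \exp(f_j)} ,
$$

so the two coincide when each event time has exactly one event (the case) and its risk set is exactly one stratum. multi:softmax, by contrast, normalises over the classes of a single record, $\exp(f_k(x_i)) / \sum_m \exp(f_m(x_i))$, rather than over the records of a stratum.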

Is it possible to pass the group ID through to a custom objective function somehow? I see that getinfo for ‘group’ doesn’t work on a DMatrix, so I am not sure of the most efficient way to do this.

I’d be extremely grateful if anyone has pointers on how best to approach this.

Kind regards,

There is a pull request to expose group information from DMatrix: https://github.com/dmlc/xgboost/pull/3566. Would this be helpful in your case?

Yes, I imagine that would be a good place to start: if the group information can be passed into a custom objective function and evaluation function, this could well work. Many thanks!

Just wondering: do you know whether exposing the group information will allow it to be passed through to cross-validation and the like? Currently, objectives such as rank:pairwise do not work with cross-validation because the group information cannot be passed through.

Hi, has there been any progress on replicating the conditional logistic model in XGBoost?
Kind Regards

Hi all, I am new to XGBoost, but I have been able to get gbm3 in R to do this (CLR) via the Cox proportional hazards model (the coxph option in R/gbm3), as you guessed. I am not sure that R/gbm3 does what it is supposed to do, but there is a trick which, in theory, should get it to do the right thing, and should also work here if the Cox model implemented here allows both left truncation (delayed entry) and right-censoring. All you have to do is define the ‘time’ variables as intervals that are the same for every record within a group but non-overlapping from group to group, and then it should all work. Does survival:cox allow both left truncation and right-censoring? Or has anyone had any other ideas?
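A minimal base-R sketch of the interval trick, assuming hypothetical toy data with one case per stratum and using start = stratum - 0.5, stop = stratum as one possible non-overlapping choice. It builds the (start, stop, event) columns and checks that the risk set at each event time is exactly one stratum:

```r
# Hypothetical toy matched data: 3 strata of different sizes, one case each
d <- data.frame(stratum = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                case    = c(0, 1, 0, 1, 0, 0, 0, 1, 0))

# Each stratum gets its own interval (stratum - 0.5, stratum]; the intervals
# never overlap, so each stratum forms its own risk set.
d$start <- d$stratum - 0.5
d$stop  <- d$stratum
d$event <- d$case   # the case has the event; controls are censored at `stop`

# Risk set at time t in the counting-process convention: start < t <= stop
risk_set <- function(d, t) which(d$start < t & t <= d$stop)

# The risk set at each stratum's event time should be exactly that stratum
for (s in unique(d$stratum)) {
  print(identical(risk_set(d, s), which(d$stratum == s)))
}
```

With these columns, the survival package’s `coxph(Surv(start, stop, event) ~ x, data = d)` would fit the conditional logistic model for some covariate `x` (not included in the toy data), and with one event per stratum there are no tied event times. As far as I can tell, survival:cox in XGBoost takes a single (signed) time per record, so the start column has no direct equivalent there.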
Another approach, which worked for various versions of GB in R, was to make the group variable a factor. This did not seem computationally efficient, though, and it did not work for a large number of groups.
Has there been any progress on this lately?

PS regarding my previous post: survival:cox in XGBoost appears not to allow (at present) different entry times (left truncation), which a number of biostatisticians online lament, as do I in this case. Perhaps a future update will include this? It would be a great enhancement.
It seems that categorical input variables are not allowed in XGBoost, but they are a core feature of CatBoost. One could include the group as a categorical variable there. It is still not as clean as the survival-analysis solution, but it could work.

For the record, I have created a solution using custom objective and error functions. I’m not sure whether it’s mathematically 100% accurate, as I’ve left the gradient and Hessian calculations as they are for vanilla logistic regression. The error function, however, does use the correct log-likelihood for the conditional model, which differs slightly from the ordinary logistic log loss.
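A short derivation, assuming exactly one case per stratum, suggests these formulas are exactly the conditional-logistic ones once $p_i$ is the within-stratum probability (which is what the objective function computes). With $y_i$ the label and $s(i)$ the stratum of record $i$:

$$
\ell = -\sum_{s} \Big[ f_{c(s)} - \log \sum_{j \in R_s} e^{f_j} \Big],
\qquad
p_i = \frac{e^{f_i}}{\sum_{j \in R_{s(i)}} e^{f_j}},
$$

$$
\frac{\partial \ell}{\partial f_i} = p_i - y_i,
\qquad
\frac{\partial^2 \ell}{\partial f_i^2} = p_i (1 - p_i),
\qquad
\frac{\partial^2 \ell}{\partial f_i \, \partial f_k} = -p_i p_k \quad (i \neq k,\ s(i) = s(k)).
$$

A custom objective returns one Hessian value per record, so only the diagonal term $p_i(1 - p_i)$ is used and the off-diagonal $-p_i p_k$ terms are dropped.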

Also apologies for my poor coding technique. I’m a quant first and a coder very second. I’m sure it’s not the most efficient use of R, so if anyone has tips to make it faster please feel free to amend.

Strata attribute created

```r
library(xgboost)

# The strata identifier is attached to the xgb.DMatrix from the data file.
# Records are in strata order in the data.
attr(dataMat, "strata") <- data$strata
```

Custom Conditional Logistic Objective Function

```r
logregobj <- function(preds, dataMat) {
  labels <- getinfo(dataMat, "label")

  # Sum of exp(raw predictions) for each stratum. I use 44 here as each of my
  # strata has 44 records; if your data has a variable number of records per
  # stratum, you will need to calculate this differently. rowsum() returns the
  # strata in sorted order, so this also relies on rows being sorted by stratum.
  sumExp <- rep(rowsum(exp(preds), attr(dataMat, "strata")), each = 44)

  # Conditional probability of each record within its stratum
  preds <- exp(preds) / sumExp

  grad <- preds - labels
  hess <- preds * (1 - preds)

  return(list(grad = grad, hess = hess))
}
```
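A sketch of the same objective without the fixed stratum size, assuming `attr(dataMat, "strata")` holds one stratum ID per row and each stratum has exactly one case; `logregobj_var` is a hypothetical name. `ave()` gives every row its own stratum’s total, so the rows need not be sorted, and subtracting each stratum’s maximum margin before `exp()` avoids overflow without changing the probabilities:

```r
# Variable-stratum-size version of logregobj (same math, hypothetical name).
# Assumes attr(dataMat, "strata") has one stratum ID per row and each
# stratum contains exactly one case (label 1). Rows need not be sorted.
logregobj_var <- function(preds, dataMat) {
  labels <- getinfo(dataMat, "label")
  strata <- attr(dataMat, "strata")

  # Subtract each stratum's maximum margin before exp() to avoid overflow;
  # the within-stratum probabilities are unchanged by this shift.
  expd <- exp(preds - ave(preds, strata, FUN = max))

  # ave() repeats each stratum's total on every row of that stratum
  probs <- expd / ave(expd, strata, FUN = sum)

  grad <- probs - labels
  hess <- probs * (1 - probs)
  list(grad = grad, hess = hess)
}
```

It would be passed to XGBoost in place of `logregobj`.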

Custom Conditional Logistic Error Function

```r
evalerror <- function(preds, dataMat) {
  labels <- getinfo(dataMat, "label")
  strata <- attr(dataMat, "strata")
  # Log of the summed exp(raw predictions) in each stratum
  logSumExp <- log(rowsum(exp(preds), strata))[, 1]
  # Raw prediction of the case (label 1) in each stratum
  sumFx <- rowsum(labels * preds, strata)[, 1]
  # Conditional log-likelihood per stratum; report the negative total
  ll <- sumFx - logSumExp
  err <- -sum(ll)

  return(list(metric = "clogloss", value = err))
}
```
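A numerically safer variant of the evaluation function, sketched under the same assumptions (hypothetical name `evalerror_var`, one case per stratum). It uses a per-stratum log-sum-exp so that large margins do not overflow in `exp()`, and it does not depend on the row order:

```r
# Conditional log-loss with a stable per-stratum log-sum-exp (hypothetical name).
evalerror_var <- function(preds, dataMat) {
  labels <- getinfo(dataMat, "label")
  strata <- attr(dataMat, "strata")

  # log(sum(exp(preds))) per stratum, computed as max + log(sum(exp(preds - max)))
  m <- ave(preds, strata, FUN = max)
  logSumExp <- m + log(ave(exp(preds - m), strata, FUN = sum))

  # Log conditional probability of each row; only case rows (label 1) contribute
  logp <- preds - logSumExp
  err <- -sum(labels * logp)

  list(metric = "clogloss", value = err)
}
```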

I then pass these as the objective and error functions when running XGBoost. It works in both training and cross-validation.
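Since it is unclear above whether the logistic-style gradient and Hessian are right, a finite-difference check is a cheap way to test them. This is a self-contained sketch on random toy margins (all helper names hypothetical), comparing the analytic `p - y` and `p * (1 - p)` with central differences of the conditional log-loss:

```r
# Negative conditional log-likelihood (one case per stratum), stable log-sum-exp
clogit_loss <- function(f, y, s) {
  m <- ave(f, s, FUN = max)
  -sum(y * (f - m - log(ave(exp(f - m), s, FUN = sum))))
}

# Within-stratum probabilities, as used by logregobj
clogit_prob <- function(f, s) {
  e <- exp(f - ave(f, s, FUN = max))
  e / ave(e, s, FUN = sum)
}

# Central finite-difference gradient of the loss
num_grad <- function(f, y, s, h = 1e-6) {
  vapply(seq_along(f), function(i) {
    up <- f; up[i] <- up[i] + h
    dn <- f; dn[i] <- dn[i] - h
    (clogit_loss(up, y, s) - clogit_loss(dn, y, s)) / (2 * h)
  }, numeric(1))
}

# Diagonal of the Hessian: finite difference of the analytic gradient p - y
num_hess_diag <- function(f, y, s, h = 1e-6) {
  vapply(seq_along(f), function(i) {
    up <- f; up[i] <- up[i] + h
    dn <- f; dn[i] <- dn[i] - h
    ((clogit_prob(up, s) - y)[i] - (clogit_prob(dn, s) - y)[i]) / (2 * h)
  }, numeric(1))
}

set.seed(1)
s <- rep(1:3, times = c(4, 2, 5))          # three strata of different sizes
y <- c(0, 1, 0, 0,  1, 0,  0, 0, 0, 1, 0)  # one case per stratum
f <- rnorm(length(s))                       # arbitrary raw margins
p <- clogit_prob(f, s)

grad_err <- max(abs((p - y) - num_grad(f, y, s)))
hess_err <- max(abs(p * (1 - p) - num_hess_diag(f, y, s)))
print(c(grad_err = grad_err, hess_err = hess_err))  # both should be close to zero
```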

Hi Brebbles,
Did you see my comment above about gbm3? It allows the Cox model (coxph) with left truncation (delayed entry), which in theory should do the right thing if you follow my suggestion above.
I would be interested to see how your results compare with those you would get by applying gbm3 the way I suggested. As I said above, I am not sure their extension has been coded correctly, but a comparison would be interesting. BTW, I have a page of theory (a comparison of likelihoods) which I would upload here if I could, but I am a newbie and don’t know how…

Hi compleathorseplayer,

I saw your comment above. Apologies for my ignorance, but does gbm3 handle classification as well? If you can get CLR working via that method, it would be a great point of comparison for what I’ve come up with using the custom functions above.

Hi Brebbles, thank you for your code above. I will try to test it, but I am not sure the logic carries through from logistic to conditional logistic regression except in the case of linear learners (and I worry gbm3 may make similar approximating assumptions to yours). In the case of stumps, the strata-centered stumps are not the same as the stumps of the strata-centered variables; this would be a non-issue in the linear case, but I think it matters quite a lot for stumps. Consider this: if a variable is split, there may be some groups for which that variable lies entirely on one side of the split, in which case those groups’ predictions would be completely unaffected by the split. This shows that the effect of a particular split (or candidate split) will differ entirely from group (stratum) to group. This behaviour is completely different from the logistic regression context, where each prediction is nudged one of only two possible ways by a split.
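The point about strata lying wholly on one side of a split can be seen numerically: the within-stratum probabilities do not change when every margin in a stratum moves by the same amount, so a stump only changes the probabilities of strata it actually splits. A small sketch with made-up feature values, margins and leaf values:

```r
# Within-stratum (conditional) probabilities
cond_prob <- function(f, s) {
  e <- exp(f - ave(f, s, FUN = max))
  e / ave(e, s, FUN = sum)
}

s <- c(1, 1, 1, 2, 2, 2)                 # two strata of three records
x <- c(0.2, 0.4, 0.9, 0.5, 2.0, 3.1)     # hypothetical feature values
f <- c(0.1, -0.3, 0.5, 0.0, 0.7, -0.2)   # current raw margins

# Hypothetical stump on x < 1: +0.8 on the left, -0.4 on the right
stump <- ifelse(x < 1, 0.8, -0.4)

p_before <- cond_prob(f, s)
p_after  <- cond_prob(f + stump, s)

# Stratum 1 lies entirely left of the split, so its probabilities do not move;
# stratum 2 straddles the split, so its probabilities do.
round(rbind(p_before, p_after), 3)
```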

Hi Brebbles. I have gone through your code and I agree that it is mathematically correct (in fact, after a break of some weeks, I wrote something quite close to this myself, just to understand it). So far I have only tested it using the averaged loss (rather than the sum), and while it seems to work (and I think performs similarly to gbm3), the results have not been as good as in the analogous simple logistic case, and I think I understand why: renormalisation needs to take place during the splitting steps, or else the splits won’t necessarily make much sense. Currently they won’t give wrong answers; they just won’t improve the fit much. Do you happen to know how to customise (in a manner similar to the above) the predictions that are used during splitting? If one could do this, I think I know how to make this really work, and if my theory proves right, it might even be worth a research note.
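On what the splitting step sees: in the standard XGBoost split-gain formula, a candidate split is scored only from the sums of the per-record gradients $g_i$ and Hessians $h_i$ returned by the objective, over the left ($L$) and right ($R$) children:

$$
\text{Gain} = \tfrac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma ,
$$

so, as far as I know, the only lever a custom objective has on splitting is the g and h it returns. A base-R sketch of that formula (`split_gain` is a hypothetical helper), applied to conditional-logit gradients: with one case per stratum, g sums to zero within each stratum, so a split that keeps every stratum whole has zero gain:

```r
# XGBoost-style split gain from per-record gradients g and Hessians h.
# `left` marks records sent to the left child; lambda is the L2 penalty on
# leaf weights and gamma the per-split penalty.
split_gain <- function(g, h, left, lambda = 1, gamma = 0) {
  score <- function(G, H) G^2 / (H + lambda)
  GL <- sum(g[left]);  HL <- sum(h[left])
  GR <- sum(g[!left]); HR <- sum(h[!left])
  0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma
}

# Conditional-logit g and h at zero margins: two strata of two records each
s <- c(1, 1, 2, 2)
y <- c(1, 0, 1, 0)
p <- c(0.5, 0.5, 0.5, 0.5)   # within-stratum probabilities when all margins are 0
g <- p - y
h <- p * (1 - p)

# Split A keeps each stratum whole; split B separates cases from controls
gain_A <- split_gain(g, h, left = s == 1)
gain_B <- split_gain(g, h, left = y == 1)
c(gain_A = gain_A, gain_B = gain_B)
```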

Regarding my comment above, I tried a few different examples using code like the above, and the results are better than for my previous example… but I do think that my suggestion could greatly improve the effectiveness.

Hi AI (below). I didn’t understand… did you say you had gotten left-truncation working and the results weren’t good?

FYI, here’s a new pull request to enable cross validation with group information: https://github.com/dmlc/xgboost/pull/4474

Just in case you missed that post (click on the link and look for samkaufman’s post):

I have tried it and indeed got worse answers than with a CL model. Interesting comments from compleathorseplayer. Note that start and stop are derived from the stratum ID: start can be, say, stratum - 0.5, and stop equal to the stratum. The point is to keep the intervals non-overlapping across strata.

Hi Brebbles, I achieved this by simply defining my group (stratum) as str1,
`attr(dtrain, 'strat') <- str1`
and then accessing it inside the objective and evaluation functions using `attr(dtrain, 'strat')`.
This covers both training and test samples.

Hope this helps.

Hi compleathorseplayer - apologies for not seeing your replies until now! I see what you are saying with your example above, but I am not sure that it actually differs from the linear conditional case: if you consider the linear case where all of the records in a particular stratum have the same value, then any linear model parameter applied to that stratum will have no effect on the model performance or output at all.

I appreciate that all values in a stratum lying on one side of a split is different from all values in the stratum being exactly the same. But in that case, would it not just mean that the boosted model did not see fit to make a split somewhere in the interior of that stratum’s values?

This is an interesting point. I can see why re-normalising would give you different results from not re-normalising, but can you expand on why they would be better, i.e. what you mean when you say the splits won’t make much sense?

In any case, it is probably possible to do this, but you would probably need to write your own function to wrap around your XGBoost call. First, you would set XGBoost to run for only 1 round, then take the output, re-normalise it and apply it to your data as the “base_margin”. This would have to be repeated until you had a satisfactory model, judged by cross-validation or whatever other method you use. It’s worth noting, however, that I cannot seem to get the “base_margin” feature working when splitting the dataset for cross-validation - see here
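A sketch of the re-normalisation step only, assuming “re-normalise” means shifting each stratum’s margins so that their exponentials sum to 1 (one possible reading). `renormalise_margin` is a hypothetical helper; in the loop described above, its result would presumably be set on the training data with `setinfo(dtrain, "base_margin", ...)` before the next one-round fit:

```r
# Shift each stratum's margins by its log-sum-exp so exp(margin) sums to 1
# within every stratum. The per-stratum max is subtracted first to avoid overflow.
renormalise_margin <- function(margin, strata) {
  m <- ave(margin, strata, FUN = max)
  margin - (m + log(ave(exp(margin - m), strata, FUN = sum)))
}

strata <- c(1, 1, 1, 2, 2)
margin <- c(2.0, 0.5, -1.0, 3.0, 3.5)   # hypothetical raw margins after one round
new_margin <- renormalise_margin(margin, strata)

tapply(exp(new_margin), strata, sum)    # each stratum now sums to 1
```

Because the within-stratum probabilities are unchanged when a constant is added to every margin in a stratum, this shift leaves the probabilities, and therefore the gradients and Hessians from the custom objective, unchanged.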

Sorry compleathorseplayer, I slept on it and realised I may have misinterpreted what you meant.

With re-normalisation, do you mean re-normalising the margin so that it sums to 100% within each stratum, or re-normalising your features after each step, as in scaling your features before each boost?

Thanks for your comments. After thinking about it and playing with it a bit more, I think that the way I described is probably still logically consistent, but I don’t think the intuition for the conditional logit case is as compelling as for the binary case. Since you mentioned cross-validation, I thought I would mention that I am working on a routine which would perform LOOCV on fitted XGBoost models (leaf nodes only, conditional on the structure and splits). Would you happen to know if anything like that exists already?
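For the LOOCV idea, one possible starting point, sketched under assumptions: with the tree structure fixed, the standard XGBoost leaf weight is $w = -G/(H + \lambda)$ (before the learning rate is applied), where $G$ and $H$ sum the gradients and Hessians of the records in the leaf, so dropping a record (or a whole stratum) from its leaf just removes its terms from $G$ and $H$. `loo_leaf_weights` is a hypothetical helper, and it holds each record’s g and h fixed, which is only an approximation for the conditional model, since a record’s g and h depend on the other records in its stratum:

```r
# Leave-group-out leaf weights for a fixed tree structure.
# g, h: per-record gradients and Hessians; leaf: leaf index of each record;
# group: unit to hold out (each record by default, or e.g. the stratum ID).
loo_leaf_weights <- function(g, h, leaf, group = seq_along(g), lambda = 1) {
  G  <- ave(g, leaf, FUN = sum)         # leaf totals, repeated on each record
  H  <- ave(h, leaf, FUN = sum)
  Gg <- ave(g, leaf, group, FUN = sum)  # held-out group's share of its leaf
  Hg <- ave(h, leaf, group, FUN = sum)
  -(G - Gg) / (H - Hg + lambda)         # weight the leaf would get without it
}

g    <- c(-0.5, 0.5, -0.5, 0.5)
h    <- c(0.25, 0.25, 0.25, 0.25)
leaf <- c(1, 1, 2, 2)

loo_leaf_weights(g, h, leaf)                         # leave one record out
loo_leaf_weights(g, h, leaf, group = c(1, 1, 2, 2))  # leave one stratum out
```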