[jvm-packages] Getting Group For Custom Rank Objective and Eval

I’m researching a move from a fork of XGBoost to vanilla XGBoost. The main thing holding me up is that the fork adds a getter to the Scala DMatrix so that custom objective and eval functions can use group info in their calculations. In the C++ code for ranking, the group is accessed by a pointer, but there doesn’t seem to be a way to get this info from the Scala DMatrix.

Is there a canonical way to get group info from a DMatrix in the Scala wrapper of XGBoost that I am missing?

@hcho3 Alternatively, if you have an example to get the Qid from your relatively recent Qid pull request, I think it would suit my purposes.

I’m just looking for a way to perform group level calculations at the Eval & Objective stage.

Thanks for any advice you can give!

@Will It looks like an oversight that neither group_ptr_ nor qids_ is exposed to the C API. (Interestingly, it’s possible to set group_ptr_ via XGDMatrixSetGroup.) Since both the Scala and Java wrappers communicate with the core lib via the C API, they currently do not have access to these fields. It should be a simple affair to add a function to expose group_ptr_.

I’ll file a pull request to add XGDMatrixGetGroup to the C API. Once that’s merged, we can expose it to the Scala wrapper.
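For illustration, here is a minimal sketch of the kind of group-level calculation a custom eval could do once the group boundaries are available. It assumes group_ptr_ holds cumulative row offsets (starting at 0, one more entry than there are groups), built from the per-group sizes passed to XGDMatrixSetGroup. The class and method names (GroupMetricSketch, toGroupPtr, topOneAccuracy) are hypothetical and not part of XGBoost, and the boundaries are supplied by hand here rather than read from a DMatrix.

```java
import java.util.Arrays;

public class GroupMetricSketch {
    // Convert per-group sizes (e.g. {2, 3}) into cumulative boundaries ({0, 2, 5}).
    // Assumption: this mirrors how group_ptr_ is laid out in the core library.
    public static int[] toGroupPtr(int[] groupSizes) {
        int[] ptr = new int[groupSizes.length + 1];
        for (int i = 0; i < groupSizes.length; i++) {
            ptr[i + 1] = ptr[i] + groupSizes[i];
        }
        return ptr;
    }

    // Fraction of non-empty groups whose highest-scored row carries the
    // highest label in that group. Rows of group g are [groupPtr[g], groupPtr[g + 1]).
    public static float topOneAccuracy(float[] preds, float[] labels, int[] groupPtr) {
        int hits = 0;
        int nonEmptyGroups = 0;
        for (int g = 0; g + 1 < groupPtr.length; g++) {
            int start = groupPtr[g];
            int end = groupPtr[g + 1];
            if (start == end) {
                continue; // skip empty groups
            }
            nonEmptyGroups++;
            int best = start;
            float maxLabel = labels[start];
            for (int i = start + 1; i < end; i++) {
                if (preds[i] > preds[best]) {
                    best = i;
                }
                maxLabel = Math.max(maxLabel, labels[i]);
            }
            if (labels[best] == maxLabel) {
                hits++;
            }
        }
        return nonEmptyGroups == 0 ? 0f : (float) hits / nonEmptyGroups;
    }

    public static void main(String[] args) {
        int[] groupPtr = toGroupPtr(new int[] {2, 3});
        float[] preds = {0.9f, 0.1f, 0.2f, 0.8f, 0.5f};
        float[] labels = {1f, 0f, 1f, 0f, 0f};
        System.out.println(Arrays.toString(groupPtr));
        System.out.println(topOneAccuracy(preds, labels, groupPtr));
    }
}
```

In a real custom eval, the predictions and labels would come from the booster and the DMatrix; only the group boundaries are the missing piece discussed in this thread.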

Thanks so much! Let me know if there is anything I can do to help. I currently have a 2-year-old fork of XGBoost that provides this functionality (along with the ability to have multiple custom eval functions), and I would love to get to a state where we can move off of our fork!

@hcho3 If you agree this is an oversight, I am going to actually get started on a PR that adds this functionality. Your agreement has been vastly helpful!

Yes, this is one of the wanted features (e.g. this). Is this fork something you’d be able to share with us?

@hcho3 Let me work on getting everything into the right shape so that I can share it. Just to note, the fork is 2 years old. I think our changes shouldn’t have any non-trivial conflicts, but I’m not positive, since the code base has progressed since the time of our fork.

@Will No problem. Take your time. It would be great if we can get multiple watchlist eval into the vanilla XGBoost.

By the way, I opened a pull request to add XGDMatrixGetGroup(): https://github.com/dmlc/xgboost/pull/3514

Hey @hcho3 - I have been working to port our changes to the Java and Scala XGBoost packages that allow multiple eval functions to the modern version of XGBoost.

I think I am almost in a good place, but can’t seem to find docs on locally making sure my Java code is green. Could you point me in the right direction on this front?

I’m also happy to take over that PR for GetGroup (at least to add the functionality to JVM-packages, now that you have exposed the C API). Would you be fine with me taking that off your hands?

You can run the Java tests for starters. Under the jvm-packages/ directory, run

mvn -q clean install -DskipTests -Dmaven.test.skip
mvn -q test

Yes, that would be awesome.