How does XGBoost handle query weight in LTR?

bellati · December 11, 2023, 12:36pm

Hello, I am interested in learning how query weight is handled by XGBoost in Learning-to-Rank. More specifically, I would like to know whether any normalization is applied to queries that contain too many pairs. As far as I could tell from the code, no normalization of this kind is happening, but I would like to double-check that.

Thank you in advance.

jiamingy · December 13, 2023, 5:02am

Apologies for the late reply. There is normalization based on the weighted lambda gradient.

bellati · December 13, 2023, 8:00am

All good, thank you for your reply.

I am using v1.7.5.

If it is not too much to ask, could you point me to where this is happening?

1. I see that there is a weight per group and a constant weight normalization factor, which should still preserve the relative distance between groups. link
2. I also see some normalization based on the num_pairsample param, which defaults to 1.
3. And yet another one based on fix_list_weight, which defaults to 0 and uses the number of documents in the query.

link for both points 2,3
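
To make the three points concrete, here is a minimal sketch of how I read them. It is not the XGBoost source: the function names are hypothetical, and the exact formulas (number of groups divided by the sum of group weights for point 1, `1 / num_pairsample` for point 2, `fix_list_weight / group_size` for point 3) are my assumptions about v1.7.5.

```python
def weight_normalization_factor(group_weights):
    # Point 1 (assumed formula): one constant shared by every group,
    # so the ratio between any two group weights is unchanged.
    return len(group_weights) / sum(group_weights)


def pair_scale(group_size, num_pairsample=1, fix_list_weight=0.0):
    # Point 2 (assumed): divide by the number of sampled pairs per document.
    scale = 1.0 / num_pairsample
    # Point 3 (assumed): only active when fix_list_weight is non-zero,
    # in which case the scale shrinks with the number of documents.
    if fix_list_weight != 0.0:
        scale *= fix_list_weight / group_size
    return scale


# Two hypothetical queries: a small one and a large one.
group_weights = [1.0, 3.0]
group_sizes = [4, 40]

factor = weight_normalization_factor(group_weights)
effective = [
    w * factor * pair_scale(n) for w, n in zip(group_weights, group_sizes)
]
print(factor, effective)
```

If this reading is right, then with the defaults (num_pairsample = 1, fix_list_weight = 0) the per-pair scale is 1 for both queries regardless of size, so nothing compensates for the larger query producing more pairs. That is what my question is about.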

Is what you are referring to one of the options above? I am new to the code base, so I may well have missed something.

jiamingy · December 13, 2023, 4:44pm

I revamped the LTR implementation in 2.0. The following links are for the new implementation. I can't recall the details of the old one, which I think had far fewer features.

bellati · December 14, 2023, 8:39am

Thanks! I will read it through! I am still stuck with 1.7.5 because I am training with the jvm-packages, and we cannot use newer versions because of a problem with Maven.