Ranking NDCG\Pairwise position sensitivity

asimfiverr · February 27, 2019, 4:07pm

With the XGBoost ranking loss (pairwise or NDCG), which pairs are compared when multiple relevance labels are used? Example: label values: no event = 0, click = 1, buy = 2. Result set: 0,0,2,0,1,0. The user scans the result set from left to right. Are the 5th and 6th results compared to the 3rd one? I saw that there is some sampling when computing the objective function. Which pairs are sampled? Thank you!

hcho3 · February 27, 2019, 6:48pm

The ranking objective function performs stratified sampling per relevance label (0, 1, 2, etc).

asimfiverr · February 28, 2019, 9:34am

Thanks @hcho3. Do you mean that when calculating the gradient for each item in the list, the objective function performs stratified sampling of the pairs for this item per relevance label? Is there a parameter that controls the sampling ratio? I see that there is a variable num_pairsample in the code, but I am not sure whether there is a parameter in the interface to control it.

hcho3 · February 28, 2019, 6:42pm

Yes.

You should set the parameter num_pairsample.
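
As a rough illustration of the sampling confirmed above, here is a minimal Scala sketch. It is not XGBoost's actual code. It assumes that each item in a query group is paired num_pairsample times with a randomly chosen item that carries a different relevance label, and that the position in the list plays no role in which pairs are drawn. The names PairSamplingSketch and samplePairs are hypothetical and exist only for this example.

```scala
import scala.util.Random

object PairSamplingSketch {
  // Hypothetical helper, not part of XGBoost: for each item in one query group,
  // draw `numPairSample` partners whose label differs from the item's label.
  // Each pair is returned as (index of more relevant item, index of less relevant item).
  def samplePairs(labels: Vector[Int], numPairSample: Int, rng: Random): Vector[(Int, Int)] = {
    val indices = labels.indices.toVector
    indices.flatMap { i =>
      // Assumption: every item with a different label is an eligible partner,
      // regardless of where it sits in the list.
      val candidates = indices.filter(j => labels(j) != labels(i))
      if (candidates.isEmpty) Vector.empty[(Int, Int)]
      else Vector.fill(numPairSample) {
        val j = candidates(rng.nextInt(candidates.size))
        if (labels(i) > labels(j)) (i, j) else (j, i)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // The result set from the question: 0,0,2,0,1,0
    val labels = Vector(0, 0, 2, 0, 1, 0)
    val pairs = samplePairs(labels, numPairSample = 1, rng = new Random(42))
    pairs.foreach { case (hi, lo) =>
      println(s"item ${hi + 1} (label ${labels(hi)}) ranked above item ${lo + 1} (label ${labels(lo)})")
    }
  }
}
```

Under these assumptions, the 5th result (label 1) and the 6th result (label 0) can each be paired with the 3rd result (label 2), because only the labels decide which pairs are eligible.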

jimmy-walker · March 17, 2021, 10:09am

Sorry, but I found that num_pairsample is not supported in xgboost-spark 0.9.0.

import ml.dmlc.xgboost4j.scala.spark.{TrackerConf, XGBoostRegressor}

// eta, max_depth, num_round, min_child_weight, df_train and df_eval are defined elsewhere.
val paramMap = Map("eta" -> eta, "max_depth" -> max_depth,
  "objective" -> "rank:pairwise", "num_round" -> num_round,
  "group_col" -> "group", "tracker_conf" -> TrackerConf(0L, "scala"),
  "eval_metric" -> "ndcg", "min_child_weight" -> min_child_weight,
  "eval_sets" -> Map("eval1" -> df_eval), "num_pairsample" -> 2)
XGBoostRegressor(paramMap).fit(df_train)