With the XGBoost ranking loss (pairwise or NDCG), which pairs are compared when multiple relevance labels are used? Example: the label values are no event = 0, click = 1, buy = 2, and the result set is 0, 0, 2, 0, 1, 0. The user scans the result set from left to right. Are the 5th and 6th results compared to the 3rd one? I saw that there is some sampling when computing the objective function. Which pairs are sampled? Thank you!
The ranking objective function performs stratified sampling per relevance label (0, 1, 2, etc).
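To illustrate what stratified sampling per relevance label could look like for the example above, here is a minimal sketch. It assumes that each item is paired with randomly drawn items from the same query group whose label differs from its own. `PairSamplingSketch`, `samplePairs` and `numPairSample` are hypothetical names used only for illustration; this is not XGBoost's actual code and the real implementation may differ in its details.

```scala
import scala.util.Random

object PairSamplingSketch {
  // pos is the index of the item with the higher label, neg the lower one.
  final case class Pair(pos: Int, neg: Int)

  // Hypothetical helper, not part of XGBoost: for every item, draw
  // numPairSample partners uniformly from the items with a different label.
  def samplePairs(labels: Vector[Int], numPairSample: Int, rng: Random): Vector[Pair] = {
    val indices = labels.indices.toVector
    indices.flatMap { i =>
      // Candidates are all items in the group whose label differs from item i.
      val candidates = indices.filter(j => labels(j) != labels(i))
      if (candidates.isEmpty) Vector.empty[Pair]
      else Vector.fill(numPairSample) {
        val j = candidates(rng.nextInt(candidates.size))
        // Orient the pair so that the higher label comes first.
        if (labels(i) > labels(j)) Pair(i, j) else Pair(j, i)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val labels = Vector(0, 0, 2, 0, 1, 0) // the result set from the question
    samplePairs(labels, numPairSample = 1, new Random(42)).foreach { p =>
      println(s"item ${p.pos + 1} (label ${labels(p.pos)}) above item ${p.neg + 1} (label ${labels(p.neg)})")
    }
  }
}
```

Under this assumption, position in the list plays no role in which pairs are eligible: the 5th item (label 1) and the 6th item (label 0) can both be paired with the 3rd item (label 2), and which pairs are actually drawn depends on the random sample.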
Thanks @hcho3. Do you mean that when calculating the gradient for each item in the list, the objective function performs stratified sampling of the pairs for this item per relevance label? Is there a parameter that controls the sampling ratio? I see that there is a variable num_pairsample in the code, but I am not sure whether there is a parameter in the interface to control it.
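To make "the gradient for each item" concrete, here is a minimal sketch of how sampled pairs could contribute to per-item gradients. It assumes the standard pairwise logistic loss log(1 + exp(-(s_pos - s_neg))), which the thread does not confirm. `PairwiseGradientSketch` and `gradients` are hypothetical names, and weights and second-order terms are left out.

```scala
object PairwiseGradientSketch {
  private def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Hypothetical helper, not part of XGBoost: accumulates the gradient of
  // log(1 + exp(-(s_pos - s_neg))) with respect to each item's score.
  // Each pair is (pos, neg), where pos has the higher relevance label.
  def gradients(scores: Vector[Double], pairs: Seq[(Int, Int)]): Vector[Double] = {
    val grad = Array.fill(scores.size)(0.0)
    for ((pos, neg) <- pairs) {
      val p = sigmoid(scores(pos) - scores(neg))
      grad(pos) += p - 1.0 // negative gradient: boosting raises this score
      grad(neg) += 1.0 - p // positive gradient: boosting lowers this score
    }
    grad.toVector
  }
}
```

In this sketch an item only receives a gradient from the pairs it was sampled into, so drawing more pairs per item gives each item more gradient contributions.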
You should set the parameter
Sorry, I found that num_pairsample is not supported in xgboost-spark 0.9.0.
val paramMap = Map(
  "eta" -> eta,
  "max_depth" -> max_depth,
  "objective" -> "rank:pairwise",
  "num_round" -> num_round,
  "group_col" -> "group",
  "tracker_conf" -> TrackerConf(0L, "scala"),
  "eval_metric" -> "ndcg",
  "min_child_weight" -> min_child_weight,
  "eval_sets" -> Map("eval1" -> df_eval),
  "num_pairsample" -> 2
)
XGBoostRegressor(paramMap).fit(df_train)