XGBoost4J-Spark: leaf indices below the expected range


#1

Hi, I’m using XGBoost4J-Spark 0.80 and calling setLeafPredictionCol() to get the leaf index of each sample in every tree.
The metadata of the model I trained is:
{"class":"ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel","timestamp":1557508793544,"sparkVersion":"2.3.0","uid":"xgbc_5b9d95212ebf","paramMap":{"baseScore":0.5,"customObj":null,"evalMetric":"auc","numRound":200,"nthread":1,"trackerConf":{"workerConnectionTimeout":0,"trackerImpl":"python"},"eta":0.2,"treeMethod":"auto","useExternalMemory":false,"timeoutRequestWorkers":1800000,"silent":1,"colsampleBytree":0.9,"lambda":1.0,"subsample":0.8,"seed":0,"alpha":0.0,"skipDrop":0.0,"sketchEps":0.03,"checkpointPath":"","probabilityCol":"probability","numWorkers":64,"numEarlyStoppingRounds":0,"missing":"NaN","maxBin":16,"growPolicy":"depthwise","sampleType":"uniform","maxDeltaStep":0.0,"minChildWeight":1.0,"colsampleBylevel":1.0,"rawPredictionCol":"rawPrediction","gamma":0.0,"treeLimit":0,"customEval":null,"checkpointInterval":-1,"featuresCol":"features","normalizeType":"tree","maxDepth":7,"rateDrop":0.0,"labelCol":"label","lambdaBias":0.0,"predictionCol":"prediction","objective":"binary:logistic","trainTestRatio":1.0,"scalePosWeight":5.0}}

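As you can see, the number of trees (numRound) is 200 and maxDepth is 7.

I expected every leaf index to fall in the range 127 to 254, i.e. the bottom level of a full binary tree of depth 7 with nodes numbered from 0. But for some trees, some samples get a leaf index below 127 (for example 125, 111 and 104 in the vector below, which holds one sample's leaf index in each of the 200 trees):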
[205.0, 141.0, 150.0, 138.0, 154.0, 205.0, 182.0, 238.0, 150.0, 172.0, 196.0, 226.0, 158.0, 136.0, 174.0, 160.0, 145.0, 248.0, 155.0, 180.0, 158.0, 204.0, 171.0, 221.0, 146.0, 187.0, 215.0, 163.0, 168.0, 153.0, 212.0, 169.0, 242.0, 249.0, 233.0, 228.0, 172.0, 133.0, 254.0, 251.0, 191.0, 167.0, 163.0, 186.0, 246.0, 146.0, 188.0, 249.0, 229.0, 166.0, 222.0, 235.0, 152.0, 184.0, 184.0, 226.0, 251.0, 233.0, 127.0, 218.0, 137.0, 128.0, 187.0, 235.0, 179.0, 185.0, 166.0, 208.0, 232.0, 154.0, 228.0, 249.0, 182.0, 213.0, 152.0, 158.0, 155.0, 221.0, 159.0, 202.0, 169.0, 188.0, 184.0, 244.0, 159.0, 182.0, 245.0, 157.0, 213.0, 142.0, 188.0, 132.0, 240.0, 252.0, 192.0, 172.0, 151.0, 156.0, 132.0, 156.0, 191.0, 173.0, 231.0, 229.0, 197.0, 224.0, 195.0, 199.0, 205.0, 223.0, 243.0, 232.0, 166.0, 188.0, 254.0, 222.0, 139.0, 253.0, 211.0, 204.0, 243.0, 189.0, 174.0, 178.0, 150.0, 184.0, 162.0, 194.0, 171.0, 225.0, 203.0, 222.0, 238.0, 159.0, 205.0, 177.0, 235.0, 246.0, 206.0, 168.0, 249.0, 214.0, 237.0, 239.0, 239.0, 228.0, 197.0, 130.0, 192.0, 206.0, 191.0, 239.0, 137.0, 137.0, 191.0, 125.0, 250.0, 111.0, 239.0, 211.0, 229.0, 240.0, 250.0, 141.0, 206.0, 104.0, 237.0, 198.0, 152.0, 251.0, 212.0, 217.0, 155.0, 225.0, 202.0, 230.0, 210.0, 178.0, 209.0, 191.0, 231.0, 236.0, 157.0, 235.0, 191.0, 182.0, 173.0, 176.0, 193.0, 169.0, 146.0, 189.0, 215.0, 228.0, 221.0, 241.0, 196.0, 221.0, 248.0, 223.0]
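For reference, the 127 to 254 expectation comes from numbering the nodes of a full binary tree level by level starting at 0, and it only holds if every tree is grown all the way to depth 7. A quick check of that arithmetic in Python (the helper name is made up for illustration, and the sample holds only a few values copied from the vector above):

```python
def full_tree_leaf_id_range(depth):
    """Ids of the bottom-level nodes of a FULL binary tree whose nodes are
    numbered 0, 1, 2, ... level by level (root = 0)."""
    first = 2 ** depth - 1        # there are 2^depth - 1 nodes above the bottom level
    last = 2 ** (depth + 1) - 2   # 2^(depth+1) - 1 nodes in total, ids start at 0
    return first, last


low, high = full_tree_leaf_id_range(7)
print(low, high)  # 127 254

# A handful of values taken from the vector above, not the full list.
sample = [205.0, 141.0, 127.0, 125.0, 111.0, 104.0, 254.0]
below = [v for v in sample if v < low]
print(below)  # [125.0, 111.0, 104.0]
```

Any value in `below` means the corresponding tree is not a full depth-7 tree.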

What’s wrong with my model?


#2

Why do you think this is the case?


#3

Hi, thanks for the reply. Do you mean that, for some trees, it is normal for the depth to be less than the “maxDepth” setting?


#4

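Yes, depth can be less than maxDepth. maxDepth is only an upper bound: a branch stops growing as soon as no further split is worthwhile, so some leaves sit above the bottom level and their indices can be below 127. Try dumping your model to see the actual structure of each tree.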
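Once you have a text dump, the depth of each leaf can be read off the indentation. Below is a minimal Python sketch, assuming the usual text dump layout (one node per line, one tab per level, leaves written as `<id>:leaf=<value>`). The sample tree and the `leaf_depths` helper are invented for illustration and are not taken from the model above. On the Spark side, I believe the dump is available through the model's underlying booster (`getModelDump`), but check the API of your version.

```python
import re

# Hypothetical dump of one small tree in the text dump layout:
# split nodes look like "<id>:[<condition>] yes=..,no=..,missing=..",
# leaves look like "<id>:leaf=<value>", indented by one tab per level.
SAMPLE_DUMP = (
    "0:[f0<0.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=-0.2\n"
    "\t2:[f1<1.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=0.1\n"
    "\t\t4:leaf=0.3\n"
)

LEAF_LINE = re.compile(r"^(\t*)(\d+):leaf=")


def leaf_depths(tree_dump):
    """Map leaf node id -> depth, using the tab indentation of each line."""
    depths = {}
    for line in tree_dump.splitlines():
        match = LEAF_LINE.match(line)
        if match:
            depths[int(match.group(2))] = len(match.group(1))
    return depths


print(leaf_depths(SAMPLE_DUMP))  # {1: 1, 3: 2, 4: 2}
```

In this toy tree of depth 2, a full tree would only have leaves 3 to 6, yet leaf 1 appears at depth 1 because that branch stopped splitting early. The same effect would explain indices below 127 in a maxDepth 7 model.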