How to set xgb-params in spark

layor · November 7, 2018, 12:06pm

I am trying to use XGBoost (0.72) on Spark (2.1) to train a regression model.
Here is my code:
val paramsMap = List("eta" -> 0.1, "subsample" -> 0.8, "colsample_bytree" -> 0.8, "max_depth" -> 8, "objective" -> "reg:linear", "eval_metric" -> "mae", "lambda" -> 5, "silent" -> 1).toMap
val reg = XGBoost.trainWithRDD(train_rdd, paramsMap, 128, 4)
reg.save(OUTPUT)
Then I checked the saved model file. The metadata is below:
{"class":"ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel","timestamp":1541583942736,"sparkVersion":"2.1.1","uid":"XGBoostRegressionModel_2b546d5b3cc1","paramMap":{"subsample":1.0,"max_bin":16,"gamma":0.0,"tree_method":"auto","featuresCol":"features","min_child_weight":1.0,"predictionCol":"prediction","grow_policy":"depthwise","lambda":1.0,"sketch_eps":0.03,"scale_pos_weight":1.0,"labelCol":"label","colsample_bylevel":1.0,"normalize_type":"tree","colsample_bytree":1.0,"skip_drop":0.0,"booster":"gbtree","use_external_memory":false,"eta":0.3,"max_delta_step":0.0,"alpha":0.0,"sample_type":"uniform","max_depth":6,"rate_drop":0.0,"lambda_bias":0.0}}
As you can see, the parameters in the metadata are still the defaults (for example, eta is 0.3 and max_depth is 6, not the 0.1 and 8 I passed in).
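To make the comparison concrete, a small sketch along these lines could list which requested values differ from the saved metadata. `ParamDiff` and `mismatches` are hypothetical names and not part of XGBoost4J-Spark, and the `saved` map is a hand-copied subset of the paramMap above, so this only checks the metadata and says nothing about what the trained booster actually used.

```scala
// Hypothetical helper (not an XGBoost4J-Spark API): report requested
// parameters whose value in the saved metadata is different or missing.
object ParamDiff {
  // Normalize integers to Double so that 5 and 5.0 compare as equal.
  private def normalize(v: Any): String = v match {
    case n: Int    => n.toDouble.toString
    case n: Long   => n.toDouble.toString
    case n: Double => n.toString
    case other     => other.toString
  }

  // Result maps each mismatching key to (requested value, saved value if any).
  def mismatches(requested: Map[String, Any],
                 saved: Map[String, Any]): Map[String, (Any, Option[Any])] =
    requested.collect {
      case (key, want) if !saved.get(key).exists(got => normalize(got) == normalize(want)) =>
        (key, (want, saved.get(key)))
    }

  def main(args: Array[String]): Unit = {
    // Parameters passed to training in this post.
    val requested: Map[String, Any] = Map(
      "eta" -> 0.1, "subsample" -> 0.8, "colsample_bytree" -> 0.8,
      "max_depth" -> 8, "objective" -> "reg:linear",
      "eval_metric" -> "mae", "lambda" -> 5, "silent" -> 1)

    // Subset of the paramMap from the saved metadata, copied by hand.
    val saved: Map[String, Any] = Map(
      "eta" -> 0.3, "subsample" -> 1.0, "colsample_bytree" -> 1.0,
      "max_depth" -> 6, "lambda" -> 1.0)

    mismatches(requested, saved).toSeq.sortBy(_._1).foreach {
      case (key, (want, got)) =>
        println(s"$key: requested=$want, saved=${got.getOrElse("<absent>")}")
    }
  }
}
```

With the values from this post, every requested key is reported: five have a different value in the metadata, and objective, eval_metric and silent do not appear in it at all.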
So how do I set the regression model parameters correctly?
Thanks

hcho3 · November 7, 2018, 11:23pm

@CodingCat Any idea about this? Do we support the save() method?

CodingCat · November 9, 2018, 5:33am

Can you try 0.8x?

0.72 might have a bug there.

layor · November 9, 2018, 6:21am

Our cluster's Spark version is 2.1, which does not seem to be compatible with 0.8.
I would also like to know: even though the parameters in the metadata are the defaults, are the parameters in the trained model itself the ones I set?
Thanks