Reposting from https://github.com/dmlc/xgboost/issues/3435.
OS: RHEL 6
Scala 2.11.8
Spark 2.2.0
Package used (python/R/jvm/C++): jvm
xgboost version used: v0.72
Steps to reproduce
Train a model using XGBoost.trainWithRDD, then print the model parameters and save the model. Both the parameters returned by extractParamMap() and those written to the metadata of the saved model are always the defaults, not the values supplied by the user.
Code
import ml.dmlc.xgboost4j.scala.spark.XGBoost

// User-supplied training parameters
val paramMap = List(
  "booster" -> "gbtree",
  "silent" -> 0,
  "nthread" -> 4,
  "objective" -> "reg:linear",
  "eta" -> 0.05f,
  "max_depth" -> 5,
  "min_child_weight" -> 2,
  "subsample" -> 0.5f,
  "alpha" -> 0,
  "lambda" -> 0,
  "seed" -> 27,
  "eval_metric" -> "auc",
  "tree_method" -> "approx").toMap

// trainRDD is the training RDD, prepared elsewhere
val xgboostModelRDD = XGBoost.trainWithRDD(trainRDD, paramMap, round = 5, nWorkers = 1, useExternalMemory = true)

// Expected: the values from paramMap. Actual: the defaults shown under Output.
print(xgboostModelRDD.extractParamMap())
xgboostModelRDD.write.overwrite().save("xgboostModel")
Output
{
XGBoostRegressionModel_151ece6c7f93-alpha: 0.0,
XGBoostRegressionModel_151ece6c7f93-booster: gbtree,
XGBoostRegressionModel_151ece6c7f93-colsample_bylevel: 1.0,
XGBoostRegressionModel_151ece6c7f93-colsample_bytree: 1.0,
XGBoostRegressionModel_151ece6c7f93-eta: 0.3,
XGBoostRegressionModel_151ece6c7f93-featuresCol: features,
XGBoostRegressionModel_151ece6c7f93-gamma: 0.0,
XGBoostRegressionModel_151ece6c7f93-grow_policy: depthwise,
XGBoostRegressionModel_151ece6c7f93-labelCol: label,
XGBoostRegressionModel_151ece6c7f93-lambda: 1.0,
XGBoostRegressionModel_151ece6c7f93-lambda_bias: 0.0,
XGBoostRegressionModel_151ece6c7f93-max_bin: 16,
XGBoostRegressionModel_151ece6c7f93-max_delta_step: 0.0,
XGBoostRegressionModel_151ece6c7f93-max_depth: 6,
XGBoostRegressionModel_151ece6c7f93-min_child_weight: 1.0,
XGBoostRegressionModel_151ece6c7f93-normalize_type: tree,
XGBoostRegressionModel_151ece6c7f93-predictionCol: prediction,
XGBoostRegressionModel_151ece6c7f93-rate_drop: 0.0,
XGBoostRegressionModel_151ece6c7f93-sample_type: uniform,
XGBoostRegressionModel_151ece6c7f93-scale_pos_weight: 1.0,
XGBoostRegressionModel_151ece6c7f93-sketch_eps: 0.03,
XGBoostRegressionModel_151ece6c7f93-skip_drop: 0.0,
XGBoostRegressionModel_151ece6c7f93-subsample: 1.0,
XGBoostRegressionModel_151ece6c7f93-tree_method: auto,
XGBoostRegressionModel_151ece6c7f93-use_external_memory: false
}
What have you tried?
Training the model using XGBoost.trainWithDataFrame saves the user-input parameters (calling extractParamMap() and saving the model both output the user-input parameters), but XGBoost.trainWithRDD seems to be saving the default parameters.
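To make the discrepancy easier to check, here is a minimal sketch of a comparison helper in plain Scala. ParamDiff and mismatches are hypothetical names, not part of XGBoost4J-Spark, and the sample values are copied from the Code and Output sections above. With a real model, the reported map could presumably be built from extractParamMap() (for example from its toSeq, using each pair's param.name and value), but that step is an assumption and is left out of the sketch.

```scala
// Hypothetical helper, not part of XGBoost4J-Spark.
object ParamDiff {

  // Normalise values so that 0, 0.0 and 0.0f compare equal, and so that
  // the Float 0.05f compares equal to the Double 0.05.
  private def normalise(v: Any): String = v match {
    case f: Float            => f.toString.toDouble.toString
    case n: java.lang.Number => n.doubleValue.toString
    case other               => other.toString
  }

  // Returns (name, requested value, reported value) for every requested
  // parameter that the model reports with a different value.
  // Parameters the model does not report at all are skipped.
  def mismatches(requested: Map[String, Any],
                 reported: Map[String, Any]): Seq[(String, Any, Any)] =
    for {
      (name, want) <- requested.toSeq
      got          <- reported.get(name).toSeq
      if normalise(want) != normalise(got)
    } yield (name, want, got)

  def main(args: Array[String]): Unit = {
    // Subset of the user-supplied parameters from this issue.
    val requested = Map[String, Any](
      "booster" -> "gbtree",
      "eta" -> 0.05f,
      "max_depth" -> 5,
      "subsample" -> 0.5f,
      "alpha" -> 0)

    // Corresponding values from the extractParamMap() output above.
    val reported = Map[String, Any](
      "booster" -> "gbtree",
      "eta" -> 0.3,
      "max_depth" -> 6,
      "subsample" -> 1.0,
      "alpha" -> 0.0)

    mismatches(requested, reported).foreach { case (name, want, got) =>
      println(s"$name: requested $want, model reports $got")
    }
  }
}
```

On the values above this flags eta, max_depth and subsample, and leaves booster and alpha alone. Values are compared after normalising to strings because the user map mixes Int and Float while the model reports Double.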