I trained an XGBRegressor (the scikit-learn API of XGBoost) with:
XGBRegressor(n_estimators=500, max_depth=50, n_jobs=-1, tree_method='hist',
             random_state=2, learning_rate=0.1, min_child_weight=1, seed=0,
             subsample=0.8, colsample_bytree=0.8, gamma=0, reg_alpha=0,
             reg_lambda=1, verbosity=0)
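For reference, here is a minimal, runnable sketch of the full single-machine training step; train_x and train_y are hypothetical placeholders for my pandas training data (they are not named above):

from xgboost import XGBRegressor

# Same hyperparameters as listed above; train_x / train_y are
# hypothetical placeholders for the pandas training data.
model = XGBRegressor(n_estimators=500, max_depth=50, n_jobs=-1,
                     tree_method='hist', random_state=2, learning_rate=0.1,
                     min_child_weight=1, seed=0, subsample=0.8,
                     colsample_bytree=0.8, gamma=0, reg_alpha=0,
                     reg_lambda=1, verbosity=0)
model.fit(train_x, train_y)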
I also trained a DaskXGBRegressor on a LocalCluster(n_workers=1, threads_per_worker=1) as:
xgboost.dask.DaskXGBRegressor(n_estimators=500, max_depth=50, n_jobs=-1, tree_method='hist',
                              random_state=2, learning_rate=0.1, min_child_weight=1, seed=0,
                              subsample=0.8, colsample_bytree=0.8, gamma=0, reg_alpha=0,
                              reg_lambda=1, verbosity=0, objective='reg:squarederror')
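The corresponding Dask setup, as a sketch under the same assumptions (train_x / train_y are hypothetical; the Dask client is attached to the estimator before fitting):

from dask.distributed import Client, LocalCluster
import dask.dataframe
import xgboost.dask

cluster = LocalCluster(n_workers=1, threads_per_worker=1)
client = Client(cluster)

# Single-partition Dask versions of the same pandas data.
dX = dask.dataframe.from_pandas(train_x, npartitions=1)
dy = dask.dataframe.from_pandas(train_y, npartitions=1)

dask_model = xgboost.dask.DaskXGBRegressor(
    n_estimators=500, max_depth=50, n_jobs=-1, tree_method='hist',
    random_state=2, learning_rate=0.1, min_child_weight=1, seed=0,
    subsample=0.8, colsample_bytree=0.8, gamma=0, reg_alpha=0,
    reg_lambda=1, verbosity=0, objective='reg:squarederror')
dask_model.client = client  # attach the Dask client before fitting
dask_model.fit(dX, dy)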
The data values fed to both models are identical, except that one model receives a pandas DataFrame while the other receives a Dask DataFrame. The Dask DataFrame is converted from the pandas DataFrame as:
x = dask.dataframe.from_pandas(train_x, npartitions=1)
However, I find that when I use the trained DaskXGBRegressor for prediction, its accuracy is much lower than that of the trained XGBRegressor.
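For example, the gap can be measured like this (test_x / test_y are hypothetical held-out data, and r2_score is just one possible metric; the Dask prediction is lazy and must be computed):

from sklearn.metrics import r2_score

dtest_x = dask.dataframe.from_pandas(test_x, npartitions=1)

pred_local = model.predict(test_x)
pred_dask = dask_model.predict(dtest_x).compute()  # materialize the lazy Dask result

print("XGBRegressor     R^2:", r2_score(test_y, pred_local))
print("DaskXGBRegressor R^2:", r2_score(test_y, pred_dask))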
Could this be a bug, or am I using DaskXGBRegressor incorrectly?